A Look at the Fierce Japanese Cuisine Competition in Hong Kong
It is well known that Japanese things are always popular in HK. Even so, when I dug into the Openrice filter for the first time, the number and proportion of Japanese restaurants still shocked me. In the richest areas, like TST and Causeway Bay, Japanese restaurants beat every other foreign cuisine. The same is true in the relatively poorer areas. My aim is simply to find the most popular Japanese restaurants in two districts with different income levels (Causeway Bay and Sham Shui Po) and see whether there are any differences. The results are available here. (I also made some improvements to my first assignment.)
The first step was scraping. I scraped each restaurant's name, average price, subcategory, number of wish lists, and number of good, ok and bad reviews, so that I could calculate its popularity and its percentage of favorable reviews. The review counts are stored on a secondary page, which makes things a bit more complicated, so I created a new template to collect them. While scraping, I found lots of advertisements whose location and food category did not match my target data. I decided to deal with them in the second step.
The second step was cleaning. I had several goals. First, delete the duplicate restaurants (by blanking down). Second, delete the advertisements (by looking at the text facets of restaurant_address and restaurant_subcategory). Third, correct the strange symbols caused by Japanese characters (text facet, then cluster). I also converted the review counts into numbers and created a new column holding their sum.
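For readers who prefer code to a point-and-click tool, the first two cleaning goals and the review sum can be sketched in plain Python. This is only an illustration, not what I actually ran: restaurant_address and restaurant_subcategory are the real column names, but restaurant_name, good, ok, bad, review_sum and the clean function are names I made up here, and the sample rows are placeholders. It does not cover the third goal (fixing the garbled Japanese characters).

```python
def clean(rows, district, category="Japanese"):
    """Drop duplicates and off-target ads, then add a numeric review total.

    Assumes each row is a dict with the (partly hypothetical) keys used below.
    """
    seen = set()
    cleaned = []
    for row in rows:
        # Goal 1: treat the same name at the same address as a duplicate.
        key = (row["restaurant_name"], row["restaurant_address"])
        if key in seen:
            continue
        seen.add(key)
        # Goal 2: ads show up with the wrong district or the wrong cuisine.
        if district not in row["restaurant_address"]:
            continue
        if category not in row["restaurant_subcategory"]:
            continue
        # Scraped counts arrive as text, so convert them before summing.
        counts = {col: int(row[col]) for col in ("good", "ok", "bad")}
        cleaned.append({**row, **counts, "review_sum": sum(counts.values())})
    return cleaned


# Placeholder rows, not real scraped data.
sample = [
    {"restaurant_name": "Sushi A", "restaurant_address": "1 Example Road, Causeway Bay",
     "restaurant_subcategory": "Japanese / Sushi", "good": "10", "ok": "3", "bad": "1"},
    {"restaurant_name": "Sushi A", "restaurant_address": "1 Example Road, Causeway Bay",
     "restaurant_subcategory": "Japanese / Sushi", "good": "10", "ok": "3", "bad": "1"},
    {"restaurant_name": "Burger B", "restaurant_address": "2 Example Street, Mong Kok",
     "restaurant_subcategory": "American", "good": "5", "ok": "1", "bad": "0"},
]
cleaned = clean(sample, "Causeway Bay")
```

With these placeholder rows, only one "Sushi A" entry should survive: the duplicate is skipped and the Mong Kok burger ad fails both the district and the category check.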
After turning the file into a database, it was time for the SQL analysis. I ran into some problems reading the data into a dataframe, but I solved them by repeating the previous steps. So how can I rank those restaurants? I attempted to show the results from three perspectives. The first ranks them by the number of wish lists (how many people want to go to the restaurant). The second ranks them by the percentage of good reviews. The last tries to show the relationship between the number of people who have tried the restaurant (that is, who have written a review) and the number of wish lists. For example, a “gopercent” of 0.2 indicates that at least 1 out of 5 people goes to the restaurant after bookmarking it. After some simple queries, I think I have a good idea of the most popular restaurants in both districts.
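To make the three rankings concrete, here is a rough sketch using Python's built-in sqlite3 module. The table layout, the column names, the goodpercent name and the rank helper are assumptions for illustration, and the rows are placeholders rather than my real data. I am also assuming that “gopercent” is the sum of reviews divided by the number of wish lists, which is how I read the idea described above.

```python
import sqlite3

# In-memory database with a hypothetical schema and placeholder rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE restaurants ("
    "name TEXT, district TEXT, wish_lists INTEGER, "
    "good INTEGER, ok INTEGER, bad INTEGER)"
)
conn.executemany(
    "INSERT INTO restaurants VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("Placeholder A", "Causeway Bay", 500, 80, 15, 5),
        ("Placeholder B", "Causeway Bay", 200, 30, 5, 5),
        ("Placeholder C", "Sham Shui Po", 100, 9, 1, 0),
    ],
)

# One query computes all three measures; only the sort column changes.
QUERY = """
SELECT name,
       wish_lists,
       1.0 * good / (good + ok + bad)       AS goodpercent,
       1.0 * (good + ok + bad) / wish_lists AS gopercent
FROM restaurants
WHERE district = ?
  AND wish_lists > 0
  AND (good + ok + bad) > 0
ORDER BY {order_by} DESC
"""


def rank(district, order_by="wish_lists"):
    """Return (name, wish_lists, goodpercent, gopercent) rows for a district."""
    # Whitelist the sort column, since it cannot be passed as a SQL parameter.
    if order_by not in ("wish_lists", "goodpercent", "gopercent"):
        raise ValueError("unknown ranking: " + order_by)
    return conn.execute(QUERY.format(order_by=order_by), (district,)).fetchall()


by_wish = rank("Causeway Bay")
by_good = rank("Causeway Bay", "goodpercent")
```

The WHERE clause drops restaurants with no wish lists or no reviews so that neither ratio divides by zero. Multiplying by 1.0 forces floating-point division, because SQLite would otherwise divide integers and truncate the result.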
The results showed something I did not expect. While most of the popular Japanese restaurants in Causeway Bay cost more than 200 HKD per person, prices in Sham Shui Po are much lower. This can be partly explained by the difference in consumption levels between the two areas. In addition, judging by the sum of reviews and “gopercent”, restaurants in Causeway Bay are generally much more popular than those in SSP (out of the same number of people who add a restaurant to their wish list, more actually go). This might reflect the geographical and commercial advantages of Causeway Bay.
This assignment gave me an overview of the data analysis process. I also got some ideas about which restaurants to visit when I go to Causeway Bay and SSP. The results are here.