a2_feature img-602cc4d5

If I open a restaurant in Ma On Shan…

I moved and lived in Ma On Shan for nearly two months. I love this quiet and lovely place along the coastline. However, it is also known as a“delicious food desert“ (美食荒漠) – people have limited access to delicious food. Is it true? How many restaurants are in Ma On Shan? What are the most popular restaurants here? Their price ranges? Also, if I would like to open a restaurant here (a good business opportunity considering it is the“delicious food desert”), what kinds of the restaurant at which price range will be the best choice? I tried to use SQL to address this problem. To quickly glance through the analytical result, please click here. The following section will focus more on how I dealt with the data, especially the challenge I met during the process.

ParseHub and Open Refine were used to collect and clean the data from Openrice. For the restaurant information, I mainly focused on how many comments, like, and dislike they received, their price range and also what kinds of food they serve. These metrics can basically answer my questions above. The web scraping process was smooth, collecting around 17 pages of data. Then Open Refine was used to remove the duplicated entries, and fill up the blank space to be replaced by “NA”. Then cleaned database was exported as a CSV file waiting to be analyzed via DB Browser.

WeChat 截圖_20221010102436-8255fd09

To write the SQL queries about the Top 10 restaurants receiving like, dislike and comments were easy. The most challenging part for me was to find out the number of restaurants under different price ranges and put the codes into Jupyter Notebook.

First, I used count and group by functions to address the problem. However, the data format was not easy to read the data – the second column should be transposed into the first row. Then it was a big headache for me! I searched online and spent lots of time to tried many methods (maybe because I didn’t know how to ask the question), including PIVOT, case when, sum, count and so on. And when I was going to lose my mind by trying lots of times and failing all the time, I made it. This SQL query combined the functions of sum, case when, group by, as, and sub-query. And I had to say that sometimes it cannot work not because the query was wrong, but about the format. For example, when I wrote AS 0-50, or AS 0到50, it cannot work, but it could work after adding the quotation mark (i.e., “AS 0-50”). When I thought I could relieve myself, I found another challenge waiting for me. Since the SQL query was too long, I broke it into several queries. But it cannot function after I broke them. I tried for more than one hour and finally I learned that I can use the print function (so stupid that I did not think of this function earlier) to check whether I wrote the correct query, and the problem was solved.

reflection_dbbroswer-fa92c8d9

Anyway, my conclusion is that it would be a good idea to open a restaurant offering light meals, especially Teochew cuisine, with prices below 100 HKD. Even though this data analysis process is filled with difficulties as I have to rewrite the SQL query many times to get the result, it is worth the effort. I not only learn through the whole process of scrapping, cleaning, and analyzing the data, but also gain a deeper understanding of Ma On Shan, get closer connections with it, and feel a sense of belonging.

Similar Posts