A New Way to Find the Best Restaurants

By Hong LIN, October 16, 2022

Web scraping is useful in the digital age because it makes it possible to collect large amounts of information from the internet for data analysis. In my second assignment, I went through the basic process of web scraping, data cleaning, and data analysis. It was enjoyable to play with data using ParseHub, OpenRefine, and SQL.

Since I live in Fanling, I decided to research the restaurants there. Initially, I intended to find out where the gourmet center of Fanling is. Using ParseHub, I collected data on more than 200 restaurants in Fanling, including restaurant names, addresses, prices, categories, likes, dislikes, and marks.

When I imported the data into OpenRefine for cleaning, I found that the scraped data was quite well structured, but some fields still needed transformation. I tried to split the category field into two fields, as many restaurants have two levels of category. At first, splitting on spaces failed, because the two categories of each restaurant were separated by a line break rather than a space. Then I used a regular expression with “\n” (the newline character) as the separator, and it worked. Another interesting task was dealing with the “K” in the “marks” field. My solution was to delete the “K” with the replace function first, and then use an if function to convert the resulting floats to integers. To better measure the performance of each restaurant, I also generated a new field called “net sentiment” (formula: likes – dislikes) in OpenRefine.
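The cleaning above was done interactively in OpenRefine. As a rough illustration only, the Python sketch below reproduces the same three transformations on a single row: splitting the category on line breaks with a regular expression, stripping the “K” from “marks”, and computing net sentiment. The field names, the sample row, and the assumption that “K” stands for thousands (so “1.2K” becomes 1200) are hypothetical and not taken from the original data.

```python
import re


def split_category(raw: str) -> tuple[str, str]:
    """Split a two-level category on the line break(s) between the levels.

    Returns (first_level, second_level); the second level is "" if absent.
    """
    parts = [p.strip() for p in re.split(r"\n+", raw.strip()) if p.strip()]
    first = parts[0] if parts else ""
    second = parts[1] if len(parts) > 1 else ""
    return first, second


def parse_marks(raw: str) -> int:
    """Turn values such as "856" or "1.2K" into integers.

    Assumption (not confirmed by the post): "K" means thousands, so "1.2K" -> 1200.
    """
    text = raw.strip()
    if text.upper().endswith("K"):
        # Drop the "K", scale the remaining float, and round it to an integer.
        return round(float(text[:-1]) * 1000)
    return int(text)


def clean_row(row: dict) -> dict:
    """Apply the three cleaning steps to one hypothetical scraped row."""
    level_1, level_2 = split_category(row["category"])
    likes = int(row["likes"])
    dislikes = int(row["dislikes"])
    return {
        "name": row["name"],
        "category_1": level_1,
        "category_2": level_2,
        "likes": likes,
        "dislikes": dislikes,
        "marks": parse_marks(row["marks"]),
        "net_sentiment": likes - dislikes,  # net sentiment = likes minus dislikes
    }


if __name__ == "__main__":
    # A made-up row, shaped like the fields described in the post.
    sample = {
        "name": "Restaurant A",
        "category": "Japanese\nRamen",
        "likes": "120",
        "dislikes": "8",
        "marks": "1.2K",
    }
    print(clean_row(sample))
```

Splitting on one or more newlines (`\n+`) rather than a single one also tolerates an accidental blank line between the two category levels.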

After working with OpenRefine for a while, I realized that it would be difficult to answer my question about the gourmet center, because I could not find an effective and logical way to group the address information. Therefore, I changed my questions as follows:
1. What are the 10 best restaurants in Fanling?
2. What types of restaurants are most common in Fanling?
3. Which cuisines are most appreciated in Fanling?

With these manageable questions in mind, the remaining steps went smoothly. For the first question, a query with “ORDER BY” solved the problem easily. For the second question, I used “COUNT” and “GROUP BY” to find the number of restaurants in each category. The logic for the last question is similar to the second: “GROUP BY” is again required, but “AVG” replaces “COUNT” to show the average performance of the restaurants in each category. I also used “HAVING” to keep only the cuisines with at least 8 restaurants, as this reduces, to some extent, the bias caused by categories with only a few restaurants.
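The three queries can be sketched with Python's built-in sqlite3 module. The table name `restaurants`, its columns, the made-up demo rows, and the choice of net sentiment as the measure of both “best” and “performance” are assumptions for illustration; the original queries may have differed.

```python
import sqlite3

# Hypothetical table of cleaned restaurants (table and column names are assumptions).
SCHEMA = """
CREATE TABLE restaurants (
    name          TEXT,
    category      TEXT,
    net_sentiment INTEGER  -- likes minus dislikes
)
"""

# Question 1: top 10 restaurants, assuming "best" means highest net sentiment.
TOP_10 = """
SELECT name, net_sentiment
FROM restaurants
ORDER BY net_sentiment DESC
LIMIT 10
"""

# Question 2: how many restaurants each category has.
COUNT_BY_CATEGORY = """
SELECT category, COUNT(*) AS n_restaurants
FROM restaurants
GROUP BY category
ORDER BY n_restaurants DESC
"""

# Question 3: average net sentiment per category, keeping only
# categories with at least 8 restaurants.
AVG_BY_CATEGORY = """
SELECT category, AVG(net_sentiment) AS avg_net_sentiment, COUNT(*) AS n_restaurants
FROM restaurants
GROUP BY category
HAVING COUNT(*) >= 8
ORDER BY avg_net_sentiment DESC
"""


def make_demo_connection() -> sqlite3.Connection:
    """Build an in-memory database with made-up rows:
    9 restaurants in "Cuisine A" and 3 in "Cuisine B"."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    rows = [(f"A{i}", "Cuisine A", 45 + i) for i in range(9)]
    rows += [(f"B{i}", "Cuisine B", 199) for i in range(3)]
    conn.executemany("INSERT INTO restaurants VALUES (?, ?, ?)", rows)
    return conn


def run_queries(conn: sqlite3.Connection):
    """Return the result rows for the three questions, in order."""
    return (
        conn.execute(TOP_10).fetchall(),
        conn.execute(COUNT_BY_CATEGORY).fetchall(),
        conn.execute(AVG_BY_CATEGORY).fetchall(),
    )


if __name__ == "__main__":
    top_10, counts, averages = run_queries(make_demo_connection())
    print(top_10)
    print(counts)
    print(averages)
```

On these made-up rows, “Cuisine B” has the highest average net sentiment but only three restaurants, so the HAVING clause leaves it out of the third result; this is the small-sample bias the filter is meant to reduce.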

Thanks for reading my journal. You can access my website here.
