
A bite of Whampoa: Creating a midnight snack list for myself

Living near Whampoa, I always find it difficult to pick a place for midnight snacks. Not because the choices are limited, but because I am spoilt for choice.😂

So when I started this research, the thought that came to mind was: why not use this opportunity to make a list of late-night snack options for the future? With that in mind, I set up a web crawl and started cleaning the data.

At first everything went well: I scraped the data and tried it out in OpenRefine. But it turned out that, with the limited functions we had learned, I could not process the data in the most effective way. I was stuck for a while. Then something suddenly struck me: hey, I can use value.replace for all of this. So I first replaced “$” with a blank, then used “trim leading and trailing blank spaces” to delete that blank. I split the cuisine types in the same way. What is a little tricky, though, is that there are also blanks between “粤菜” and “(廣式)”, as you can see here. So I ran value.replace(" (", "(") to remove the blank. Handling blank spaces is a real head-scratcher: you need to check character by character to see whether anything mistyped is getting in the way of the query. Finally, all the data was cleaned and easy for me to browse!
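For readers who prefer a script to the OpenRefine interface, the same cleaning steps can be sketched in Python. This is only an illustration of the idea, not what I actually ran: the function names and the sample values are made up, and Python's `str.replace` and `str.strip` stand in for value.replace and the trim operation.

```python
def clean_price(raw: str) -> str:
    """Replace "$" with a blank, then trim leading and trailing whitespace."""
    return raw.replace("$", " ").strip()


def clean_cuisine(raw: str) -> str:
    """Remove the blank in front of an opening parenthesis, then trim."""
    return raw.replace(" (", "(").strip()


def show_codepoints(text: str) -> list[str]:
    """List each character's code point, to spot look-alike blanks or brackets."""
    return [f"U+{ord(ch):04X}" for ch in text]


# Hypothetical sample values, not rows from the scraped data.
print(clean_price("$51-100"))        # → 51-100
print(clean_cuisine("粤菜 (廣式) "))   # → 粤菜(廣式)
print(show_codepoints(" ("))         # → ['U+0020', 'U+0028']
```

The last helper is the "character by character" check: a non-breaking space or a full-width bracket looks almost identical on screen but shows up as a different code point.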

Once the data was processed, I started to think about which factors I consider important when choosing a restaurant. “Price”? “Cuisine types”? “Customer feedback”?

I soon realised that I would consider all of these factors, but more often than not a specific dish I wanted to eat would just pop into my head, like “sushi” or “noodles”. With this in mind, I first started to work with the data using SQL.

I tried to simulate a real situation: if I just want to eat noodles tonight, whether it’s Hong Kong-style cart noodles or Japanese ramen, what are my options?

For this, I used the LIKE operator: “…WHERE rest_category_keyword LIKE ‘%麵%…’ “. That gave me all the records for restaurants whose sub-category contains noodles. Perhaps this is my midnight snack list for next Monday, who knows?😆
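Here is a small, self-contained sketch of that kind of query, using Python's built-in sqlite3 module as a stand-in for whichever database you use. Only the column name rest_category_keyword comes from my data; the table name `restaurants`, the `rest_name` column and the three sample rows are invented for the example.

```python
import sqlite3

# In-memory database with a hypothetical table and made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE restaurants (rest_name TEXT, rest_category_keyword TEXT)"
)
conn.executemany(
    "INSERT INTO restaurants VALUES (?, ?)",
    [
        ("Shop A", "車仔麵"),
        ("Shop B", "拉麵"),
        ("Shop C", "壽司"),
    ],
)

# % matches any run of characters, so '%麵%' finds 麵 anywhere in the keyword.
noodle_rows = conn.execute(
    "SELECT rest_name, rest_category_keyword FROM restaurants "
    "WHERE rest_category_keyword LIKE ? ORDER BY rest_name",
    ("%麵%",),
).fetchall()

for name, keyword in noodle_rows:
    print(name, keyword)
```

Passing the pattern as a parameter rather than pasting it into the SQL string makes it easy to swap 麵 for whatever I am craving that night.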

Then I became curious about what kinds of restaurants were actually near my home, and what share of the total each kind makes up. So I used COUNT to count how many times each cuisine type appeared and sorted the results in descending order to find the cuisine types with the most restaurants. To make the outcome more visual, I wanted to use matplotlib to draw a chart. To work out how, I searched for additional tutorial videos on this website about making a pie chart of cuisine type proportions.
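The counting step might look roughly like the sketch below. Again it uses Python's sqlite3 module with an invented `restaurants` table, a hypothetical `cuisine_type` column and made-up rows, so treat it as one possible shape of the query rather than my exact one.

```python
import sqlite3

# In-memory database with hypothetical rows; one row per restaurant.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE restaurants (rest_name TEXT, cuisine_type TEXT)")
conn.executemany(
    "INSERT INTO restaurants VALUES (?, ?)",
    [
        ("Shop A", "Hong Kong Style"),
        ("Shop B", "Japanese"),
        ("Shop C", "Hong Kong Style"),
        ("Shop D", "Thai"),
        ("Shop E", "Hong Kong Style"),
        ("Shop F", "Japanese"),
    ],
)

# Count restaurants per cuisine type, most common first.
cuisine_counts = conn.execute(
    "SELECT cuisine_type, COUNT(*) AS n FROM restaurants "
    "GROUP BY cuisine_type ORDER BY n DESC, cuisine_type"
).fetchall()

# Turn the raw counts into a share of all restaurants.
total = sum(n for _, n in cuisine_counts)
for cuisine, n in cuisine_counts:
    print(f"{cuisine}: {n} ({n / total:.0%})")
```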

[Figure: pie chart of cuisine type proportions]

To show the result more precisely, I used ‘explode’ to highlight how many types of foreign cuisine are popular in this area. Do you think it’s clear?
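If you want to try something similar, here is a minimal matplotlib sketch of a pie chart with ‘explode’. The labels and counts are invented, and my choice of which slices count as "foreign" is only an assumption for the example. The `explode` list gives each slice an offset from the centre, so the highlighted slices are pulled slightly outwards.

```python
import matplotlib

matplotlib.use("Agg")  # draw off-screen so the script also runs without a display
import matplotlib.pyplot as plt

# Hypothetical counts per cuisine type, not my real results.
labels = ["Hong Kong Style", "Japanese", "Thai", "Korean"]
counts = [3, 2, 1, 1]

# Assumption for this sketch: every cuisine except the local one is "foreign".
foreign = {"Japanese", "Thai", "Korean"}
explode = [0.1 if label in foreign else 0.0 for label in labels]

fig, ax = plt.subplots()
ax.pie(
    counts,
    labels=labels,
    explode=explode,      # offset of each slice from the centre
    autopct="%1.1f%%",    # print each slice's percentage
    startangle=90,
)
ax.set_title("Cuisine type proportions (sample data)")
fig.savefig("cuisine_pie.png")
```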

For more data processing results, you can link to my website here!
