how-much-data-feature-51df08fc

Taste Data Analysis With SQL Before Tasting Food With My Friends

When I was thinking about which districts I should work on, Mong Kok popped up in my mind first as my friends and I always find it difficult to choose a restaurant when we have been there as there are numerous restaurants for us to choose from.

Upon deciding on Mong Kok as my target district, I used ParseHub to scrap data of all Mong Kok restaurants that I need from Open Rice which are restaurant names, prices, categories and sub-categories, bookmark numbers, positive and negative ratings, reviews as well as coupon details. For my first download, I found that only 2 pages had been scrapped so I reviewed the steps on ParseHub to check which step caused the problem. I realized that I had clicked on the page 2 button instead of the next page button which led to the scraping process being stopped at page 2. I thus re-defined the selections with the next page button and then successfully obtained the data of 302 Mong Mok restaurants showing on Open Rice. When I had a quick scan of the scrapped data, it was not hard to find that some information have been black.

Therefore, I imported the CSV file into Open Refine to clean the data. Apart from splitting the price range column into the lower price and higher price, I also sorted the data by restaurant name in ascending order, removed duplicates of restaurants and renamed the columns for making further data presentation and processing easier. Moreover, I encountered difficulty when transforming the wrong data type for bookmark numbers from text to number. I was thinking about how to convert “1K” to “1000” and then I found that I could use the transform function to remove the “K” and times the whole column by 1000. In addition, when I was finalizing the data that I would use for analysis, I removed columns of reviews, coupon details and higher prices as these data were irrelevant to what I wanted to know about.

The clean data was imported into SQL for analysis. Aggregate and statistic functions were mainly used to perform the analysis. In order to find out the total number of restaurants by category in Mong Kok, the COUNT() function and GROUP BY statement were used to compute the results. As for finding the average number of bookmarked restaurants by category, the AVG() function and GROUP BY statement were used, and the round() function was used to remove the decimal point. Furthermore, I spent quite a lot of time trying how to show only 10 results for each category in the same table. I first computed it by using the GROUP BY statement and ORDER BY keyword, but only 1 result was shown for each category. In this regard, I looked for solutions online and learned that the PARTITION BY clause divides the result set into different groups of rows while the ROW_NUMBER() function could be applied to have the row number for ranking. Besides, the WHERE and <= clauses were used to limit the shown results by each category. The DESC keyword was used as well in all 3 commands to arrange the results with the largest ends. Last but not least, Python was used with SQL so that better visualization of the results could be shown.

If you are interested in the results, please visit my website here!

Similar Posts