
Hey, it’s HOTPOT season!

The second assignment was another new experience for me. It felt like working with actual “big data.” Since I am a hotpot lover, I wanted to answer one question: which is the best hotpot restaurant in the New Territories?

To get the answer, I pulled every hotpot restaurant in the New Territories from the OpenRice website, around three hundred in total. After grabbing about 17 pages of listings, I cleaned up the data in OpenRefine.

About a dozen of the restaurants showed zero saves, which made no sense to me. When I checked their individual pages, the save counts were not zero. The reason seems to be that the New Territories hotpot listing page on OpenRice had not been updated in time. So, in OpenRefine, I manually entered the correct numbers for the restaurants whose save count had come through as zero or null.

Another challenge was that some of the numbers were abbreviated with a “k” (thousands), so they could not be sorted by size alongside the plain numbers. My solution was to add a new column to the cleaned CSV file and use a command to convert every value to the same numeric form.
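For anyone curious, here is a minimal sketch of that kind of conversion in Python. It is not the exact command I used. It assumes the values look like "850" or "1.2k", and the function name parse_count and the sample values are made up for illustration.

```python
def parse_count(text):
    """Convert a count such as '1.2k' or '850' into an integer."""
    cleaned = text.strip().lower().replace(",", "")
    if cleaned.endswith("k"):
        # '1.2k' means 1.2 thousand, so scale the number part by 1000
        return int(round(float(cleaned[:-1]) * 1000))
    return int(cleaned)


# Hypothetical sample values, not real OpenRice data
saves = ["850", "1.2k", "3k", "97"]

# Sort from most saved to least saved using the numeric value
sorted_saves = sorted(saves, key=parse_count, reverse=True)
print(sorted_saves)  # ['3k', '1.2k', '850', '97']
```

Once every value is a plain number, sorting by size works as expected.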

The last hurdle came when I ran my SQL statements in Jupyter Notebook: one mistake took me a long time to track down. It turns out that the pattern in a SQL LIKE statement could not be wrapped in double quotes (" ") in my notebook and had to use single quotes (' ') instead. Yet when I ran the same query directly in the SQL database, either kind of quote worked.
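Here is a small sketch of the single-quote form, assuming the query is run through Python's built-in sqlite3 module (I am not claiming this matches my exact setup). The table, columns, and restaurant names are hypothetical. In standard SQL, single quotes mark string values and double quotes mark identifiers such as column names. Some databases accept double quotes for strings as an extension, which may be why the two looked interchangeable there.

```python
import sqlite3

# In-memory database with a hypothetical table, for illustration only
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE restaurants (name TEXT, district TEXT)")
conn.executemany(
    "INSERT INTO restaurants VALUES (?, ?)",
    [("Hotpot A", "Sha Tin"), ("Hotpot B", "Tuen Mun"), ("Hotpot C", "Yuen Long")],
)

# The LIKE pattern is a string value, so it goes in single quotes.
# The Python string itself uses double quotes, so the two never clash.
query = "SELECT name FROM restaurants WHERE district LIKE '%Tin%' ORDER BY name"
quoted_rows = [row[0] for row in conn.execute(query)]

# Passing the pattern as a parameter avoids the quoting question entirely.
param_rows = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM restaurants WHERE district LIKE ? ORDER BY name",
        ("%Tin%",),
    )
]

print(quoted_rows)
print(param_rows)
conn.close()
```

The parameter version is the one I would reach for next time, since the pattern never has to be quoted inside the SQL text at all.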


Finally, if you are also a hotpot lover, you are welcome to visit the website. After all, the hotpot season has arrived!
