
Finding my best choice in Ma On Shan

In this assignment, I tried web scraping, data cleaning, SQL, and Pandas, and I learned a lot along the way.

1. ParseHub
The hardest part of using ParseHub was scraping data across multiple pages, but the tutorials that come with the software made the process easier. There are not many restaurants in Ma On Shan, so I scraped all of them.

2. OpenRefine
Data cleaning was the most difficult step in the whole process. From the OpenRefine tutorial, I learned that I had to merge entries that refer to the same thing and name them consistently. When I followed the tutorial and tried to merge them, however, OpenRefine could not recognize similar names, probably because the OpenRice data contains Chinese text, so I ended up unifying them manually.
After looking at the addresses of all the restaurants, I found that the restaurants in Ma On Shan are mainly clustered around a few shopping malls, so I wondered which mall had the most restaurants. To find out, I needed to extract the mall names from the full addresses and merge them. OpenRefine's ability to identify the names was limited, so I combined its automatic matching with manual edits to fill in the rest. The result was a new column: res_locationbrief.
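
The mall extraction described above was done in OpenRefine, partly by hand. As a rough illustration only, the same idea could be sketched in Python as a keyword lookup. The mall keywords, restaurant names, and addresses below are made-up examples, not my actual data, and the function name brief_location is hypothetical.

```python
# Hypothetical keyword -> canonical mall name mapping (illustrative only).
MALL_KEYWORDS = {
    "sunshine city": "Sunshine City Plaza",
    "ma on shan plaza": "Ma On Shan Plaza",
    "mostown": "MOSTown",
}


def brief_location(address: str) -> str:
    """Return the canonical mall name found in an address, else 'Other'."""
    lowered = address.lower()
    for keyword, mall in MALL_KEYWORDS.items():
        if keyword in lowered:
            return mall
    return "Other"


# Made-up sample rows standing in for the scraped restaurant list.
restaurants = [
    {"res_name": "Cafe A", "res_address": "Shop 12, 2/F, Sunshine City Plaza, Ma On Shan"},
    {"res_name": "Noodle B", "res_address": "Shop 3, MOSTOWN, Ma On Shan"},
    {"res_name": "Diner C", "res_address": "G/F, Some Street, Ma On Shan"},
]

# Add the new column, mirroring res_locationbrief from the cleaned dataset.
for row in restaurants:
    row["res_locationbrief"] = brief_location(row["res_address"])

locations = [row["res_locationbrief"] for row in restaurants]
print(locations)
```

Lowercasing the address before matching is what lets "MOSTOWN" and "MOSTown" end up under one name. Addresses that match no keyword fall into "Other", which is where manual checking would still be needed.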

3. SQL
SQL was fun to use. It let me count how many restaurants each mall had, so I could find the mall with the most restaurants. It also let me exclude restaurants that were poorly rated or expensive, which left only a few for me to choose from. It's a very useful tool.
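
To make the two queries concrete, here is a minimal sketch that runs them with Python's built-in sqlite3 module on an in-memory table. Only the column res_locationbrief comes from my dataset. The table name, the rating and price_level columns, the sample rows, and the thresholds (rating at least 4.0, price level at most 2) are assumptions for illustration.

```python
import sqlite3

# In-memory database with a hypothetical schema and made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE restaurants ("
    "res_name TEXT, res_locationbrief TEXT, rating REAL, price_level INTEGER)"
)
conn.executemany(
    "INSERT INTO restaurants VALUES (?, ?, ?, ?)",
    [
        ("Cafe A", "Sunshine City Plaza", 4.2, 1),
        ("Noodle B", "MOSTown", 3.1, 2),
        ("Diner C", "MOSTown", 4.5, 3),
        ("Grill D", "MOSTown", 4.0, 1),
    ],
)

# Count restaurants per mall, largest first.
mall_counts = conn.execute(
    """
    SELECT res_locationbrief, COUNT(*) AS n_restaurants
    FROM restaurants
    GROUP BY res_locationbrief
    ORDER BY n_restaurants DESC
    """
).fetchall()

# Keep only well-rated, affordable restaurants (assumed thresholds).
shortlist = conn.execute(
    """
    SELECT res_name
    FROM restaurants
    WHERE rating >= 4.0 AND price_level <= 2
    ORDER BY rating DESC
    """
).fetchall()

print(mall_counts)
print(shortlist)
conn.close()
```

GROUP BY with COUNT(*) answers the "which mall has the most restaurants" question, and the WHERE clause does the filtering that shrinks the list of candidates.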

4. Pandas
Pandas lets me display my data as a table, so I can see my best choices at a glance.
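
A small sketch of what such a table could look like in Pandas, assuming the pandas package is installed. The rows, the rating and price_level columns, and the 4.0 rating cutoff are made up for illustration. Only res_locationbrief is a column name from my cleaned data.

```python
import pandas as pd

# Made-up sample data standing in for the cleaned restaurant list.
restaurants = pd.DataFrame(
    {
        "res_name": ["Cafe A", "Noodle B", "Diner C", "Grill D"],
        "res_locationbrief": ["Sunshine City Plaza", "MOSTown", "MOSTown", "MOSTown"],
        "rating": [4.2, 3.1, 4.5, 4.0],
        "price_level": [1, 2, 3, 1],
    }
)

# Keep well-rated restaurants, then rank by rating (high first)
# and by price level (cheap first) to break ties.
best = (
    restaurants[restaurants["rating"] >= 4.0]
    .sort_values(["rating", "price_level"], ascending=[False, True])
    .reset_index(drop=True)
)

# Print the ranked shortlist as a plain table.
print(best.to_string(index=False))
```

Sorting by rating first and price second puts the most attractive options at the top of the table.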

Overall, using these tools together really helped me with data discovery and analysis. I also learned a lot about computer languages in the process. You can access my web page here.
