
Sichuan Cuisine in HK Island

In this assignment, I used ParseHub to scrape data from the OpenRice website. Based on my food preferences, I chose to scrape information about Sichuan restaurants on Hong Kong Island, including restaurant name, address, price range, number of likes, number of dislikes, discounts, and signature dishes. The data covered about 13 pages and 204 rows.

Next, I used OpenRefine to clean the data. First, I removed restaurants marked as “being decorated”, “moved away” or “delivery only”. Second, I removed restaurants that had neither a number of likes nor a number of dislikes. Finally, I changed the price range “under 50” to “000-050” and “51-100” to “051-100”. The price ranges are stored as text, which is sorted character by character, so this padding makes sorting by price in ascending order in SQL give the correct result.
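OpenRefine is an interactive tool, so there is no script to show for this step. As a rough equivalent, the three cleaning steps can be sketched in plain Python. This is a swapped-in technique, not what the assignment actually used, and the field names (`status`, `likes`, `dislikes`, `price_range`) and sample records are hypothetical. The last lines show why the zero-padding matters: without it, “51-100” sorts after “101-200” as text.

```python
# A plain-Python sketch of the three OpenRefine cleaning steps.
# Field names ("status", "likes", "dislikes", "price_range") are hypothetical.

EXCLUDED_STATUSES = {"being decorated", "moved away", "delivery only"}

# Zero-pad the two short labels so that text order matches price order.
PRICE_FIXES = {"under 50": "000-050", "51-100": "051-100"}


def clean(rows):
    """Return a new list of cleaned restaurant records."""
    cleaned = []
    for row in rows:
        # Step 1: drop restaurants that are being decorated, have moved away
        # or are delivery only.
        if row.get("status") in EXCLUDED_STATUSES:
            continue
        # Step 2: drop restaurants with neither likes nor dislikes.
        if row.get("likes") is None and row.get("dislikes") is None:
            continue
        # Step 3: rewrite the price label (other labels are kept unchanged).
        fixed = dict(row)
        fixed["price_range"] = PRICE_FIXES.get(row["price_range"], row["price_range"])
        cleaned.append(fixed)
    return cleaned


# Made-up sample records for illustration only.
SAMPLE = [
    {"name": "A", "status": None, "likes": 10, "dislikes": 1, "price_range": "51-100"},
    {"name": "B", "status": "moved away", "likes": 5, "dislikes": 0, "price_range": "101-200"},
    {"name": "C", "status": None, "likes": None, "dislikes": None, "price_range": "under 50"},
    {"name": "D", "status": None, "likes": 3, "dislikes": None, "price_range": "under 50"},
    {"name": "E", "status": None, "likes": 7, "dislikes": 2, "price_range": "101-200"},
]

result = clean(SAMPLE)
by_price = sorted(result, key=lambda r: r["price_range"])
print([(r["name"], r["price_range"]) for r in by_price])

# Without padding, text sorting puts "51-100" after "101-200" ('1' < '5').
print(sorted(["101-200", "51-100", "under 50"]))
```

A dictionary lookup with a fallback keeps every other price label (such as “101-200”) untouched, mirroring the two targeted replacements described above.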

Once the data was imported into a table, I designed four SQL queries to sort it according to how I usually choose a restaurant. The first selects the 10 restaurants with the best reputation (by number of likes). The second selects the 10 cheapest restaurants using “order by avg_price ASC”. The third uses “WHERE discounts NOTNULL” and “order by like DESC” to select 5 well-reviewed restaurants that offer discounts. The last selects 5 restaurants whose signature dish is “酸菜鱼” (fish with pickled mustard greens), ordered by number of likes in descending order. I then combined Python code with the SQL queries to present the results in a Pandas DataFrame.
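The four queries can be sketched with Python's built-in `sqlite3` module, assuming an SQLite database (SQLite accepts the postfix `NOTNULL` operator). The table name `restaurants`, the columns `name`, `avg_price`, `likes`, `discounts` and `signature_dishes`, and the sample rows are all hypothetical. `likes` stands in for the original `like` column, because LIKE is an SQL keyword and would otherwise need quoting. Query 4 matches with `LIKE '%酸菜鱼%'` on the assumption that a restaurant can list several signature dishes.

```python
import sqlite3

# Hypothetical schema; "likes" replaces "like", which is an SQL keyword.
SCHEMA = """
CREATE TABLE restaurants (
    name TEXT,
    avg_price TEXT,         -- zero-padded label such as '051-100'
    likes INTEGER,
    discounts TEXT,         -- NULL when there is no offer
    signature_dishes TEXT
)
"""

# Query name -> (SQL, parameters).
QUERIES = {
    # 1. Ten restaurants with the most likes.
    "best_reputation": (
        "SELECT name, likes FROM restaurants ORDER BY likes DESC LIMIT 10", ()),
    # 2. Ten cheapest restaurants; text order works because labels are padded.
    "cheapest": (
        "SELECT name, avg_price FROM restaurants ORDER BY avg_price ASC LIMIT 10", ()),
    # 3. Five well-liked restaurants that have a discount.
    "with_discounts": (
        "SELECT name, discounts, likes FROM restaurants "
        "WHERE discounts NOTNULL ORDER BY likes DESC LIMIT 5", ()),
    # 4. Five restaurants whose signature dishes include 酸菜鱼.
    "suan_cai_yu": (
        "SELECT name, likes FROM restaurants "
        "WHERE signature_dishes LIKE ? ORDER BY likes DESC LIMIT 5",
        ("%酸菜鱼%",)),
}

# Made-up sample rows for illustration only.
SAMPLE_ROWS = [
    ("Restaurant A", "051-100", 120, "10% off", "酸菜鱼, boiled beef"),
    ("Restaurant B", "000-050", 45, None, "dan dan noodles"),
    ("Restaurant C", "101-200", 300, "20% off", "酸菜鱼"),
    ("Restaurant D", "201-400", 80, None, "mapo tofu"),
]


def run_queries(conn):
    """Run every query and return {query name: list of result tuples}."""
    return {name: conn.execute(sql, params).fetchall()
            for name, (sql, params) in QUERIES.items()}


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany("INSERT INTO restaurants VALUES (?, ?, ?, ?, ?)", SAMPLE_ROWS)
results = run_queries(conn)
for name, rows in results.items():
    print(name, rows)
```

To get a Pandas DataFrame instead of a list of tuples, each query could be passed to `pandas.read_sql_query(sql, conn, params=params)` with the same connection.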

While working on the assignment, I ran into two problems: how to deal with empty results from ParseHub, and how to add pictures to a Jupyter notebook.

This assignment was very interesting for me because I have always wanted to learn how to scrape data from websites. However, I found some websites that ParseHub could not scrape, so next time I hope to try writing the scraper directly in Python.

You can access it here.
