Come in and Find the Perfect Hot Pot Restaurant in New Territories!
Hiii, there! Welcome to my second assignment!
In this assignment, I was asked to find a district in open Rice and conduct some analysis after scraping the data. As a huge fan of hot pot, my answer is HOT POT! Firstly, I entered the Open Rice and chose the category of hot pot. However, I quickly realized that there was too much data if I didn’t add other selection criteria. So I decided to choose all the hot pots in New Territories for analysis.
The data scraping process was quite smooth, I scraped all the data in this specified category, and got nearly 300 restaurants with their location, price range, number of collections, positive and negative feedback, and also the comments.
After having all the raw data, I import the file into Open Refine for further data cleaning. In this part, some problems occurred. First comes the number of collections. It was initially in the form of “1K”, which cannot be analyzed as real numbers. After some exploration, I found that I can use the text-transform function to change them into numbers. Then, I applied the split column function to break the price range into one minimum price and one maximum price for further analysis. Finally, I used facet by blank to remove all the black rows.
The last part was to import the CSV file into SQLite and then put the SQL commands into Jupyter Notebook. In this part, I used conventional consumer thinking. First of all, I considered my budget and set my spending range between HK$100 and HK$300. Through the COUNT command, I calculated that 127 hot pot restaurants in the New Territories meet my requirements. However, this is still a too huge range. So I decided to take consumers’ positive feedback and the number of collections into consideration because they are strong indicators of people’s favorable attitudes.
First I sorted out the stores with the highest consumer ratings and collections in the HK$100 to HK$300 spending range respectively.
I then realized that between these two sets of results, there was a portion of common restaurants that excelled in both positive reviews and favorites, and such restaurants were more in line with consumer demand. Therefore, I used the JOIN method to filter out the part of the two sets of results that were identical. I ended up with 22 results. In this process, I also accidentally found an easier way to express SQL commands in python.
Up to this point, the screening as a consumer was over. However, I suddenly realized that I could analyze the conversion rate from “collection” to “to store” from the merchant’s point of view, and rank the stores with higher conversion rates as a reference for business. In this section, I counted each positive or negative review as a customer who had visited the store and compared it to the number of collections for that store. The calculation method I used is as follows:
And I got the following results, they are the top 25 hot pot restaurants with the highest customer conversion rate. The numbers in the conversion column indicate that one out of every X collector chooses to hit the store.