nba_results_2014-4ea29b71

SQL is a Great Tool for Data Analysis

Scraping is a useful source for obtaining data among the various methods that are available out there. Nevertheless, the work does not stop there as the scraped data often needs to be cleaned and transformed into the proper format and structure until it can be further used.

In the 2nd assignment, ParseHub and Open Refine were used to help get the data into a usable state by removing blank and null entries for performing further data analysis. The problem at hand was: How can I find out the top 10 players and colleges with the highest average score per year from the NBA 2014 Draft dataset?

As for the data analysis part, SQL was picked as the tool. When used in combination with Python, the outcome is impressive. Python has a rich support of 3rd party libraries such as Pandas, NumPy, Matplotlib, etc., which can further extend the scope of tasks to be tackled by the collective tools.

Once the data was imported into the table, the SQL command targeted the # of games, # of years, and points fields for computing the game per year and average points per year results. This would lead to the sort out of the top 10 players with the highest average score.

Furthermore, by grouping the players according to the college enrolled, we could come up with the top 10 colleges with the highest average score.

You can access the website here.

Similar Posts