The Database
MySQL scripts cars_for_sale.sql and car_details.sql containing the data to use for this task have been supplied. To complete this task, you need to first create a database called inb104 and two tables called cars_for_sale and car_details. how to enter data into the two tables by using the supplied scripts. (Note that the supplied scripts are not database 'dumps', so you cannot just 'import' the whole database. You need to manually set up the database first and then run the scripts.) The two tables you must create in the inb104 database must contain the following fields. For the cars_for_sale table:
CarId VARCHAR(45)
make VARCHAR(45)
model VARCHAR(45)
For the car_details table:
CarId VARCHAR(45)
seriesyear VARCHAR(45)
price INT(11)
kilometres VARCHAR(45)
color VARCHAR(100)
transmission VARCHAR(45)
CarId is the Primary Key for both tables.
As usual we have supplied Python 'question' template files for you to complete, containing unit tests that your functions must pass. You should rename these files, removing the "_Q" suffixes, and complete your solutions in the places indicated.
Specific Tasks
The specific things you must do to complete this portfolio task are as follows.
1. Create a database named inb104 containing two tables called cars_for_sale and car_details, as explained above, to import the data into.
2. Import the vehicle data via the cars_for_sale.sql and car_details.sql scripts.
3. Write a Python function named top_N_makes(N) that extract the number of occurrences of each make in the car dataset and prints the top N most frequent makes together with their corresponding number of occurrences, one make per line. Parameter N indicates the number of makes to be printed.
For example, if BMW appears in 100 rows in the table cars_for_sale, its frequency will be 100. If BMW and KIA are the top two most frequent makes, occurring 100 and 89 times, respectively, your function is expected to produce the following result for the function call top_N_makes(2).
>>> top_N_makes(2)
BMW 100
KIA 89
Note that if more results are requested than there are distinct makes in the database then all makes must be listed. Also note that if two makes have the same frequency of occurrence then they should appear alphabetically.