Reference no: EM133388385
Task 1. Exploratory data analysis
In this task, you will carry out exploratory data analysis on the Australian Covid-19 dataset provided with this assignment. The dataset records the number of daily cases and deaths from 01/01/2020 to 25/08/2020.
Write a Python program to complete the following tasks.
1. Load the data and extract the information needed to fill in the following table.
| | Mean value | Max value | Min value |
| Case | | | |
| Death | | | |
2. Draw a scatter plot of cases against deaths. Copy your graph here.
3. Draw a bar chart of cases against date. Use date as the horizontal axis and cases as the vertical axis. Copy your graph here.
4. What observations can you make from the two plotted graphs?
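As a starting point for Task 1, the steps above could be sketched as follows. This is a minimal sketch, not a model answer. The column names `date`, `cases` and `deaths`, the day-first date format, and every function name are assumptions, since the brief does not describe the file layout. The small `demo` frame holds placeholder values only, not figures from the real dataset.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; remove this line to show plots in a window
import matplotlib.pyplot as plt
import pandas as pd


def load_covid(path):
    """Read the CSV and parse day-first dates such as 25/08/2020 (assumed format)."""
    df = pd.read_csv(path)
    df["date"] = pd.to_datetime(df["date"], format="%d/%m/%Y")
    return df


def summarise(df, columns=("cases", "deaths")):
    """Return one row per column with its mean, max and min."""
    return df[list(columns)].agg(["mean", "max", "min"]).T


def scatter_cases_vs_deaths(df):
    """Scatter plot with cases on the x axis and deaths on the y axis."""
    fig, ax = plt.subplots()
    ax.scatter(df["cases"], df["deaths"])
    ax.set_xlabel("Daily cases")
    ax.set_ylabel("Daily deaths")
    return fig


def bar_cases_by_date(df):
    """Bar chart with date on the x axis and cases on the y axis."""
    fig, ax = plt.subplots()
    ax.bar(df["date"], df["cases"])
    ax.set_xlabel("Date")
    ax.set_ylabel("Daily cases")
    fig.autofmt_xdate()  # tilt the date labels so they do not overlap
    return fig


# Placeholder values for illustration only; replace with load_covid("<your file>.csv").
demo = pd.DataFrame(
    {
        "date": pd.date_range("2020-01-01", periods=4, freq="D"),
        "cases": [0, 10, 20, 30],
        "deaths": [0, 1, 2, 5],
    }
)
summary = summarise(demo)
print(summary)
```

Each plotting function returns the figure, so a call such as `scatter_cases_vs_deaths(df).savefig("scatter.png")` would write the image to disk for pasting into the report.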
Task 2. Regression analysis
Given the above dataset, write a Python program to build regression models. Complete the following tasks.
1. Build a linear regression model to fit the relationship between case and death. Plot the regression model on a graph. Copy the graph here.
2. Given the regression model built from the data, what is the predicted number of deaths when the number of cases is 1000? Put your analysis here.
3. Build polynomial regression models with different degrees of complexity. Plot graphs for degrees from 2 to 5. Which degree gives the best fit based on quantitative analysis? Put your analysis here.
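One possible way to approach Task 2 is shown below, using NumPy's least-squares polynomial fit for both the linear model (degree 1) and the higher-degree models. This is a sketch under assumptions: the helper names are hypothetical, the `cases` and `deaths` arrays are synthetic placeholders rather than the assignment data, and R² is only one of several reasonable quantitative measures.

```python
import numpy as np


def fit_polynomial(x, y, degree):
    """Least-squares polynomial fit; returns a callable model."""
    return np.poly1d(np.polyfit(x, y, degree))


def r_squared(model, x, y):
    """Coefficient of determination of the model on (x, y)."""
    residual = y - model(x)
    ss_res = np.sum(residual**2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Synthetic placeholder data; replace with the case and death columns of the dataset.
rng = np.random.default_rng(0)
cases = np.linspace(0, 700, 60)
deaths = 0.02 * cases + rng.normal(0, 1.0, cases.size)

# Degree 1 is the linear regression model.
linear = fit_polynomial(cases, deaths, 1)
predicted_at_1000 = float(linear(1000))
print("Predicted deaths at 1000 cases:", predicted_at_1000)

# Compare degrees 1 to 5 on the same data.
scores = {}
for degree in range(1, 6):
    model = fit_polynomial(cases, deaths, degree)
    scores[degree] = r_squared(model, cases, deaths)
    print(degree, round(scores[degree], 4))
```

To plot a fitted model, evaluate it on a grid, for example `model(np.linspace(cases.min(), cases.max(), 200))`, and draw that curve over the scatter plot. Note that R² measured on the same data used for fitting cannot decrease as the degree grows, so the highest degree will always look best by that measure alone. Scoring on held-out data is a common way to make the comparison more meaningful. If the real data contain far more than 1000 cases or far fewer, the prediction at 1000 may be an extrapolation and should be discussed as such.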
Task 3. Classification
The provided dataset wine_quality.csv shows the relationship between chemical measures and quality. Build classifiers to classify the quality of wines. Use 50% of the data for training and 50% for testing. Report and compare the testing accuracies from logistic regression and support vector machine. Analyse the classification results.
1. Use different features as input to train a logistic regression classifier and a support vector machine on the same training set, then report the accuracy of each on the testing set in the following table.
| Features used | Logistic regression | Support vector machine |
| fixed acidity | | |
| volatile acidity | | |
| citric acid | | |
| residual sugar | | |
| fixed acidity + volatile acidity + citric acid + residual sugar + chlorides value | | |
| All features | | |
2. Make an analysis of the above results, and answer the following questions:
a) Which feature is most useful in predicting the quality: fixed acidity, volatile acidity, citric acid, or residual sugar?
b) Is there any relation between the number of features used and the prediction accuracy in this task?
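The comparison in Task 3 could be organised along these lines with scikit-learn. This is a hedged sketch: `compare_classifiers` is a hypothetical helper, standardising the features before fitting is a design choice the brief does not ask for, and the data below are synthetic placeholders rather than values from wine_quality.csv.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def compare_classifiers(X, y, seed=0):
    """Train both classifiers on one 50/50 split and return their test accuracies."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.5, random_state=seed
    )
    models = {
        "logistic regression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)
        ),
        "support vector machine": make_pipeline(StandardScaler(), SVC()),
    }
    return {
        name: model.fit(X_train, y_train).score(X_test, y_test)
        for name, model in models.items()
    }


# Synthetic placeholder data: column 0 drives the label, columns 1 and 2 are noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + 0.1 * rng.normal(size=200) > 0).astype(int)

feature_sets = {
    "informative only": [0],
    "noise only": [1],
    "all features": [0, 1, 2],
}
results = {
    label: compare_classifiers(X[:, columns], y)
    for label, columns in feature_sets.items()
}
for label, accuracies in results.items():
    print(label, accuracies)
```

With the real data loaded into a pandas DataFrame `df`, a single-feature row of the table might be filled with `compare_classifiers(df[["fixed acidity"]], df["quality"])`, assuming the target column is named `quality`. Using the same `seed` for every feature set keeps the train and test rows identical across rows of the table, which is what the brief's "same training set" requirement asks for.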
Attachment:- Data analysis.rar