Reference no: EM133126294
Statistics for Data Analytics
Summary:
PART A - Time Series Analysis
The 'CarRegistrations.csv' data file, uploaded on Moodle, is a monthly time series of new private car registrations in Ireland from January 1995 to January 2022 inclusive. (Source: Central Statistics Office, Ireland)
You are required to estimate and report on suitable time series models for this series. Your report should contain the following elements:
A preliminary assessment of the nature and components of the raw time series, using visualisations as appropriate.
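As a sketch of what a preliminary look at the components might involve, the code below performs a classical additive decomposition: a centred 2 x 12 moving average estimates the trend, and the average detrended value for each calendar month gives the seasonal indices. It runs on a synthetic series, not the CSO data, and the function names are illustrative assumptions. In practice a library routine such as statsmodels' `seasonal_decompose` or `STL` would normally be used, and if the seasonal swings grow with the level, a multiplicative decomposition (dividing instead of subtracting) or a log transform may fit better.

```python
def centred_moving_average(y, period=12):
    """Centred 2 x period moving average; None where the window is incomplete."""
    half = period // 2
    trend = [None] * len(y)
    for t in range(half, len(y) - half):
        window = y[t - half:t + half + 1]
        # Even period: the two end points get weight 1/2 (a 2 x 12 MA for monthly data)
        total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        trend[t] = total / period
    return trend


def additive_decomposition(y, period=12):
    """Classical additive decomposition: y = trend + seasonal + remainder."""
    trend = centred_moving_average(y, period)
    # Group detrended values by position in the cycle (0 = first month of the series)
    buckets = [[] for _ in range(period)]
    for t, (obs, tr) in enumerate(zip(y, trend)):
        if tr is not None:
            buckets[t % period].append(obs - tr)
    raw = [sum(b) / len(b) for b in buckets]
    # Centre the indices so they sum to zero over one cycle
    adjustment = sum(raw) / period
    seasonal_idx = [s - adjustment for s in raw]
    seasonal = [seasonal_idx[t % period] for t in range(len(y))]
    remainder = [obs - tr - s if tr is not None else None
                 for obs, tr, s in zip(y, trend, seasonal)]
    return trend, seasonal_idx, remainder


# Usage on a synthetic monthly series (NOT the CSO data): linear trend + fixed pattern
pattern = [10, 4, -2, -6, -8, -4, 0, 2, 6, 3, -1, -4]
series = [100 + 0.5 * t + pattern[t % 12] for t in range(48)]
trend, seasonal_idx, remainder = additive_decomposition(series)
print([round(s, 2) for s in seasonal_idx])
```

Plotting the trend, seasonal indices and remainder alongside the raw series is what would normally go into the report; the numeric output here only shows that the method recovers a known pattern.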
Estimation and discussion of candidate time series models from each of the categories listed below. Appropriate diagnostic tests and checks should be undertaken.
1. Exponential Smoothing / ETS models
2. ARIMA/SARIMA models
3. Simple time series models
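To illustrate a simple benchmark model together with a standard residual diagnostic, the sketch below fits a seasonal naive model (each forecast repeats the value from the same month one year earlier) and applies the Ljung-Box test, Q = n(n+2) Σ r_k² / (n − k) for k = 1..h, to its residuals. Assumptions: pure Python on synthetic data, 24 lags (two seasonal cycles, a common choice for monthly data), and a chi-square tail formula that is only valid for even degrees of freedom. Helper names are hypothetical; statsmodels' `acorr_ljungbox` offers the same test in practice. For a fitted ARIMA/SARIMA or ETS model, the degrees of freedom are usually reduced by the number of estimated parameters, which `fitted_params` stands in for.

```python
import math
import random


def seasonal_naive(y, period=12, horizon=6):
    """Seasonal naive: forecast = value observed in the same month one cycle earlier."""
    n = len(y)
    forecasts = [y[n - period + (h - 1) % period] for h in range(1, horizon + 1)]
    residuals = [y[t] - y[t - period] for t in range(period, n)]  # in-sample errors
    return forecasts, residuals


def acf(x, max_lag):
    """Sample autocorrelations r_1 .. r_max_lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
            for k in range(1, max_lag + 1)]


def chi2_sf_even(x, df):
    """Upper-tail chi-square probability; closed form valid only for even df."""
    if df <= 0 or df % 2:
        raise ValueError("this sketch only handles positive even degrees of freedom")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def ljung_box(residuals, lags=24, fitted_params=0):
    """Return (Q statistic, degrees of freedom, p-value)."""
    n = len(residuals)
    r = acf(residuals, lags)
    q = n * (n + 2) * sum(rk * rk / (n - k) for k, rk in enumerate(r, start=1))
    df = lags - fitted_params
    return q, df, chi2_sf_even(q, df)


# Usage on a synthetic series (NOT the CSO data): fixed seasonal pattern + noise
random.seed(1)
pattern = [10, 4, -2, -6, -8, -4, 0, 2, 6, 3, -1, -4]
series = [100 + pattern[t % 12] + random.gauss(0, 3) for t in range(120)]
fc, res = seasonal_naive(series)
q, df, p = ljung_box(res, lags=24)
print([round(v, 1) for v in fc], round(q, 2), df, round(p, 4))
```

On data like this the seasonal naive residuals are ε_t − ε_{t−12}, so one would expect a sizeable negative autocorrelation at lag 12 and a small Ljung-Box p-value: a reminder that a benchmark can forecast reasonably while still leaving structure in its residuals.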
Discussion of your choice of an 'optimum' model for this series from the candidates above. Use this model to forecast six periods (months) ahead, with prediction intervals, and comment on the adequacy of the model for forecasting purposes.
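The sketch below shows how a six-step-ahead forecast with 95% prediction intervals is formed for the simplest ETS model, ETS(A,N,N) (simple exponential smoothing). The point forecast is the final level for every horizon, and the forecast variance grows as σ²[1 + α²(h − 1)]. It is deliberately non-seasonal and is run on a synthetic random walk, so it illustrates the mechanics only: for a strongly seasonal series such as car registrations, a seasonal ETS or SARIMA model with its own variance formula would be needed. Assumptions: the initial level is set to the first observation, α is chosen by grid search on the one-step sum of squared errors, σ² is the mean squared one-step error (software may apply a degrees-of-freedom correction), and all function names are illustrative.

```python
import math
import random


def ses_fit(y, alpha):
    """Run simple exponential smoothing; return final level and one-step-ahead errors."""
    level = y[0]  # assumption: initial level = first observation
    errors = []
    for obs in y[1:]:
        err = obs - level            # one-step-ahead forecast error
        errors.append(err)
        level = level + alpha * err  # same as alpha*obs + (1 - alpha)*level
    return level, errors


def ses_select_alpha(y, grid=None):
    """Pick the smoothing parameter minimising the one-step sum of squared errors."""
    grid = grid or [i / 100 for i in range(1, 100)]
    return min(grid, key=lambda a: sum(e * e for e in ses_fit(y, a)[1]))


def ses_forecast(y, horizon=6, z=1.96):
    """Point forecasts and normal prediction intervals for ETS(A,N,N)."""
    alpha = ses_select_alpha(y)
    level, errors = ses_fit(y, alpha)
    sigma2 = sum(e * e for e in errors) / len(errors)
    out = []
    for h in range(1, horizon + 1):
        se = math.sqrt(sigma2 * (1 + alpha ** 2 * (h - 1)))
        out.append((h, level, level - z * se, level + z * se))
    return alpha, out


# Usage on a synthetic random walk (NOT the CSO data)
random.seed(42)
walk, value = [], 100.0
for _ in range(60):
    value += random.gauss(0, 2)
    walk.append(value)
alpha, forecasts = ses_forecast(walk, horizon=6)
for h, point, lower, upper in forecasts:
    print(f"h={h}: {point:.1f} [{lower:.1f}, {upper:.1f}]")
```

Note that the intervals widen with the horizon while the point forecast stays flat; when judging adequacy, checking that the residuals behave like white noise matters as much as the width of these intervals.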
PART B - Logistic Regression
The 'Default.csv' file, uploaded on Moodle, contains the characteristics of 2,700+ customers of a credit institution and whether or not each customer has a loan default on record.
In addition to the dichotomous dependent variable (0 = no default on record, 1 = default on record), the following customer characteristics are provided:
Gender: 0 = male, 1 = female
Age in years
Years of education
Retired: 0 = not retired, 1 = retired
Household income (in thousands)
Credit card debt (in thousands)
Other debt (in thousands)
Marital status: 0 = unmarried, 1 = married
Home ownership: 0 = rents, 1 = owns home
Using these data, you are required to estimate a binary logistic regression model to facilitate understanding of the relationships between the given customer characteristics and classification of default. If you deem it useful, you may employ dimension reduction techniques. In your report you should:
1. Use descriptive statistics and appropriate visualisations to provide a preliminary understanding of the variables in the dataset.
2. Describe the model building steps you undertook in the process of arriving at your final logistic regression model. The rationale for rejecting intermediate models should be explained clearly.
3. Provide a succinct summary of the parameters of your final model, verify that relevant assumptions are met and discuss odds ratios, the confusion matrix and measures of model fit.
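The steps above can be sketched as follows: a binary logistic regression fitted by maximum likelihood (Newton-Raphson), followed by odds ratios exp(β), a confusion matrix at a 0.5 cutoff, sensitivity, specificity, McFadden's pseudo-R² (1 − LL_model / LL_null) and AIC (2k − 2·LL) for comparing candidate models. Assumptions: pure Python on synthetic data with two hypothetical predictors loosely modelled on the brief (a debt amount and a married flag); these are not the Default.csv columns and the "true" coefficients are invented for the simulation. A real analysis would typically use statsmodels' `Logit` (or a binomial GLM) to obtain standard errors and p-values, and would add assumption checks such as multicollinearity diagnostics, which this sketch omits.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (small systems only)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_logistic(X, y, iters=25, tol=1e-10):
    """Maximum-likelihood fit by Newton-Raphson; rows of X exclude the intercept."""
    rows = [[1.0] + list(x) for x in X]  # prepend intercept column
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(iters):
        p = [sigmoid(sum(b * v for b, v in zip(beta, r))) for r in rows]
        grad = [sum((yi - pi) * r[j] for r, yi, pi in zip(rows, y, p)) for j in range(k)]
        info = [[sum(pi * (1 - pi) * r[i] * r[j] for r, pi in zip(rows, p))
                 for j in range(k)] for i in range(k)]  # X'WX
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


def summarise(X, y, beta, cutoff=0.5):
    """Odds ratios, confusion matrix and fit measures for a fitted model."""
    probs = [sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], x))) for x in X]
    ll = sum(math.log(p) if yi else math.log(1 - p) for p, yi in zip(probs, y))
    ybar = sum(y) / len(y)
    ll_null = len(y) * (ybar * math.log(ybar) + (1 - ybar) * math.log(1 - ybar))
    tp = sum(1 for p, yi in zip(probs, y) if p >= cutoff and yi == 1)
    fp = sum(1 for p, yi in zip(probs, y) if p >= cutoff and yi == 0)
    fn = sum(1 for p, yi in zip(probs, y) if p < cutoff and yi == 1)
    tn = sum(1 for p, yi in zip(probs, y) if p < cutoff and yi == 0)
    return {
        "odds_ratios": [math.exp(b) for b in beta[1:]],
        "confusion": {"TP": tp, "FP": fp, "FN": fn, "TN": tn},
        "accuracy": (tp + tn) / len(y),
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
        "mcfadden_r2": 1 - ll / ll_null,
        "aic": 2 * len(beta) - 2 * ll,
    }


# Usage on synthetic data (NOT Default.csv); predictors and coefficients are hypothetical
random.seed(7)
X, y = [], []
for _ in range(2000):
    debt = random.expovariate(0.5)                 # hypothetical debt, in thousands
    married = 1 if random.random() < 0.5 else 0    # hypothetical 0/1 indicator
    p_true = sigmoid(-2.0 + 0.8 * debt - 0.5 * married)
    X.append([debt, married])
    y.append(1 if random.random() < p_true else 0)
beta = fit_logistic(X, y)
report = summarise(X, y, beta)
print([round(b, 3) for b in beta], report["odds_ratios"], report["confusion"])
```

An odds ratio above 1 means the odds of default rise with that predictor, holding the others fixed. Comparing AIC (or a likelihood-ratio test) across nested candidate models is one way to document why intermediate models were rejected.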