Statistics for Data Analytics
Project: The IncomeData.csv file uploaded on Moodle contains financial and profile data on a sample of 4500+ individuals.
Using these sample data, you are required to estimate, and report on, a multiple regression model that helps explain the factors influencing income and can be used to predict it.
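For reference, the standard multiple linear regression model and its ordinary least squares (OLS) estimator are written below. The symbols are generic: here $y_i$ would be the income of individual $i$, $x_{i1}, \dots, x_{ik}$ the chosen predictors, and $X$ the $n \times (k+1)$ design matrix whose first column is all ones. Which predictors enter the model is left to the analysis. The inverse exists only if no predictor is an exact linear combination of the others.

```latex
\begin{aligned}
y_i &= \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + \varepsilon_i, \qquad i = 1, \dots, n,\\
\mathbf{y} &= X\boldsymbol{\beta} + \boldsymbol{\varepsilon}, \qquad
\hat{\boldsymbol{\beta}} = (X^{\top}X)^{-1}X^{\top}\mathbf{y}.
\end{aligned}
```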
In addition to each individual's income in thousands of euros, the dataset records the following variables:
Age in years (age)
Years of education (yrsed)
Level of education (edcat): 1 = Did not complete high school; 2 = High school degree; 3 = Some college; 4 = College degree; 5 = Postgraduate degree
Years with current employer (yrsempl)
Credit card debt in thousands (creddebt)
Other debt in thousands (othdebt)
Ever defaulted on a bank loan (default): 0 = no; 1 = yes
Job satisfaction (jobsat): 1 = Highly dissatisfied; 2 = Somewhat dissatisfied; 3 = Neutral; 4 = Somewhat satisfied; 5 = Highly satisfied
Home ownership (homeown): 0 = rent; 1 = own
Years at current address (address)
Number of cars owned/leased (cars)
Value of primary vehicle (carvalue)
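Variables such as edcat and jobsat are coded 1 to 5. Entering those codes directly as numbers assumes equal spacing between levels. A common alternative is to replace each one with 0/1 indicator (dummy) variables, leaving out one reference level so the indicators are not perfectly collinear with the intercept. Which treatment to use is a modelling decision the brief leaves open. The sketch below is a minimal illustration in plain Python. The function name `dummy_encode`, its parameters, and the choice of level 1 as the baseline are hypothetical. In practice, `pandas.get_dummies(..., drop_first=True)` performs the same transformation.

```python
def dummy_encode(values, levels, baseline, prefix):
    """Expand a categorical code into 0/1 indicator columns (hypothetical helper).

    values   : category codes, one per individual (e.g. edcat values)
    levels   : all admissible codes, e.g. [1, 2, 3, 4, 5]
    baseline : code left out as the reference category
    prefix   : column-name prefix, e.g. "edcat"
    Returns a dict mapping column name -> list of 0/1 indicators.
    """
    unknown = set(values) - set(levels)
    if unknown:
        raise ValueError(f"unexpected codes: {sorted(unknown)}")
    if baseline not in levels:
        raise ValueError("baseline must be one of the levels")
    columns = {}
    for level in levels:
        if level == baseline:
            continue  # reference level: its effect is absorbed by the intercept
        columns[f"{prefix}_{level}"] = [1 if v == level else 0 for v in values]
    return columns


# Made-up edcat codes for five individuals (illustrative, not from IncomeData.csv)
edcat = [1, 3, 5, 2, 3]
dummies = dummy_encode(edcat, levels=[1, 2, 3, 4, 5], baseline=1, prefix="edcat")
for name, column in dummies.items():
    print(name, column)
```

Each coefficient on an indicator is then read as the average difference in income relative to the baseline level, holding the other predictors fixed. Variables already coded 0/1 (default, homeown) need no further encoding.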
In your report you should:
1. Use descriptive statistics and appropriate visualisations to enhance understanding of the variables in the dataset.
2. Describe the model-building steps you took to arrive at your final regression model. Explain clearly why intermediate models were rejected, and give your rationale for the choice of predictors, the treatment of outliers, any transformations applied, etc.
3. Describe the diagnostics you carried out to verify that the Gauss-Markov assumptions, and any other relevant assumptions of multiple regression, are satisfied.
4. Provide a succinct summary of the parameters of your final model and details of model performance and fit.
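Items 3 and 4 both depend on the fitted coefficients, the residuals, and summary measures of fit. The sketch below shows a minimal OLS fit using only the Python standard library. It solves the normal equations $(X^{\top}X)\hat{\beta} = X^{\top}y$ and reports $R^2$ and adjusted $R^2$. The helper names `solve` and `fit_ols` are hypothetical. Solving the normal equations directly is less numerically stable than the QR-based routines used by statistics packages, so a real analysis would normally rely on such a package (for example, the `OLS` model in statsmodels).

```python
def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:  # crude singularity check
            raise ValueError("X'X is singular: check for perfect multicollinearity")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(X, y):
    """Fit y = b0 + b1*x1 + ... + bk*xk by ordinary least squares.

    X : list of rows, each a list of k predictor values (no intercept column)
    y : list of n responses
    Returns a dict with coefficients (intercept first), fitted values,
    residuals, R^2 and adjusted R^2.
    """
    n, k = len(y), len(X[0])
    Z = [[1.0] + list(row) for row in X]  # prepend the intercept column
    p = k + 1
    ztz = [[sum(Z[i][a] * Z[i][c] for i in range(n)) for c in range(p)] for a in range(p)]
    zty = [sum(Z[i][a] * y[i] for i in range(n)) for a in range(p)]
    beta = solve(ztz, zty)  # normal equations (Z'Z) b = Z'y
    fitted = [sum(b * z for b, z in zip(beta, row)) for row in Z]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    ybar = sum(y) / n
    sse = sum(e * e for e in resid)            # residual sum of squares
    sst = sum((yi - ybar) ** 2 for yi in y)    # total sum of squares
    r2 = 1 - sse / sst
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    return {"beta": beta, "fitted": fitted, "resid": resid, "r2": r2, "adj_r2": adj_r2}


# Toy data (not IncomeData.csv): two predictors, six observations
X = [[1, 2], [2, 1], [3, 5], [4, 3], [5, 8], [6, 4]]
y = [3.1, 6.9, 6.05, 10.95, 9.2, 15.8]
result = fit_ols(X, y)
print([round(b, 3) for b in result["beta"]], round(result["r2"], 4), round(result["adj_r2"], 4))
```

The returned residuals can support the diagnostics in item 3. For example, plotting them against the fitted values helps check for constant variance, and a normal Q-Q plot helps check normality. Adjusted $R^2$ penalises extra predictors, so it is the more useful of the two measures when comparing candidate models with different numbers of terms.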
The report is limited to a maximum of 5 pages. It is intended as a business report, but please use the fonts, layout and treatment of figures specified in the IEEE format.