Reference no: EM132350224
Business Analytics Methods
Question 1
Use the data set "Heart Disease 2" in your SAS portal (the descriptions of the data are included with the assignment descriptions) to perform your analysis and answer the following questions.
1) Develop a logistic regression model to predict the probability that a subject has heart disease. You may discuss and include any interaction effects that you think appropriate. Discuss your final model in terms of fit and accuracy. Based on your final model, discuss the impact of gender, age and smoking on the risk of having heart disease. Be as detailed as you can in the discussion.
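The assignment expects the model to be built in SAS. As a language-neutral illustration of what a logistic regression estimates, here is a minimal Python sketch under stated assumptions: coefficients are fitted by plain batch gradient descent on the log-loss (SAS uses maximum likelihood via its own optimizer, so this is not its procedure), and the helper names `fit_logistic`, `odds_ratios` and `with_interaction` are hypothetical. The quantity to interpret for gender, age and smoking is the odds ratio exp(b_j).

```python
import math

def sigmoid(z):
    """Logistic function 1 / (1 + e^-z), split by sign to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def linear(b0, b, x):
    """Linear predictor (log-odds): b0 + sum_j b_j * x_j."""
    return b0 + sum(bj * xj for bj, xj in zip(b, x))

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit intercept b0 and coefficients b by batch gradient descent on the mean log-loss.
    Numeric predictors such as age should be standardized first so one learning rate suits all columns."""
    n, p = len(X), len(X[0])
    b0, b = 0.0, [0.0] * p
    for _ in range(epochs):
        g0, g = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            err = sigmoid(linear(b0, b, xi)) - yi  # gradient of log-loss w.r.t. the linear predictor
            g0 += err
            for j in range(p):
                g[j] += err * xi[j]
        b0 -= lr * g0 / n
        b = [bj - lr * gj / n for bj, gj in zip(b, g)]
    return b0, b

def predict_proba(b0, b, x):
    """Estimated probability that TenYearCHD = 1 for one subject."""
    return sigmoid(linear(b0, b, x))

def odds_ratios(b):
    """exp(b_j): factor by which the odds of CHD change per one-unit increase in x_j, others held fixed."""
    return [math.exp(bj) for bj in b]

def with_interaction(row, i, j):
    """Append the product of columns i and j (e.g. male * age) as an interaction term."""
    return list(row) + [row[i] * row[j]]
```

For example, an odds ratio of 1.5 on `currentSmoker` would mean smokers have 1.5 times the odds of CHD, holding the other predictors fixed. Once an interaction such as male × age is included, the effect of `male` depends on age, so the main-effect odds ratio alone no longer summarizes it.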
2) Develop a decision tree to predict whether a subject has heart disease. Based on your tree, which are the most important predictors of heart disease?
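A decision tree chooses each split to maximize the reduction in node impurity, and predictors that produce large reductions (especially near the root) are the most important. The sketch below assumes Gini impurity, which is one common criterion; SAS may default to a different one (e.g. entropy or a chi-square test). It ranks predictors by the gain of their best single split, a crude proxy for the importance a full tree reports, which sums reductions over all splits. The function names are hypothetical.

```python
def gini(labels):
    """Gini impurity of a node with 0/1 labels: 1 - p1^2 - p0^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    p1 = sum(labels) / n
    return 1.0 - p1 ** 2 - (1.0 - p1) ** 2

def best_split(values, labels):
    """(threshold, impurity reduction) of the best binary split `x <= t` on one numeric predictor."""
    parent = gini(labels)
    n = len(labels)
    best_t, best_gain = None, 0.0
    for t in sorted(set(values))[:-1]:  # splitting at the largest value would leave the right side empty
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        child = (len(left) * gini(left) + len(right) * gini(right)) / n  # size-weighted child impurity
        gain = parent - child
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain

def rank_predictors(columns, labels):
    """Sort predictors (name -> list of values) by the gain of their best single split."""
    scores = {name: best_split(vals, labels)[1] for name, vals in columns.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```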
3) Which of the two models you have developed would you choose to predict future heart disease? Explain your choice.
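One way to ground the choice is to score both models on the same held-out validation data and compare confusion-matrix statistics rather than training fit. The sketch below assumes 0/1 predictions obtained with a probability cutoff (0.5 by default, an assumption worth revisiting); the model names `"logistic"` and `"tree"` in the test are placeholders. When CHD cases are rare, accuracy alone can favour a model that misses most cases, so sensitivity deserves attention too.

```python
def confusion_metrics(actual, predicted):
    """Accuracy, sensitivity and specificity from 0/1 actual and predicted labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,  # share of true CHD cases caught
        "specificity": tn / (tn + fp) if tn + fp else 0.0,  # share of non-cases correctly cleared
    }

def to_labels(probs, cutoff=0.5):
    """Turn predicted probabilities into 0/1 predictions at the given cutoff."""
    return [1 if p >= cutoff else 0 for p in probs]

def choose_model(results, metric):
    """Name of the model with the highest value of `metric` on the validation data."""
    return max(results, key=lambda name: results[name][metric])
```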
4) What approach could you suggest to further improve the model you chose?
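One possible improvement is to estimate performance with k-fold cross-validation instead of a single train/validation split, which gives a more stable basis for tuning settings such as the cutoff, the set of interactions, or the tree depth. This is a minimal sketch; `fit` and `score` are hypothetical callbacks standing in for whatever model-building and scoring steps are used.

```python
import random

def k_fold_indices(n, k=5, seed=0):
    """Shuffle row indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(n, k, fit, score, seed=0):
    """Average of score(model, test_idx) over k folds, each model fitted on the other k-1 folds.
    fit(train_idx) -> model and score(model, test_idx) -> float are user-supplied (hypothetical)."""
    folds = k_fold_indices(n, k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        model = fit(train_idx)
        scores.append(score(model, test_idx))
    return sum(scores) / len(scores)
```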
Variable | Description
male | 1 if male; 0 if female
age | age in years
education | 1 = Some High School; 2 = High School or GED; 3 = Some College or Vocational School; 4 = College
currentSmoker | 1 if yes; 0 if no
cigsPerDay | number of cigarettes smoked per day (estimated average)
BPMeds | 1 if taking blood pressure medication; 0 if not
prevalentStroke |
prevalentHyp |
diabetes | 0 = No; 1 = Yes
totChol | total cholesterol (mg/dL)
sysBP | systolic blood pressure (mmHg)
diaBP | diastolic blood pressure (mmHg)
BMI | Body Mass Index, calculated as weight (kg) / height² (m²)
heartRate |
glucose |
TenYearCHD | 1 if having heart disease; 0 if not
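The BMI definition in the table is a simple ratio; as a worked check, a hypothetical subject weighing 70 kg at 1.75 m has BMI = 70 / 1.75² ≈ 22.86. A minimal Python version of the formula (the function name is illustrative only):

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body Mass Index: weight in kilograms divided by the square of height in metres."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2
```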
Question 2
Use the data set "Cars" in your SAS portal (the descriptions of the data are included with the assignment descriptions) to perform your analysis and answer the following questions.
1) Before running any analysis, how many clusters do you think the data can be grouped into? Explain your answer. (Hint: There is no single correct answer, as long as you explain your reasoning logically.)
2) Perform a cluster analysis to verify your guess from the previous question (Hint: try the analysis with different numbers of clusters, k = 2, 3, 4, 5). With your final model of choice, describe the characteristics that differentiate the clusters.
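The question does not fix a clustering method; k-means is one common choice, sketched below in Python under that assumption (SAS procedures differ in initialization and defaults). Because the variables sit on very different scales (weightlbs in thousands, cylinders in single digits), each column is z-scored first. Comparing the within-cluster sum of squares (WSS) across k = 2, 3, 4, 5 supports an "elbow" choice: pick the k after which WSS stops falling sharply. All function names are hypothetical.

```python
import math
import random

def standardize(rows):
    """Z-score each column so large-scale variables (e.g. weightlbs) do not dominate the distance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # a constant column has standard deviation 0; divide by 1 instead to avoid division by zero
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0 for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, sds)] for r in rows]

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(rows, centroids):
    """Index of the nearest centroid for every row."""
    return [min(range(len(centroids)), key=lambda c: sq_dist(r, centroids[c])) for r in rows]

def kmeans(rows, k, iters=100, seed=0):
    """Lloyd's k-means; returns (centroids, labels, within-cluster sum of squares)."""
    centroids = [list(r) for r in random.Random(seed).sample(rows, k)]
    for _ in range(iters):
        labels = assign(rows, centroids)
        new = []
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            # keep the previous centroid if a cluster ends up empty
            new.append([sum(col) / len(members) for col in zip(*members)] if members else centroids[c])
        if new == centroids:
            break
        centroids = new
    labels = assign(rows, centroids)
    wss = sum(sq_dist(r, centroids[lab]) for r, lab in zip(rows, labels))
    return centroids, labels, wss
```

Calling `kmeans(standardize(rows), k)[2]` for each k in 2 to 5 gives the WSS values to compare, where `rows` is a hypothetical list of the numeric columns with the categorical `brand` left out. Cluster profiles can then be described by averaging the original (unstandardized) columns within each label.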
Variable | Description
mpg | miles per gallon (distance driven on 1 gallon of fuel)
cylinders | number of cylinders
cubicinches | engine displacement, i.e. total cylinder volume (in cubic inches)
hp | horsepower
weightlbs | weight (lbs)
time-to-60 | time to accelerate from 0 to 60 mph
year | production year
brand |