Assignment
As always, do your own work and turn in well-formatted output from the programs in one document. Write in your own words! Show a couple of program windows to indicate how you set up the problem.
This question is from your text (Shmueli et al.) and requires the Accidents dataset (see attached), which contains information on more than 42,000 accidents in 2001 in the United States. Accidents are classified as no injury, injury, or fatality. To complete the assignment, you must create a dummy variable Injury that takes the value yes if MAX_SEV_IR = 1 or 2 and no if MAX_SEV_IR = 0. (The easiest way to create this dummy variable is to use an "if" statement in Excel before uploading the data to XLMiner.)
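The assignment suggests an Excel "if" statement for this step. The same mapping can be sketched in Python to make the rule explicit. This is a minimal sketch that assumes MAX_SEV_IR is stored as an integer 0, 1, or 2. The function name `injury_label` and the sample records are hypothetical, not part of the Accidents file.

```python
def injury_label(max_sev_ir: int) -> str:
    """Map MAX_SEV_IR to the Injury dummy: 'yes' for 1 or 2, 'no' for 0."""
    if max_sev_ir in (1, 2):
        return "yes"
    if max_sev_ir == 0:
        return "no"
    # Any other code is unexpected under the assumed 0/1/2 coding.
    raise ValueError(f"unexpected MAX_SEV_IR value: {max_sev_ir}")


# Hypothetical records standing in for rows of the dataset.
records = [{"MAX_SEV_IR": 0}, {"MAX_SEV_IR": 2}, {"MAX_SEV_IR": 1}, {"MAX_SEV_IR": 0}]
for record in records:
    record["INJURY"] = injury_label(record["MAX_SEV_IR"])

print([r["INJURY"] for r in records])  # ['no', 'yes', 'yes', 'no']
```

Raising an error on unexpected codes, rather than silently mapping them to "no", makes data-entry problems visible before the data reach the classifier.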
1. Partition the data into training/validation sets.
a. Assuming that no information about the accident itself is available at the time of prediction (only location, weather, etc.), which predictors can we include? Run a naïve Bayes classifier on the complete training set with the relevant predictors and Injury as the response. All predictors are categorical. Show the classification matrix.
b. What is the overall error for the validation set? Explain fully.
c. Look at the conditional probabilities output. Why do we get a probability of zero for P(INJURY = no | SPD_LIM = 5)?
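For step 1, a random partition can be sketched as follows. The 60% / 40% split and the fixed seed are assumptions for illustration; use whatever proportions and seed your XLMiner setup calls for. The function `partition` is hypothetical and does not reproduce XLMiner's own partitioning.

```python
import random


def partition(rows, train_frac=0.6, seed=1):
    """Randomly split rows into (training, validation) lists.

    A fixed seed makes the split reproducible, so reruns give the same sets.
    """
    if not 0.0 < train_frac < 1.0:
        raise ValueError("train_frac must be strictly between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(train_frac * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


rows = list(range(10))  # stand-in for 10 records
training, validation = partition(rows)
print(len(training), len(validation))  # 6 4
```

Every record lands in exactly one of the two sets, which is what allows the validation set to act as unseen data when the classifier is evaluated.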
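For part (a), the computation behind a naïve Bayes classifier with categorical predictors can be sketched from simple counts. XLMiner does this internally; the code below only illustrates the idea and is not XLMiner's implementation. The predictor names `weather` and `speed_limit` and all records are hypothetical toy data, not columns or values from the Accidents file.

```python
from collections import Counter

# Hypothetical predictor names; not columns from the Accidents file.
PREDICTORS = ["weather", "speed_limit"]
RESPONSE = "INJURY"


def fit_naive_bayes(rows, predictors, response):
    """Estimate class priors and P(predictor = value | class) from counts."""
    class_counts = Counter(r[response] for r in rows)
    joint_counts = Counter((p, r[p], r[response]) for r in rows for p in predictors)
    priors = {c: k / len(rows) for c, k in class_counts.items()}
    cond = {key: k / class_counts[key[2]] for key, k in joint_counts.items()}
    return priors, cond


def predict(model, row, predictors):
    """Return the class with the largest prior * product of conditionals."""
    priors, cond = model
    scores = {}
    for c, prior in priors.items():
        score = prior
        for p in predictors:
            # A (predictor, value, class) combination never seen in training
            # gets probability 0, which zeroes the whole product.
            score *= cond.get((p, row[p], c), 0.0)
        scores[c] = score
    return max(scores, key=scores.get)


def classification_matrix(actual, predicted, classes):
    """Counts indexed as matrix[actual_class][predicted_class]."""
    matrix = {a: {p: 0 for p in classes} for a in classes}
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return matrix


# Hypothetical toy records.
training = [
    {"weather": "rain", "speed_limit": 55, "INJURY": "yes"},
    {"weather": "rain", "speed_limit": 55, "INJURY": "yes"},
    {"weather": "clear", "speed_limit": 35, "INJURY": "yes"},
    {"weather": "clear", "speed_limit": 35, "INJURY": "no"},
    {"weather": "clear", "speed_limit": 55, "INJURY": "no"},
    {"weather": "rain", "speed_limit": 35, "INJURY": "no"},
]
validation = [
    {"weather": "rain", "speed_limit": 55, "INJURY": "yes"},
    {"weather": "clear", "speed_limit": 35, "INJURY": "no"},
    {"weather": "rain", "speed_limit": 55, "INJURY": "no"},
    {"weather": "clear", "speed_limit": 35, "INJURY": "no"},
]

model = fit_naive_bayes(training, PREDICTORS, RESPONSE)
preds = [predict(model, r, PREDICTORS) for r in validation]
matrix = classification_matrix([r[RESPONSE] for r in validation], preds, ["yes", "no"])
print(matrix)  # {'yes': {'yes': 1, 'no': 0}, 'no': {'yes': 1, 'no': 2}}
```

The "naïve" part is the product over predictors: each predictor's conditional probability is treated as independent of the others given the class, so only one-predictor-at-a-time counts are needed.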
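For part (b), the overall error is the share of validation records whose predicted class differs from the actual class, that is, the off-diagonal counts of the classification matrix divided by the total. A common way to put that number in context is to compare it with the naive rule that assigns every record to the most frequent actual class. The sketch below uses hypothetical counts for illustration only; they are not results from the Accidents data.

```python
def overall_error(matrix):
    """Misclassified share; matrix[actual][predicted] holds counts."""
    total = sum(sum(row.values()) for row in matrix.values())
    if total == 0:
        raise ValueError("empty classification matrix")
    correct = sum(matrix[c][c] for c in matrix)
    return (total - correct) / total


def naive_rule_error(matrix):
    """Error from predicting the most frequent actual class for every record."""
    actual_totals = {a: sum(row.values()) for a, row in matrix.items()}
    total = sum(actual_totals.values())
    return (total - max(actual_totals.values())) / total


# Hypothetical counts, for illustration only.
example = {"yes": {"yes": 45, "no": 15}, "no": {"yes": 10, "no": 30}}
print(overall_error(example))     # 0.25
print(naive_rule_error(example))  # 0.4
```

An overall error well below the naive-rule error suggests the predictors carry useful information; an error close to it suggests they add little.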
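For part (c), both P(INJURY = no | SPD_LIM = 5) and the class-conditional P(SPD_LIM = 5 | INJURY = no) used by naïve Bayes are estimated as ratios of training-set counts. Whenever no training record combines SPD_LIM = 5 with INJURY = no, the numerator is 0, so the estimate is exactly 0. The records below are hypothetical toy data, and the function names are illustrative.

```python
def prob_class_given_value(rows, predictor, value, response, cls):
    """Estimate P(response = cls | predictor = value) as a ratio of counts."""
    matching = [r for r in rows if r[predictor] == value]
    if not matching:
        raise ValueError(f"no records with {predictor} = {value}")
    return sum(1 for r in matching if r[response] == cls) / len(matching)


def prob_value_given_class(rows, predictor, value, response, cls):
    """Estimate P(predictor = value | response = cls) as a ratio of counts."""
    in_class = [r for r in rows if r[response] == cls]
    if not in_class:
        raise ValueError(f"no records with {response} = {cls}")
    return sum(1 for r in in_class if r[predictor] == value) / len(in_class)


# Hypothetical toy records: the only SPD_LIM = 5 record has INJURY = yes.
training = [
    {"SPD_LIM": 5, "INJURY": "yes"},
    {"SPD_LIM": 35, "INJURY": "yes"},
    {"SPD_LIM": 35, "INJURY": "no"},
    {"SPD_LIM": 55, "INJURY": "no"},
]

print(prob_class_given_value(training, "SPD_LIM", 5, "INJURY", "no"))  # 0.0
print(prob_value_given_class(training, "SPD_LIM", 5, "INJURY", "no"))  # 0.0
```

Because naïve Bayes multiplies these conditional probabilities, a single zero estimate forces the score for that class to zero for any record with SPD_LIM = 5, regardless of the other predictors.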
Attachment: Assignment & Data File.rar