Explain what the overall error for the validation set is

Reference no: EM132343855

Assignment -

As always, do your own work and turn in well-formatted output from the programs in one document. Write in your own words! Show a couple of program windows to indicate how you set up the problem.

This question is from your text (Shmueli et al.) and requires the Accidents dataset (see attached), which contains information on more than 42,000 accidents in 2001 in the United States. Accidents are classified as no injury, injury, or fatality. To complete the assignment, you must create a dummy variable Injury that takes the value yes if MAX_SEV_IR = 1 or 2 and no if MAX_SEV_IR = 0. (The easiest way to create this dummy variable is to use an "if" statement in Excel before uploading to XLMiner.)
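For readers who prefer to check the recoding outside Excel, the rule can be sketched in a few lines of Python. This is only an illustration of the mapping stated above: the function name `injury_label` and the sample values are hypothetical and are not taken from the Accidents file.

```python
def injury_label(max_sev_ir):
    """Map MAX_SEV_IR to the Injury dummy: 1 or 2 -> "yes", 0 -> "no"."""
    if max_sev_ir in (1, 2):
        return "yes"
    if max_sev_ir == 0:
        return "no"
    # The assignment only defines the codes 0, 1 and 2.
    raise ValueError(f"unexpected MAX_SEV_IR value: {max_sev_ir!r}")


# Hypothetical sample values, for illustration only.
sample_max_sev_ir = [0, 1, 2, 0]
injury = [injury_label(v) for v in sample_max_sev_ir]
print(injury)
```

The Excel "if" statement mentioned above would apply the same rule row by row.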

1. Partition the data into training/validation sets.

a. Assuming that no information about the accident itself is available at the time of prediction (only location, weather, etc.), which predictors can we include? Run a naïve Bayes classifier on the complete training set with the relevant predictors and Injury as the response. All predictors are categorical. Show the classification matrix.

b. What is the overall error for the validation set? Explain fully.

c. Look at the conditional probabilities output. Why do we get a probability of zero for P(INJURY = no | SPD_LIM = 5)?

Attachment: Assignment & Data File.rar






All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd