How many records would you expect to be removed

Reference no: EM131114153

1. A dataset has 1000 records and 50 variables, with 5% of the values missing, spread randomly throughout the records and variables. An analyst decides to remove every record that has a missing value. About how many records would you expect to be removed?
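One way to reason about question 1 is sketched below. It assumes that "spread randomly" means each value is missing independently with probability 0.05, which the question does not state explicitly; the variable names are illustrative. Under that assumption a record survives only if all 50 of its values are present, so roughly 923 of the 1000 records would be expected to be removed.

```python
# Figures taken from the question.
n_records = 1000
n_variables = 50
p_missing = 0.05

# Assumption: each value is missing independently with probability 0.05.
# A record is complete only if none of its 50 values is missing.
p_complete = (1 - p_missing) ** n_variables

# Every record with at least one missing value is removed.
expected_removed = n_records * (1 - p_complete)

print(f"P(record is complete)    = {p_complete:.4f}")
print(f"Expected records removed = {expected_removed:.0f}")
```

The point of the exercise is that a small per-value missing rate can still wipe out most records when there are many variables.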

2. Given a database table containing weather data as follows:

Outlook    Temperature  Humidity  Windy  Class: Play
Sunny      Hot          High      False  No
Sunny      Hot          High      True   No
Overcast   Hot          High      False  Yes
Rainy      Mild         High      False  Yes
Rainy      Cool         Normal    False  Yes
Rainy      Cool         Normal    True   No
Overcast   Cool         Normal    True   Yes
Sunny      Mild         High      False  No
Sunny      Cool         Normal    False  Yes
Rainy      Mild         Normal    False  Yes
Sunny      Mild         Normal    True   Yes
Overcast   Mild         High      True   Yes
Overcast   Hot          Normal    False  Yes
Rainy      Mild         High      True   No

Where Outlook, Temperature, Humidity, and Windy are the input variables (predictors), and Play is the output variable (response).

a. Compute the prior probabilities

P(Play = 'Yes') =
P(Play = 'No') =

b. Compute the conditional probabilities

P(Outlook = 'Sunny' | Play = 'Yes') =
P(Outlook = 'Sunny' | Play = 'No') =

P(Temperature = 'Mild' | Play = 'Yes') =
P(Temperature = 'Mild' | Play = 'No') =

P(Humidity = 'High' | Play = 'Yes') =
P(Humidity = 'High' | Play = 'No') =

P(Windy = 'False' | Play = 'Yes') =
P(Windy = 'False' | Play = 'No') =
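The probabilities in parts a and b can be obtained by counting rows in the weather table. The sketch below is one way to do that counting; the helper names `prior` and `conditional` are illustrative and not part of the assignment, and the probabilities are plain relative frequencies with no smoothing.

```python
from fractions import Fraction

# Rows copied from the weather table: Outlook, Temperature, Humidity, Windy, Play.
DATA = """
Sunny Hot High False No
Sunny Hot High True No
Overcast Hot High False Yes
Rainy Mild High False Yes
Rainy Cool Normal False Yes
Rainy Cool Normal True No
Overcast Cool Normal True Yes
Sunny Mild High False No
Sunny Cool Normal False Yes
Rainy Mild Normal False Yes
Sunny Mild Normal True Yes
Overcast Mild High True Yes
Overcast Hot Normal False Yes
Rainy Mild High True No
"""
COLUMNS = ("Outlook", "Temperature", "Humidity", "Windy", "Play")
ROWS = [tuple(line.split()) for line in DATA.strip().splitlines()]


def prior(label):
    """P(Play = label): share of all rows that carry this class label."""
    return Fraction(sum(1 for r in ROWS if r[-1] == label), len(ROWS))


def conditional(column, value, label):
    """P(column = value | Play = label): share of the class rows with this value."""
    idx = COLUMNS.index(column)
    in_class = [r for r in ROWS if r[-1] == label]
    return Fraction(sum(1 for r in in_class if r[idx] == value), len(in_class))


for label in ("Yes", "No"):
    print(f"P(Play={label}) = {prior(label)}")
    for column, value in [("Outlook", "Sunny"), ("Temperature", "Mild"),
                          ("Humidity", "High"), ("Windy", "False")]:
        print(f"  P({column}={value} | Play={label}) = {conditional(column, value, label)}")
```

`Fraction` keeps the results exact and prints them in lowest terms, so 3/9 appears as 1/3.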

3. Use the naïve Bayes classification method to classify the following unknown record and indicate whether to play or not.

(Outlook = 'Sunny', Temperature = 'Mild', Humidity = 'High', Windy = 'False')
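A minimal sketch of the naïve Bayes calculation for this record follows. It assumes plain relative-frequency estimates with no Laplace smoothing, since the question does not mention any; the function name `score` is illustrative. For each class it multiplies the prior by the four conditional probabilities and picks the class with the larger product.

```python
from fractions import Fraction

# Rows copied from the weather table: Outlook, Temperature, Humidity, Windy, Play.
DATA = """
Sunny Hot High False No
Sunny Hot High True No
Overcast Hot High False Yes
Rainy Mild High False Yes
Rainy Cool Normal False Yes
Rainy Cool Normal True No
Overcast Cool Normal True Yes
Sunny Mild High False No
Sunny Cool Normal False Yes
Rainy Mild Normal False Yes
Sunny Mild Normal True Yes
Overcast Mild High True Yes
Overcast Hot Normal False Yes
Rainy Mild High True No
"""
ROWS = [tuple(line.split()) for line in DATA.strip().splitlines()]


def score(record, label):
    """Unnormalized naive Bayes score: P(label) times each P(value | label)."""
    in_class = [r for r in ROWS if r[-1] == label]
    result = Fraction(len(in_class), len(ROWS))  # prior
    for idx, value in enumerate(record):
        hits = sum(1 for r in in_class if r[idx] == value)
        result *= Fraction(hits, len(in_class))  # conditional for one attribute
    return result


record = ("Sunny", "Mild", "High", "False")
scores = {label: score(record, label) for label in ("Yes", "No")}
prediction = max(scores, key=scores.get)

for label, value in scores.items():
    print(f"score(Play={label}) = {value} = {float(value):.4f}")
print("Predicted class:", prediction)
```

With these estimates the score for 'No' (24/875) exceeds the score for 'Yes' (8/567), so the record is classified as Play = 'No'.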

4. Association Rule Mining:

Given a transaction database for mining association rule as follows:

Database D

TID   Items
100   A C D
200   B C E
300   A B C E
400   B E

Please use the Apriori algorithm to mine association rules with minimum support count = 2.

(Please show the derivation process step by step with candidate itemsets.)
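The derivation can be checked with a short script. The sketch below is a simplified Apriori loop, not a tuned implementation: each pass counts the candidate itemsets C_k, keeps those meeting the minimum support count as L_k, then joins and prunes to build C_(k+1). The question gives no minimum confidence, so the script lists the confidence of every rule from the largest frequent itemset without applying a threshold; the function names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

# Database D from the question: TID -> items.
TRANSACTIONS = {
    100: {"A", "C", "D"},
    200: {"B", "C", "E"},
    300: {"A", "B", "C", "E"},
    400: {"B", "E"},
}
MIN_SUPPORT = 2  # minimum support count stated in the question


def support(itemset):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for items in TRANSACTIONS.values() if set(itemset) <= items)


def apriori():
    """Return one (candidate_counts, frequent_counts) pair per pass."""
    items = sorted(set().union(*TRANSACTIONS.values()))
    candidates = [frozenset([item]) for item in items]
    passes = []
    while candidates:
        counts = {c: support(c) for c in candidates}
        frequent = {c: n for c, n in counts.items() if n >= MIN_SUPPORT}
        passes.append((counts, frequent))
        k = len(passes) + 1
        # Join step: unions of frequent itemsets that have exactly k items.
        joined = {a | b for a in frequent for b in frequent if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = [
            c for c in joined
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
        ]
    return passes


def rules(frequent):
    """Yield (antecedent, consequent, confidence) for each split of each itemset."""
    for itemset, count in frequent.items():
        for size in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), size):
                consequent = itemset - set(antecedent)
                yield set(antecedent), set(consequent), Fraction(count, support(antecedent))


def fmt(counts):
    """Render itemsets as compact strings such as 'BCE' for printing."""
    return {"".join(sorted(s)): n for s, n in counts.items()}


passes = apriori()
for k, (counts, frequent) in enumerate(passes, start=1):
    print(f"C{k}: {fmt(counts)}")
    print(f"L{k}: {fmt(frequent)}")
for antecedent, consequent, conf in rules(passes[-1][1]):
    print(sorted(antecedent), "->", sorted(consequent), "confidence", conf)
```

For this database the loop stops after three passes, with {B, C, E} (support count 2) as the only frequent 3-itemset.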





All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd