Simple linear regression

Reference no: EM13534472

Dataset needed: IQ

1. Relationship Between Eighth Grade IQ and Ninth Grade Math Score

For a statistics class project, students examined the relationship between x = 8th grade IQ and y = 9th grade math scores for 20 students. The data are displayed below.

Student  Math Score  IQ   Abstract Reas
1        33          95   28
2        31          100  24
3        35          100  29
4        38          102  30
5        41          103  33
6        37          105  32
7        37          106  34
8        39          106  36
9        43          106  38
10       40          109  39
11       41          110  40
12       44          110  43
13       40          111  41
14       45          112  42
15       48          112  46
16       45          114  44
17       31          114  41
18       47          115  47
19       43          117  42
20       48          118  49

Open the dataset IQ found in the Datasets folder in ANGEL. Perform a linear regression with math score as the Response (dependent variable) and IQ as the Predictor (independent variable). Store/Save the (unstandardized) Residuals and Fitted (Predicted) values. These will be stored in the fourth and fifth columns of the data worksheet. The output should look as follows:

SPSS: Regression Analysis: Math Score versus IQ
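
The SPSS output itself is not reproduced here. As a rough cross-check outside SPSS or Minitab, the same least-squares fit can be sketched in plain Python using the 20 data pairs from the table above. The names fit_line, math_score and iq are illustrative choices, not part of the assignment or of either package.

```python
# Least-squares fit of math score (response) on IQ (predictor).
math_score = [33, 31, 35, 38, 41, 37, 37, 39, 43, 40,
              41, 44, 40, 45, 48, 45, 31, 47, 43, 48]
iq = [95, 100, 100, 102, 103, 105, 106, 106, 106, 109,
      110, 110, 111, 112, 112, 114, 114, 115, 117, 118]


def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


b0, b1 = fit_line(iq, math_score)

# The two columns the assignment asks you to store/save.
fitted = [b0 + b1 * xi for xi in iq]
residuals = [yi - fi for yi, fi in zip(math_score, fitted)]

print(f"math score = {b0:.3f} + {b1:.4f} * IQ")
```

The slope b1 is the estimated change in math score for a one-point increase in IQ. The coefficients printed here should agree with the software output up to rounding.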

a. Explain this equation. Discuss the slope as the change in Y per unit change in X, in the context of the variables used in this problem.

b. Create a scatter plot of the measurements by selecting Math Score for the y-axis (response) and IQ for the x-axis (predictor). Describe the relationship between math score and IQ. Minitab Users: Graph > Scatter Plot > Simple. SPSS Users: Graphs > Legacy Dialogs > Scatter/Dot > Simple Scatter.

c. One of the students with a high IQ (number 17) appears to be an outlier. With a sample size of only 20, this can affect our normality assumption. The constant variance assumption could also be compromised. We can visually check for constant variance using a Residual Plot and test for normality using a Probability Plot (or Q-Q plot). To get a residual plot, simply create a Scatterplot using the Residuals as the y-variable and the Fitted (Predicted) Values as the x-variable. (Remember these should have been stored/saved when you first performed the regression per the instructions above. If not, re-run the regression, click store/save, and check the boxes for unstandardized residuals and fits (predicted) values.)

Now create a probability plot (Q-Q plot if using SPSS) of the residuals. We are provided the results of a test of the null hypothesis that the data follow a normal distribution. Based on these two graphs and what you have learned about hypothesis testing, what conclusions do you reach regarding the assumptions of constant variance and normality?

Minitab Users: For the probability plot, go to Graph > Probability Plot > Single and select Residuals.

SPSS Users: For the Q-Q plot with normality test, go to Analyze > Descriptive Statistics > Explore, enter Unstandardized Residuals in the Dependent List, click Plots, and select the box for Normality plots with tests.

d. The least squares regression line for predicting math score from IQ is given in the above output. What is the fitted regression line (i.e. regression equation)?

e. What do the Fitted (predicted) values and Residuals represent?

f. Based on the output, what is the test of the slope for this regression equation? That is, provide the null and alternative hypotheses, the test statistic, p-value of the test, and state your decision and conclusion.
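
Parts c and f can be sketched numerically as follows. This is a minimal illustration, not the SPSS or Minitab procedure: the plotting positions (i + 0.5) / n for the Q-Q plot are one common convention chosen here as an assumption, the 5% significance level is assumed because the problem does not state one, and 2.101 is the two-sided 5% critical value of t with 18 degrees of freedom from a standard t table. The sketch does not compute a p-value or a formal normality test; read those from the software output.

```python
from math import sqrt
from statistics import NormalDist

math_score = [33, 31, 35, 38, 41, 37, 37, 39, 43, 40,
              41, 44, 40, 45, 48, 45, 31, 47, 43, 48]
iq = [95, 100, 100, 102, 103, 105, 106, 106, 106, 109,
      110, 110, 111, 112, 112, 114, 114, 115, 117, 118]
n = len(iq)

# Least-squares fit of math score on IQ.
x_bar = sum(iq) / n
y_bar = sum(math_score) / n
sxx = sum((x - x_bar) ** 2 for x in iq)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(iq, math_score))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

fitted = [b0 + b1 * x for x in iq]
residuals = [y - f for y, f in zip(math_score, fitted)]

# Part c: coordinates for the two diagnostic plots.
residual_plot = list(zip(fitted, residuals))   # x = fitted value, y = residual
sample_q = sorted(residuals)
theory_q = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
qq_plot = list(zip(theory_q, sample_q))        # x = normal quantile, y = residual

# Student number (1-based) with the most negative residual.
most_negative = min(range(n), key=lambda i: residuals[i]) + 1

# Part f: t statistic for H0: slope = 0 versus Ha: slope != 0.
s = sqrt(sum(e ** 2 for e in residuals) / (n - 2))  # residual standard error
se_b1 = s / sqrt(sxx)                               # standard error of the slope
t_stat = b1 / se_b1
T_CRIT = 2.101  # assumed 5% two-sided level, 18 degrees of freedom

print(f"student with most negative residual: {most_negative}")
print(f"t = {t_stat:.2f}, reject H0 at 5%: {abs(t_stat) > T_CRIT}")
```

Feeding residual_plot and qq_plot to any plotting tool should reproduce the shape of the two graphs requested in part c.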

2. Although outliers should never be deleted without a reason, there are several reasons why it may be legitimate to conduct an analysis without them. Delete the data point in row 17 and re-calculate the regression line for the remainder of the data. You should obtain the following output:
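
The expected output is not reproduced here. A minimal sketch of the refit, assuming "row 17" means student 17 in the table above (index 16 when counting from zero); fit_line is an illustrative helper, not a function from SPSS or Minitab:

```python
math_score = [33, 31, 35, 38, 41, 37, 37, 39, 43, 40,
              41, 44, 40, 45, 48, 45, 31, 47, 43, 48]
iq = [95, 100, 100, 102, 103, 105, 106, 106, 106, 109,
      110, 110, 111, 112, 112, 114, 114, 115, 117, 118]


def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


OUTLIER_INDEX = 16  # student 17, zero-based

# Drop the outlier from both variables so the pairs stay aligned.
iq_kept = [x for i, x in enumerate(iq) if i != OUTLIER_INDEX]
score_kept = [y for i, y in enumerate(math_score) if i != OUTLIER_INDEX]

b0_all, b1_all = fit_line(iq, math_score)
b0_kept, b1_kept = fit_line(iq_kept, score_kept)

print(f"all 20 students:    score = {b0_all:.3f} + {b1_all:.4f} * IQ")
print(f"without student 17: score = {b0_kept:.3f} + {b1_kept:.4f} * IQ")
```

Comparing the two printed lines shows how much a single low score at a high IQ pulls the fitted slope down.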


1.) Directions: Read the following problem. In your drop box submission, please include the following information:
a. Determine if the problem is either a test of hypothesis, a confidence interval or something else and specify the 'key words' found in the problem that demonstrate your choice.
b. Determine the procedure name and parameters involved for each problem (use the Stat 200 Formulas and Techniques Summary document.) Specify the 'key words' found in the problem that lead you to this choice.
c. If the problem is a hypothesis test, indicate whether it has a lower-tail, upper-tail, or two-tail alternative hypothesis, as well as the test statistic formula for the test. If the problem is a confidence interval, indicate the formula used for the margin of error.
You DO NOT need to do the ACTUAL test of hypothesis, confidence interval, etc. in order to submit in the dropbox, just answer parts a, b, and c for the problem.

A large organization is being investigated to determine if its recruitment is sex-biased. Tables 1 and 2, respectively, show the classification of applicants for sales and for secretarial positions according to gender and result of interview. Table 3 is an aggregation of the corresponding entries of Table 1 and Table 2.
a. According to the data in Tables 1 and 2, does there seem to be an association between gender and hiring status at the 5% significance level?
b. Do the data in Table 3 indicate the same result?

Table 1 Sales Positions





Table 2 Secretarial Positions





Table 3 Secretarial and Sales Positions





2.) Directions: Read the following problem. In your drop box submission, please include the following information:
a. Determine if the problem is either a test of hypothesis, a confidence interval or something else and specify the 'key words' found in the problem that demonstrate your choice.
b. Determine the procedure name and parameters involved for each problem (use the Stat 200 Formulas and Techniques Summary document.) Specify the 'key words' found in the problem that lead you to this choice.
c. If the problem is a hypothesis test, indicate whether it has a lower-tail, upper-tail, or two-tail alternative hypothesis, as well as the test statistic formula for the test. If the problem is a confidence interval, indicate the formula used for the margin of error.
You DO NOT need to do the ACTUAL test of hypothesis, confidence interval, etc. in order to submit in the dropbox, just answer parts a, b, and c for the problem.

An instructor at Arizona State University asked a random sample of eight students to record their study times in an elementary statistics course. She then made a table of total hours studied over 2 weeks and test scores at the end of the 2 weeks. Here are the data:

Study time 10 15 12 20 8 16 14 22
Test Scores 92 75 86 76 92 80 84 81

Assuming that a linear association exists, how are the data correlated? Determine the exact linear relationship between study time and test scores and use it to estimate the predicted test score for a student who studies 12 hours.
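
Although the dropbox submission only asks for parts a, b, and c, the calculation the problem describes can be sketched as below. This is an illustration under the problem's own assumption of a linear association; the variable names are arbitrary.

```python
from math import sqrt

study_time = [10, 15, 12, 20, 8, 16, 14, 22]   # hours over 2 weeks
test_score = [92, 75, 86, 76, 92, 80, 84, 81]
n = len(study_time)

x_bar = sum(study_time) / n
y_bar = sum(test_score) / n
sxx = sum((x - x_bar) ** 2 for x in study_time)
syy = sum((y - y_bar) ** 2 for y in test_score)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(study_time, test_score))

# Pearson correlation: sign gives the direction, magnitude the strength.
r = sxy / sqrt(sxx * syy)

# Least-squares line: test score = intercept + slope * study time.
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Predicted test score for a student who studies 12 hours.
predicted_12 = intercept + slope * 12

print(f"r = {r:.3f}")
print(f"score = {intercept:.2f} + ({slope:.3f}) * hours")
print(f"predicted score at 12 hours: {predicted_12:.1f}")
```

A negative r here means that, in this sample, longer study times went with lower test scores.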

3.) Directions: Read the following problem. In your drop box submission, please include the following information:
a. Determine if the problem is either a test of hypothesis, a confidence interval or something else and specify the 'key words' found in the problem that demonstrate your choice.
b. Determine the procedure name and parameters involved for each problem (use the Stat 200 Formulas and Techniques Summary document.) Specify the 'key words' found in the problem that lead you to this choice.
c. If the problem is a hypothesis test, indicate whether it has a lower-tail, upper-tail, or two-tail alternative hypothesis, as well as the test statistic formula for the test. If the problem is a confidence interval, indicate the formula used for the margin of error.
You DO NOT need to do the ACTUAL test of hypothesis, confidence interval, etc. in order to submit in the dropbox, just answer parts a, b, and c for the problem.

A company sells a strong commercial floor cleaner and claims that the flashpoint (the lowest temperature at which the vapor of a combustible liquid can be ignited in air) exceeds 200ºF. A random sample of cleaner was obtained and the flashpoint of each was measured. The sample mean was 198.2ºF and the sample standard deviation was 10ºF. At the 1% significance level, is there sufficient evidence to support the company's claim?

4.) Directions: Read the following problem. In your drop box submission, please include the following information:
a. Determine if the problem is either a test of hypothesis, a confidence interval or something else and specify the 'key words' found in the problem that demonstrate your choice.
b. Determine the procedure name and parameters involved for each problem (use the Stat 200 Formulas and Techniques Summary document.) Specify the 'key words' found in the problem that lead you to this choice.
c. If the problem is a hypothesis test, indicate whether it has a lower-tail, upper-tail, or two-tail alternative hypothesis, as well as the test statistic formula for the test. If the problem is a confidence interval, indicate the formula used for the margin of error.
You DO NOT need to do the ACTUAL test of hypothesis, confidence interval, etc. in order to submit in the dropbox, just answer parts a, b, and c for the problem.

A survey of 500 households found that the family room was the primary television location in 415 homes. Is there evidence that the true population proportion of households having the family room as the primary television location is actually less than .85 at the 5% significance level?
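
The submission does not require carrying out the test, but the test statistic it points to can be sketched as follows. This assumes the usual large-sample one-proportion z test with a lower-tail alternative (H0: p = 0.85 versus Ha: p < 0.85), using the null value in the standard error; variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n = 500           # households surveyed
successes = 415   # family room is the primary television location
p0 = 0.85         # hypothesized population proportion
alpha = 0.05      # significance level stated in the problem

p_hat = successes / n
se0 = sqrt(p0 * (1 - p0) / n)   # standard error under H0
z = (p_hat - p0) / se0

# Lower-tail p-value: probability of a z this small or smaller under H0.
p_value = NormalDist().cdf(z)

print(f"p_hat = {p_hat:.3f}, z = {z:.3f}, p-value = {p_value:.4f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```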

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd