Create an Inverse Response Plot

Reference no: EM132253825

Part A -

This first part is based on the "IPEDS" dataset, a national database of institutions of higher learning. The data you'll examine includes a random sample of four-year, non-profit institutions in 2016. The output you need to answer these questions is given after the last question of Part A. For these and all questions, please type your answers directly into this document.

IPEDS Data:

grad.rate: the percent of students who graduate within four years. Values are integers that represent percents. For example, 23 means the graduation rate is 23%.

sector: Indicates whether the school is public or private.

Average.loan: in dollars, the average student debt at graduation.

SAT.read.25p: the 25th percentile of the SAT reading score for enrolled students.

SAT.math.25p: the 25th percentile of the SAT math score for enrolled students.

SFR: student-faculty ratio (the number of students per full-time faculty member).

1. According to the output, do public schools and private schools differ in mean graduation rates, controlling for the other variables? Give an answer ("yes" or "no") and explain how you reached this conclusion. Use a 5% significance level.

2. Why do the p-values for SFR differ in the anova and summary tables? Choose the best answer:

a) The anova table is testing the null hypothesis that the slope for SFR is 0, and the summary table is testing the null hypothesis that the slope for SFR is 1.

b) The anova table is testing the null hypothesis that the slope for SFR is 0 given that SAT.math.25p, SAT.read.25p, and Average.loan are all in the model, while the summary table is testing that the slope is 0 given that SAT.math.25p, SAT.read.25p, Average.loan, and sector are in the model.

c) The anova table is based on the F statistic while the summary table is based on the t-statistic.

d) The anova table is testing the null hypothesis that the slope for SFR is 0 given that all of the other variables are included in the model, and the summary table is testing the null hypothesis that the slope for SFR is 0 given that no other variables are included in the model.

3. Which of the following is the best interpretation of the coefficient for sector? (Assume the model is valid.) Indicate the best choice.

a) Among all schools with similar loan amounts, similar 25th-percentile SAT reading and math scores, and similar student-faculty ratios, the graduation rate at public universities is about 2.6 percentage points lower, on average, than at private universities.

b) The graduation rate at public universities is about 2.6 percentage points lower than at private universities.

c) The mean graduation rate at public universities is about 2.6 percentage points lower than at private universities.

4. What is the interpretation of a 5% significance level in the context of testing whether public and private schools differ in mean graduation rates? Indicate the best interpretation from among these:

a) The probability that we will conclude that public and private schools differ in mean graduation rates when, in fact, they are the same, is 5%.

b) If public and private schools do not differ in mean graduation rates, then the probability of getting a test statistic as extreme as or more extreme than 0.976 is 5%.

c) The probability that public and private schools differ in mean graduation rates is 5%.

d) The probability that public and private schools have the same mean graduation rates is 5%.
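A significance level is a long-run error rate: when there is truly no difference, a 5% test wrongly declares a difference in about 5% of repeated samples. The sketch below illustrates this by simulation. It is a simplification and an assumption on our part: it replaces the regression test for sector with a large-sample two-sample z-test with no covariates, and the group size, number of repetitions, and seed are arbitrary choices.

```python
import numpy as np

def type_one_error_rate(reps=10_000, n=100, z_crit=1.96, seed=1):
    """Share of simulated null datasets in which a two-sample z-test rejects H0.

    Both groups come from the same N(0, 1) distribution, so every rejection
    is a false positive (a Type I error). reps, n and seed are arbitrary.
    """
    rng = np.random.default_rng(seed)
    a = rng.normal(size=(reps, n))
    b = rng.normal(size=(reps, n))
    diff = a.mean(axis=1) - b.mean(axis=1)
    se = np.sqrt(a.var(axis=1, ddof=1) / n + b.var(axis=1, ddof=1) / n)
    return float(np.mean(np.abs(diff / se) > z_crit))

print(type_one_error_rate())  # close to 0.05
```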

5. Notice that the last line of the anova table has been removed. Given that SYY = 355,246, fill in the values for the missing line:

Df:

Sum Sq:

Mean Sq:

F value:

Pr(>F):
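The arithmetic for the missing Residuals line can be sketched as follows, assuming each predictor line in the sequential anova table uses 1 degree of freedom (true for the numeric predictors and for a two-level factor such as sector). Only SYY comes from the question; the other sums of squares and n below are made-up placeholders, and n (or the residual df directly) would be read from the summary output. In R's anova() printout, the F value and Pr(>F) entries of the Residuals line are left blank.

```python
def residual_row(syy, model_ss, n):
    """Rebuild the Residuals line of a sequential (Type I) ANOVA table.

    syy:      total sum of squares, sum((y - mean(y))**2)
    model_ss: the Sum Sq shown on each predictor line (assumed 1 df each)
    n:        number of observations used in the fit
    """
    p = len(model_ss)               # one df per predictor line (assumption)
    df = n - p - 1                  # residual degrees of freedom
    rss = syy - sum(model_ss)       # sequential sums of squares add up to SYY
    return {"Df": df, "Sum Sq": rss, "Mean Sq": rss / df}

# SYY is from the question; the other numbers are hypothetical placeholders.
row = residual_row(355_246, [150_000, 40_000, 10_000, 5_000, 1_246], n=105)
print(row)  # Df = 99, Sum Sq = 149000, Mean Sq = 149000 / 99
```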

6. A politician sees this analysis and notes that the coefficient for student-faculty ratio is negative and statistically significant. He says "This analysis shows that if we lower student-faculty ratios, then graduation rates will increase." This interpretation is

a) Valid

b) Invalid

7. The p-value of 0.04710 for SFR is best interpreted as the probability that the null hypothesis is correct. Is this a valid statement?

a) Valid

b) Invalid

8. The p-value of 0.04710 for SFR is best interpreted as the probability the null hypothesis is wrong. Is this a valid statement?

a) Valid

b) Invalid

9. Suppose you have fit two different models to predict the salary of a worker in the U.S. based on a number of predictor variables. Model 1 has an R² of 90%, and its residual plot shows a trend; the other diagnostic plots look good. Model 2 has an R² of 60%, and all of its diagnostic plots look good. Which model should you use?

a) Model 1

b) Model 2

10. Explain your choice for (9).
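A high R² does not by itself show that a straight-line model is adequate. The small hypothetical example below fits a line to a perfectly curved relationship (y = x²): R² is still about 0.95, yet the residuals follow a clear U-shaped trend, which is the kind of pattern a residual plot reveals.

```python
import numpy as np

x = np.arange(1, 11, dtype=float)
y = x ** 2                                   # hypothetical, truly curved relationship

slope, intercept = np.polyfit(x, y, 1)       # straight-line fit (highest degree first)
resid = y - (intercept + slope * x)
r2 = 1 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)

print(round(r2, 3))            # about 0.95
print(np.round(resid, 1))      # positive at both ends, negative in the middle
```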

Part B -

In this part, please load the provided dataset into R. The dataset is in "FinalData.csv".

You are expected to turn in a .R file that includes all commands you used to prepare your answers for Part B. (You need not include any calculations you may have performed for Part A.)

The variable size is the sum of the variables ThoraxLength, ClawLength, and ClawHeight.

1. Fit a basic linear model using only size, Weight, and Sex to predict pinching force. Do not use any transformations or higher-order terms.

a) Write the equation of the model: Predicted_PinchingForce=

b) Comment on the model validity with respect to these three conditions. Type the word "is" or "isn't" and then give your reason.

Linear trend condition [is or isn't?] satisfied

Constant Variance condition [is or isn't?] satisfied

Normal distribution of errors condition [is or isn't?] satisfied
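The basic model is an ordinary least-squares fit of PinchingForce on size, Weight, and a 0/1 dummy for Sex. In R this corresponds to a call like lm(PinchingForce ~ size + Weight + Sex, data = crabs), where crabs is a hypothetical name for the data frame read from FinalData.csv. The Python sketch below shows the same computation on invented, noise-free measurements (not the assignment data), so the recovered coefficients can be checked; with real data, the residuals it returns are what the validity plots are built from.

```python
import numpy as np

def fit_ols(X, y):
    """Ordinary least squares with an intercept column prepended."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    fitted = X1 @ beta
    return beta, fitted, y - fitted          # coefficients, fitted values, residuals

# Hypothetical crab measurements, invented for illustration only
thorax = np.array([40.0, 45.0, 50.0, 52.0, 38.0, 47.0, 55.0, 42.0])
claw_len = np.array([55.0, 60.0, 66.0, 67.0, 50.0, 62.0, 70.0, 57.0])
claw_ht = np.array([20.0, 22.0, 25.0, 26.0, 18.0, 23.0, 27.0, 21.0])
weight = np.array([400.0, 480.0, 590.0, 615.0, 350.0, 520.0, 680.0, 430.0])
male = np.array([0, 1, 1, 1, 0, 0, 1, 0], dtype=float)   # Sex as a 0/1 dummy

size = thorax + claw_len + claw_ht                        # size = ThoraxLength + ClawLength + ClawHeight
force = 2.0 + 0.5 * size + 0.1 * weight + 3.0 * male     # noise-free toy response

beta, fitted, resid = fit_ols(np.column_stack([size, weight, male]), force)
print(np.round(beta, 3))   # intercept, size, Weight, Sex (male) coefficients
```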

2. Create an Inverse Response Plot. What transformation of PinchingForce gives the lowest residual sum of squares?
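An inverse response plot puts the observed response y on the horizontal axis and the model's fitted values ŷ on the vertical axis. For each candidate power λ, ŷ is regressed on y^λ (log y when λ = 0), and the λ with the smallest residual sum of squares suggests the transformation. In R, the car package's inverseResponsePlot() is one common implementation; the sketch below only reproduces the RSS comparison, and the candidate list and toy data are assumptions chosen so that the answer is known (ŷ is exactly linear in log y).

```python
import numpy as np

def power(y, lam):
    """y**lam, with log(y) standing in for lam = 0 (Box-Cox convention)."""
    return np.log(y) if lam == 0 else y ** lam

def inverse_response_rss(y, yhat, lambdas):
    """RSS from regressing the fitted values yhat on power(y, lam), for each lam."""
    rss = {}
    for lam in lambdas:
        Z = np.column_stack([np.ones(len(y)), power(y, lam)])
        coef, *_ = np.linalg.lstsq(Z, yhat, rcond=None)
        rss[lam] = float(np.sum((yhat - Z @ coef) ** 2))
    return rss

# Hypothetical response and fitted values: yhat is exactly linear in log(y)
y = np.array([5.0, 10.0, 20.0, 40.0, 80.0, 160.0])
yhat = 3 + 2 * np.log(y)
rss = inverse_response_rss(y, yhat, [-1, -0.5, 0, 0.5, 1])
print(min(rss, key=rss.get))  # 0, i.e. a log transformation
```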

3. What transformation of PinchingForce is suggested by the Box-Cox transformation?
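The Box-Cox method chooses the power λ that maximizes the profile log-likelihood of the linear model fitted to (y^λ − 1)/λ (log y when λ = 0). The Jacobian term (λ − 1)·Σ log y makes fits on different scales comparable. The sketch below evaluates that log-likelihood over a grid; the grid and the toy data (built so that log y is nearly linear in x) are assumptions, and the "Rounded" power in the assignment would be a convenient nearby value such as 0, 0.5, or 1.

```python
import numpy as np

def boxcox_profile(y, X, lambdas):
    """Profile log-likelihood of the Box-Cox power for a linear model y ~ X.

    y must be strictly positive; X excludes the intercept column.
    """
    n = len(y)
    X1 = np.column_stack([np.ones(n), X])
    log_y = np.log(y)
    ll = {}
    for lam in lambdas:
        z = log_y if lam == 0 else (y ** lam - 1) / lam
        coef, *_ = np.linalg.lstsq(X1, z, rcond=None)
        rss = float(np.sum((z - X1 @ coef) ** 2))
        # Jacobian term keeps the likelihoods for different lambdas comparable
        ll[lam] = -n / 2 * np.log(rss / n) + (lam - 1) * log_y.sum()
    return ll

# Hypothetical data: log(y) is linear in x plus a small wiggle
x = np.arange(1, 31, dtype=float)
y = np.exp(0.2 * x + 0.02 * np.sin(2.3 * x))
grid = [round(v, 2) for v in np.linspace(-2, 2, 81)]
ll = boxcox_profile(y, x, grid)
best = max(ll, key=ll.get)
print(best)   # near 0, i.e. a log transform
```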

4. Fit the model using the transformation for PinchingForce based on the Box-Cox power transform (using the "Rounded" power). Which model do you think is better, in terms of model validity: the "basic" model in question 1 or this model? Explain.

5. At the midterm, we found that the pinching force for male crabs was greater than for female crabs. Explain why this is not the case with the current model. (Hint: note that male crabs tend to be bigger and heavier than females.)
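The hint describes confounding: if males are larger and force depends on size, Sex looks important on its own but adds nothing once size is in the model. The sketch below builds a small hypothetical dataset where force depends on size only and every male is larger than every female. The "noise" pattern is deliberately chosen to be uncorrelated with the predictors so the two coefficients come out exactly.

```python
import numpy as np

def ols(y, *cols):
    """Least-squares coefficients (intercept first) for y on the given columns."""
    X = np.column_stack([np.ones(len(y)), *cols])
    return np.linalg.lstsq(X, y, rcond=None)[0]

size = np.arange(10, 18, dtype=float)               # hypothetical sizes
male = np.array([0, 0, 0, 0, 1, 1, 1, 1.0])        # males are the larger crabs here
noise = 0.1 * np.array([1, -1, -1, 1, 1, -1, -1, 1.0])
force = 5 + 2 * size + noise                        # force depends on size only

print(ols(force, male)[1])          # about 8: males look stronger on their own
print(ols(force, size, male)[2])    # about 0: no Sex effect once size is held fixed
```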

6. Give the variance inflation factors for each variable in the transformed model from question 4:

Sex

Weight

Size
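The variance inflation factor for predictor j is VIF_j = 1 / (1 − R_j²), where R_j² comes from regressing predictor j on all the other predictors. In R, car::vif() is the usual tool. The sketch below computes the same quantity directly for 1-df predictors, assuming Sex is coded as a 0/1 column. The two toy checks (orthogonal columns give VIF = 1, nearly identical columns give a very large VIF) are illustrative only.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (X has no intercept column)."""
    n, k = X.shape
    out = []
    for j in range(k):
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        resid = X[:, j] - others @ coef
        r2 = 1 - resid @ resid / np.sum((X[:, j] - X[:, j].mean()) ** 2)
        out.append(1 / (1 - r2))
    return out

# Hypothetical columns: orthogonal ones, then two nearly identical ones
orth = np.column_stack([[1, -1, 1, -1], [1, 1, -1, -1]]).astype(float)
near = np.column_stack([[1, 2, 3, 4, 5], [1, 2, 3, 4, 5.1]]).astype(float)
print(vif(orth))   # [1.0, 1.0]
print(vif(near))   # both very large
```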

7. What do these VIF values tell us in this context?

8. Perform best subsets regression, forward stepwise regression, and backward stepwise regression to develop the "best" model, using BIC as the criterion. Use your transformed version of PinchingForce. Start with these predictors: Weight, ThoraxLength, Sex, ClawLength, ClawHeight, ClawWeight. Note that these three approaches may give three different models. Choose the one with the lowest BIC, and be sure to state the BIC value for your choice. Use this model to answer these questions:

a) Give the equation for the final model you chose:

b) BIC for final model:

c) Suppose we had just caught a coconut crab with these measurements:

ThoraxLength: 52

Weight: 615

Sex: Male

ClawLength: 67

ClawHeight: 26

ClawWeight: 34

Predict its pinching force at the 95% level (give the appropriate interval).
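Best subsets regression scores every combination of predictors; forward and backward stepwise search the same criterion greedily, which is why they can stop at different models. The sketch below uses BIC = n·log(RSS/n) + (k + 1)·log(n) for k predictors plus an intercept. This matches the form R's step() uses with k = log(n), whereas BIC() adds constants, so reported values differ between functions while the ranking of models does not. It then computes a prediction interval for a new observation, ŷ₀ ± t·s·√(1 + x₀ᵀ(XᵀX)⁻¹x₀). The t critical value is passed in rather than computed, and the toy data (y depends on x1 only) and predictor names are hypothetical; with the crab data, Sex would need to be a 0/1 column.

```python
import itertools
import numpy as np

def fit(y, X):
    """OLS with intercept; returns coefficients, RSS and the design matrix."""
    X1 = np.column_stack([np.ones(len(y)), X]) if X.size else np.ones((len(y), 1))
    beta = np.linalg.lstsq(X1, y, rcond=None)[0]
    return beta, float(np.sum((y - X1 @ beta) ** 2)), X1

def bic(rss, n, k):
    """BIC up to an additive constant, for k predictors plus an intercept."""
    return n * np.log(rss / n) + (k + 1) * np.log(n)

def best_subset_bic(y, data):
    """Exhaustive search over predictor subsets; data maps name -> 1-D array."""
    n, results = len(y), {}
    for k in range(len(data) + 1):
        for subset in itertools.combinations(list(data), k):
            X = np.column_stack([data[v] for v in subset]) if subset else np.empty((n, 0))
            results[subset] = bic(fit(y, X)[1], n, k)
    best = min(results, key=results.get)
    return best, results[best]

def prediction_interval(y, X, x0, t_crit):
    """Point prediction and prediction interval for a new observation at x0."""
    beta, rss, X1 = fit(y, X)
    n, p = X1.shape
    s2 = rss / (n - p)                                  # residual variance estimate
    x0 = np.concatenate([[1.0], np.atleast_1d(x0)])
    fit0 = float(x0 @ beta)
    se = np.sqrt(s2 * (1 + x0 @ np.linalg.inv(X1.T @ X1) @ x0))
    return fit0, fit0 - t_crit * se, fit0 + t_crit * se

# Hypothetical data: y depends on x1 only; e is orthogonal to every column
x1 = np.arange(1, 9, dtype=float)
x2 = np.array([1, 1, -1, -1, 1, 1, -1, -1.0])
x3 = np.array([1, -1, 1, -1, 1, -1, 1, -1.0])
e = 0.1 * np.array([1, -1, -1, 1, 1, -1, -1, 1.0])
y = 1 + 2 * x1 + e

best, best_bic = best_subset_bic(y, {"x1": x1, "x2": x2, "x3": x3})
print(best)   # ('x1',)
# t critical value for df = 6, e.g. qt(0.975, 6) in R, about 2.447
print(prediction_interval(y, x1[:, None], [4.5], t_crit=2.447))
```

If the chosen model uses a power-transformed response, the interval endpoints can be back-transformed by inverting the power (exp for a log), because the transformation is monotone; for a negative power the transformation is decreasing, so the back-transformed endpoints swap places.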

9. Consider the output in the attached file:

a) What null and alternative hypotheses does the F-statistic test?

b) What do you conclude based on the p-value (using a significance level of 0.05)?

c) Extra Credit: What's going on here?
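Assuming the attached output is the summary of a fitted linear model with p predictors and n observations (the output itself is not reproduced here), the F-statistic it reports is the overall test of all slopes at once, with SYY the total and RSS the residual sum of squares:

```latex
H_0:\ \beta_1 = \beta_2 = \cdots = \beta_p = 0
\qquad
H_a:\ \beta_j \neq 0 \text{ for at least one } j

F = \frac{(\mathrm{SYY} - \mathrm{RSS})/p}{\mathrm{RSS}/(n - p - 1)}
\;\sim\; F_{p,\,n-p-1} \quad \text{under } H_0
```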

Attachment: Assignment Files.rar

Reviews

len2253825

3/11/2019 10:57:27 PM

Please carefully read and answer all the questions in the ‘PartA,B.doc’ file. Also, you have to turn in two files: a PDF file that contains the answers for Part A, and a .R file that shows the work and calculations for Part B. The first part is based on the "IPEDS" dataset, a national database of institutions of higher learning. The data you'll examine includes a random sample of four-year, non-profit institutions. In the second part, please load the provided dataset into R. The dataset is in "FinalData.csv" (attached).
