Build multiple regression model using specified predictors

Reference no: EM132342267

Assignment - The assignment should be done using RStudio, and the solution should be delivered in .Rmd format.

Instructions: For the project, you will use your linear regression tools to analyze data and build a report. To do this, you will generate a full data analysis workflow: articulating a research question, loading the data, exploring and visualizing your variables, building a model with diagnostics and possible transformations, and interpreting your final results. Think of it as a data analysis story, like a short research paper you might submit to a scientific journal for publication, or like a data report you might submit to a client you're consulting for.

Your data analysis will be written up in R-Markdown on the departmental shimmer server, similar to how you have been doing homeworks and labs, but your project should be structured and presented like a report you would submit to a client or an academic journal (with full sentences and with written flow).

Some sections (Introduction, Discussion) only require written text, while others (EDA, Modeling, Prediction) will be a combination of code chunks and written text. All the code you need can be found in your homeworks and labs and will just require editing; there is no need to write brand new code.

An example project is posted. There is a template markdown file you can download to your computer and then upload and open in R-Studio on the shimmer server. It is recommended that as you start working, you re-name the file something else (like maybe 202proj1ver1.Rmd or something), to avoid accidentally over-writing your project with the original blank template. Also, as you work, it is similarly a good idea to re-name the versions after a major revision. Each student will select from one of four project topics. Each project topic has a research scenario with variable description and a data file. In each research scenario, there are specific research questions/hypotheses that identify the response variable and the three predictor variables.

You will build a multiple regression model using the specified predictors to model the specified response.

Note: These are real datasets, and hence your regression diagnostics, p-values, R² values, etc. probably won't be very good. The goal isn't a 'perfect' model or a very significant p-value; the goal is to explain your choices, demonstrate understanding of the techniques, and produce as reasonable a model as you can (within the constraints of the assignment and your skills up to this point).

THE TOPIC IS: Court Cases: Predicting the amount awarded to the plaintiff.

Your report should have the following 5-section structure:

  • Introduction: Motivate why the topic is interesting or important, and what your overall research question is. The introduction should not have methodological discussion, it should just be the place to catch the reader's interest. (This is a brief section that only requires text. Use complete sentences).
  • Exploratory Data Analysis: Describe your data set. Indicate where the data came from and how it was collected (if known), as well as what the variables are (with units); Use written text along with numerical summaries and graphs. Perform univariate and bivariate EDA. Remember that your EDA should support your modeling section and results.
  • Modeling: Use your linear regression tools to build a model predicting your response variable. Use diagnostics to check your model assumptions; use variable transformations if needed and appropriate/justifiable. Remember to properly justify your choices and actions. Don't show the output for all the models you try, only focus on your final chosen model and justify why you ended up choosing that particular model (what other models did you try, and why did you end up selecting the final one as most appropriate?). This section should be a combination of code, diagnostic graphs, output, and written text.
  • Prediction: Calculate the specific prediction specified in your scenario. Note: If you include transformations in your model, then to keep things simple you can compute and interpret the prediction in terms of the transformed variables. Also for simplicity, you can compute the prediction "by hand" (in a code chunk, using R like a calculator), just plugging in the values. Note also that you might decide that your best model doesn't use all the explanatory variables, in which case the specified values of those variables won't play a role in your prediction. (This section should be a combination of code, output, and written text.)
  • Discussion: Short section where you give your overall conclusions in terms of the problem of interest. Also, it is important to critique your own analysis: what limitations does your model have, and why? (Issues with diagnostics that should be especially pointed out? Issues with the data that should be mentioned? Other issues that should be mentioned?) Finally, projects should briefly discuss future directions the work can go. (What else would you have liked to do?) (This is a brief section that only requires text. Use complete sentences.)
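
The Modeling and Prediction steps described above might look roughly like the following in a code chunk. This is only a minimal sketch under stated assumptions: the data frame `cases`, the column names `award`, `x1`, `x2`, `x3`, and every number in it are invented for illustration and are not taken from the assignment's data file or scenario. It fits a three-predictor model, draws the residual plot and Q-Q plot named in the rubric, and computes one prediction "by hand" from the coefficients.

```r
# Hypothetical toy data: the column names and all values are invented for
# illustration and do not come from the assignment's court-case data file.
cases <- data.frame(
  award = c(12, 15, 9, 22, 18, 25, 11, 30, 16, 21),
  x1    = c(2, 3, 1, 5, 4, 6, 2, 7, 3, 5),
  x2    = c(10, 12, 9, 15, 11, 14, 13, 16, 10, 12),
  x3    = c(1, 0, 1, 0, 1, 1, 0, 0, 1, 0)
)

# Multiple regression of the response on the three predictors
fit <- lm(award ~ x1 + x2 + x3, data = cases)
summary(fit)

# Residual plot: look for curvature or a funnel shape around the zero line
plot(fitted(fit), resid(fit), xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Normal Q-Q plot of the residuals
qqnorm(resid(fit))
qqline(resid(fit))

# Prediction "by hand": plug made-up predictor values into the fitted equation
b <- coef(fit)
by_hand <- b[["(Intercept)"]] + b[["x1"]] * 4 + b[["x2"]] * 12 + b[["x3"]] * 1

# Cross-check the hand calculation against predict()
new_case <- data.frame(x1 = 4, x2 = 12, x3 = 1)
from_predict <- unname(predict(fit, newdata = new_case))
```

If the final model uses a transformed response or drops a predictor, the same by-hand calculation applies with the terms of that model instead.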

Attachment:- Statistics Assignment Files.rar

Reviews

len2342267

7/19/2019 10:20:17 PM

Grading Rubric:

  • INTRO: 10 pts
  • EDA: 25 pts, as follows:
      • EDA on Y: 3 pts
      • EDA on each X: 9 pts total
      • EDA on relationship between Y and each X: 9 pts total
      • EDA other (depends on the dataset and types of variables): 4 pts
  • MODELING: 40 pts, as follows:
      • residual plot: 4 pts
      • discuss residual plot: 6 pts
      • qqplot: 4 pts
      • discuss qqplot: 3 pts
      • discuss linearity assumption between Y and Xs: 4 pts
      • discuss interactions: 3 pts
      • discuss multicollinearity: 3 pts
      • other valid reasons for choice of model (depends on your dataset and what you tried): 4 pts
      • showing final model summary: 5 pts
      • discussing significance: 4 pts
  • PREDICTION: 15 pts
  • CONCLUSION: 10 pts

len2342267

7/19/2019 10:20:11 PM

Advice: Save enough time to do the writing. Don't spend the whole week coding and trying models. Writing takes time, and we can only grade what you submit. Keep a balance between complexity and simplicity/interpretability. When working with real data, there isn't such a thing as a perfect model, but there can be useful ones. It's very unlikely that a model will fit your data perfectly; these datasets aren't as 'nice' as those in the homeworks, labs, lectures, and example project, and your regression diagnostics, p-values, R² values, etc. won't necessarily be very good. Transformations can be considered, but you should weigh making things look only a little better against having more complicated interpretations. Always justify your actions regardless.

len2342267

7/19/2019 10:20:04 PM

Read your knitted report before you submit. Be sure that after you knitted your final version, you read the full report. Check if your conclusions match the presented model, and that you didn't accidentally use a discarded model for your conclusions. Check if the text and figures are all there and there is a nice flow when reading it. Write as if it is a report you would submit to a client or an academic journal, not like a homework or class assignment or a snapchat to a friend: Your title should be something interesting and meaningful that captures the reader's attention and has something to do with the topic and the work (not "my project" or something similar that you might title a piece of course work).

len2342267

7/19/2019 10:19:58 PM

The motivation you give in the introduction should suggest why the topic is of interest in the larger world and to the reader, not just to you personally. It is common in academic writing to use third person ("we", not "I"). The "we" typically implies the author and the readers (as if you are leading the readers together with you through the discovery process). Be professional in your writing; don't use emoticons, memes, online shorthand, or colloquialisms. This is practice for professional writing. [On the other hand, "professional" doesn't mean "artificially and unnecessarily complicated." Be clear and concise.]

len2342267

7/19/2019 10:19:53 PM

The report should have sentences that introduce what you present, motivate how it is connected to the larger report, and then discuss what was shown. So for instance, don't just show a set of graphs without some surrounding phrases or sentences connecting them to the "flow" of the paper. Be consistent in your writing style. Don't switch between past tense (like, "we did the following analysis") and present tense (like, "we do the following analysis").

len2342267

7/19/2019 10:19:39 PM

Don't copy verbatim sentences, phrases, or titles from our posted topic prompts. You can paraphrase but not copy. Also, the motivation for the topic that we wrote in the posted prompts isn't the only possible motivation. You don't need to "echo" all of the code. In the markdown template formatting, we have set the "echo" to TRUE, which will make all your code chunks show up in your knitted document (it "echoes" the code chunks in the knitted document), since seeing your code can sometimes help us in grading. But in published research you generally would not show all the code (it might be in an appendix). And regardless, we don't want to see all the test code you may have produced and ended up not using, or all the discarded models you tried and decided not to use (unless you want to refer to them).

len2342267

7/19/2019 10:19:31 PM

If you want specific code chunks to not show up in the knitted document, you can open that particular chunk with {r, echo=FALSE}, which will make the raw code not show up in your knitted document (but it will still get evaluated; if you also want the code to not get evaluated, you can either comment out the lines of code inside the chunk [i.e., by putting the pound sign to the left of each line of code] or else add the argument eval=FALSE in the opening curly brackets of the code chunk). Don't copy/paste from other text editors into R. This could cause your document to not knit if the text editor used a different character encoding.
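
The chunk options discussed here can be summarized in a short sketch. It assumes the standard knitr options `echo` and `eval`; the comments show what the opening curly brackets of a chunk would contain, and the single executable line shows one way a template could set the document-wide default. The actual template's setup chunk is not reproduced in this text, so that line is an assumption.

```r
# Chunk header options, written inside the opening curly brackets of a chunk:
#   {r, echo=FALSE}   the code is run, but the code itself is hidden in the knitted report
#   {r, eval=FALSE}   the code is shown, but it is not run
#
# Commenting out a line with the pound sign also keeps it from being evaluated:
# summary(discarded_model)   # hypothetical object name, shown only as an example

# Document-wide default, typically placed once in a setup chunk (assumed here):
knitr::opts_chunk$set(echo = TRUE)
```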

len2342267

7/19/2019 10:19:17 PM

Be cautious when copy/pasting between text areas and code chunks. This could similarly cause your document to not knit, for instance if some text character isn't understood in code, or if some code symbol isn't understood by the math font editor in the text area. Put extra line breaks (i.e., hit "enter" a few times) between graphs and text as needed. If, when you knit the document, you find that graphs or text seem to be pushed off the sides of the page, it's probably because the knitter is interpreting some graphs and text as being on the same line. Extra line breaks between them will solve this. Let me know if you have any questions.

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd