What are the assumptions made for the linear regression model?

Reference no: EM133284354

Quantitative Data Analysis

Assessment:
To assess students' understanding of the quantitative data analysis methods that underpin data science.

Learning Outcome 1: Demonstrate a practical understanding of core quantitative data analysis methods and data visualisation in data science application and research. (K, S)

Learning Outcome 2: Demonstrate skills in implementing these methods on real heterogeneous data using a software package and in critically evaluating and interpreting the results (S).

Learning Outcome 3: Critically reflect on data visualisation and the ability of various methods and techniques to effectively present value and insight. Evaluate the strength and the weaknesses of quantitative analysis methods alongside an understanding of how and when to use or combine methods. (C)

Data for this assignment have been randomised for each student. Each dataset is in a file named data[num].csv, where [num] is your student number. For example, if your student number is 12345678, the file will be called data12345678.csv.

Question 1. Petrol is manufactured using two test manufacturing processes, and it is desired to test whether the processes produce petrol with different specific energy. Data are provided in the file petrol[num].csv, which consists of a list of 20 samples taken at random from each process, and the specific energy (in MJ/kg) measured in each sample.
(a) Produce a boxplot to show the difference in specific energy between the two processes. Comment on your boxplot.
(b) Perform a formal hypothesis test at the 5% level of significance for the hypothesis that the mean specific energy level differs for the two processes. State clearly your hypotheses, why you chose the particular test you have, and your conclusions.
(c) If instead we wish to test whether the mean specific energy for process 1 is higher than that for process 2, how would our conclusions for part (b) change?
(d) Find a 95% confidence interval for the difference in the mean specific energy levels for the two processes. Interpret carefully what this confidence interval means in this context.
(e) Does the confidence interval in part (d) support your hypothesis test in part (b)? Justify your answer.
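
As a rough illustration of how parts (b), (d) and (e) fit together, the following Python sketch computes a pooled two-sample t statistic and the matching confidence interval for the difference in means. It is only a sketch under stated assumptions: the function name `pooled_t_summary`, the five-value samples and the choice of the equal-variance (pooled) test are illustrative and not part of the assignment, and the critical value 2.306 (the 97.5% point of the t distribution with 8 degrees of freedom) is read from standard tables. With 20 samples per process the degrees of freedom and the critical value would differ, and a Welch test may be more suitable if the two variances look unequal.

```python
import math
import statistics


def pooled_t_summary(sample1, sample2, t_crit):
    """Pooled two-sample t statistic and CI for mean(sample1) - mean(sample2).

    t_crit is the two-sided critical value for n1 + n2 - 2 degrees of
    freedom; it is supplied by the caller (for example from t tables).
    """
    n1, n2 = len(sample1), len(sample2)
    diff = statistics.mean(sample1) - statistics.mean(sample2)
    var1 = statistics.variance(sample1)  # sample variance (n - 1 divisor)
    var2 = statistics.variance(sample2)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t_stat = diff / se
    return {
        "diff": diff,
        "df": df,
        "t": t_stat,
        "ci": (diff - t_crit * se, diff + t_crit * se),
        "reject_two_sided": abs(t_stat) > t_crit,
    }


# Made-up specific energy values (MJ/kg), NOT taken from petrol[num].csv.
process_1 = [45.1, 44.8, 45.6, 45.3, 44.9]
process_2 = [44.2, 44.6, 44.0, 44.5, 44.3]

# 2.306 is the tabulated 97.5% point of t with 8 degrees of freedom.
result = pooled_t_summary(process_1, process_2, t_crit=2.306)
print(result)
```

Because the interval and the two-sided test share the same standard error and critical value, the test rejects at the 5% level exactly when the 95% interval excludes zero. For the one-sided question in part (c), the same t statistic would be compared with a one-sided critical value instead.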

Question 2. An analyst for a cafeteria chain wishes to investigate the relationship between the number of self-service coffee dispensers in a cafeteria and sales of coffee. Fourteen cafeterias that are similar in their volume of business, type of clientele and location are chosen for the experiment. The number of dispensers varies from zero (coffee is only dispensed by serving staff) to six and is assigned randomly to each cafeteria. Sales are measured in hundreds of gallons of coffee sold. The results are in the file sales.csv.
(a) Make an appropriate plot of the data, and fit a simple linear regression model to these data. Write down your model and your fitted values.
(b) What are the assumptions made for the simple linear regression model? Produce residual plots for this model, and hence or otherwise check the assumptions of the model.
(c) Suggest a possible improvement to the model based on what you have found and explain why you think it will be better.
(d) Fit your improved model and check the model assumptions. Use your model to predict the average volume of sales for a new cafe which opens with 5 dispensers.
(e) You are asked to write a brief summary of your findings for the marketing manager of the company. In non-statistical language, briefly interpret your findings.
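
For orientation, the simple linear regression model is usually written y_i = b0 + b1 * x_i + e_i, with the errors e_i assumed to be independent, to have mean zero and constant variance, and to be normally distributed. The Python sketch below fits such a line by least squares and lists the residuals against the fitted values, which is the raw material for a residual plot. The function names and the seven data points are made up for illustration and are not the contents of sales.csv; the sketch covers only the straight-line fit of part (a), not any improved model.

```python
import statistics


def fit_simple_linear(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    x_bar = statistics.mean(x)
    y_bar = statistics.mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def residuals(x, y, intercept, slope):
    """Observed minus fitted values."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# Made-up illustration data, NOT the values in sales.csv.
dispensers = [0, 1, 2, 3, 4, 5, 6]
sales = [5.1, 5.9, 6.8, 7.4, 7.7, 7.9, 8.0]

intercept, slope = fit_simple_linear(dispensers, sales)
res = residuals(dispensers, sales, intercept, slope)

print(f"fitted line: sales = {intercept:.3f} + {slope:.3f} * dispensers")
for xi, ri in zip(dispensers, res):
    fitted = intercept + slope * xi
    print(f"x={xi}  fitted={fitted:.3f}  residual={ri:+.3f}")

# Point prediction from the straight-line fit for a cafeteria with 5 dispensers.
print(f"predicted sales at 5 dispensers: {intercept + slope * 5:.3f}")
```

A systematic pattern in the residuals when plotted against the fitted values, such as a curve or a funnel shape, would suggest that the linearity or constant-variance assumption is doubtful.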

Question 3. A dataset on red wine quality is provided; it is part of a dataset for modelling wine quality based on physicochemical tests.
It is proposed that the quality of wine can be determined based on the following variables: residual sugar, chlorides, free sulfur dioxide, total sulfur dioxide, density, pH, sulphates, alcohol.
All the wines are given a subjective mark of quality by the highly respected Uxbridge University Wine Board. It is of interest to the Wine Board to determine which factors are likely to be associated with a better quality wine.
(a) Produce appropriate plots and output to give a qualitative and visual summary of the dataset, demonstrating the relationship between the explanatory variables and the quality. You should include:
• appropriate graphs labelled correctly;
• summary output and statistics;
• brief comments on how this output shows how the chemical variables relate to each other and the quality variable.
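
One simple piece of summary output for part (a) is the correlation between each explanatory variable and the quality score. The sketch below computes a Pearson correlation coefficient in plain Python. The function name `pearson_r` and the six pairs of values are hypothetical and are not drawn from the wine dataset; in practice the same calculation would be repeated for each of the eight variables.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values for illustration only, NOT taken from the wine data.
alcohol = [9.4, 9.8, 10.5, 11.2, 12.0, 12.8]
quality = [5, 5, 6, 6, 7, 7]

print(f"correlation(alcohol, quality) = {pearson_r(alcohol, quality):.3f}")
```

A correlation only measures linear association between two variables at a time, so it complements rather than replaces the plots asked for above.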

(b) Using any of the methods you have met in the course for regression, find a suitable model to predict the quality of the wine using the other variables. You should include:
• appropriate method(s) for selecting the best multiple regression model;
• appropriate checks and plots to show that the regression model fits the data correctly, and how you deal with any violations of the model assumptions;
• comments on how you identify and deal with any unusual observations;
• models involving transformations of the response (e.g. logarithms) where appropriate.
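
Two criteria commonly used when comparing candidate multiple regression models are adjusted R-squared and AIC. The sketch below shows how each can be computed from a model's residual sum of squares. It is an illustration under stated assumptions: the function names, the sample size, the total sum of squares and the three candidate models with their residual sums of squares are all invented, and the AIC is given only up to an additive constant that is the same for every model fitted to the same response.

```python
import math


def adjusted_r2(rss, tss, n, k):
    """Adjusted R-squared; k counts fitted coefficients including the intercept."""
    return 1 - (rss / (n - k)) / (tss / (n - 1))


def aic_gaussian(rss, n, k):
    """AIC for a normal-errors linear model, up to an additive constant."""
    return n * math.log(rss / n) + 2 * k


# Invented numbers for illustration only; they are NOT results from the wine data.
n_obs = 100
tss = 65.0
candidates = {
    "alcohol only": (48.0, 2),            # (residual sum of squares, k)
    "alcohol + sulphates": (44.0, 3),
    "all eight variables": (43.5, 9),
}

# List the candidate models from lowest (preferred) to highest AIC.
for name, (rss, k) in sorted(
    candidates.items(), key=lambda item: aic_gaussian(item[1][0], n_obs, item[1][1])
):
    print(
        f"{name:22s} adj R2 = {adjusted_r2(rss, tss, n_obs, k):.3f}  "
        f"AIC = {aic_gaussian(rss, n_obs, k):.2f}"
    )
```

Both criteria penalise extra coefficients, so a larger model is preferred only if it reduces the residual sum of squares enough. Note that AIC values are not directly comparable between a model for quality and a model for a transformed response such as log(quality).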

(c) Write a short report for the Chairman of the Wine Board, who has no statistical knowledge, indicating which factors are most important for determining a quality wine. You should include reference to your final model.

MA5636 Quantitative Data Analysis, Brunel University London

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd