ICT513 Data Analytics Assignment

Reference no: EM132636281

ICT513 Data Analytics - Murdoch University

Question 1.

(a) Explore the data and present appropriate descriptive statistics and graphical displays for each variable (excluding the mother identifier). Also explore the relationship between milk production and infant gender. Comment on your explorations, including on data quality (presence of extreme values, missing values, etc.).
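The exploration asked for in part (a) might start along the following lines. This is only a sketch: the assignment data set is not reproduced here, so the data frame `milk` and its columns (`gender`, `birthweight`, `production`) are simulated, hypothetical stand-ins, and the units in the comments are assumptions.

```r
# Hypothetical stand-in data; the real column names are not given in the brief.
set.seed(1)
n <- 60
milk <- data.frame(
  gender      = factor(sample(c("female", "male"), n, replace = TRUE)),
  birthweight = rnorm(n, mean = 3.4, sd = 0.5),  # kg (assumed unit)
  production  = rnorm(n, mean = 750, sd = 120)   # mL per day (assumed unit)
)
milk$production[5] <- NA  # plant one missing value so the checks have something to find

# Numerical summaries: quartiles, means, factor counts and NA counts
summary(milk)

# Data quality: number of missing values in each column
missing_per_column <- colSums(is.na(milk))
missing_per_column

# Relationship between milk production and infant gender
group_means <- tapply(milk$production, milk$gender, mean, na.rm = TRUE)
group_means

# Graphical displays: distribution of the response, then response by gender
hist(milk$production, main = "Daily milk production", xlab = "production")
boxplot(production ~ gender, data = milk, ylab = "production")
```

Boxplots are a convenient way to spot extreme values, since points beyond the whiskers are drawn individually.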

(b) Using simple linear regression, assess whether infant gender is associated with daily milk production. Estimate the difference between male and female infants. Support your response to the research question ‘Is infant gender associated with daily milk production?’ with an appropriate confidence interval.
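A minimal sketch of the kind of analysis part (b) describes, using base R's `lm()` and `confint()`. The data are simulated and the variable names `gender` and `production` are hypothetical; with a two-level factor, the slope is the estimated male minus female difference in mean production.

```r
# Simulated stand-in data (not the assignment data)
set.seed(2)
n <- 80
gender     <- factor(sample(c("female", "male"), n, replace = TRUE))
production <- 700 + 60 * (gender == "male") + rnorm(n, sd = 100)

# "female" is the reference level (alphabetical order), so the coefficient
# named "gendermale" estimates mean(male) - mean(female)
fit <- lm(production ~ gender)
coef(summary(fit))

# 95% confidence interval for the gender difference
diff_est <- unname(coef(fit)["gendermale"])
ci <- confint(fit, "gendermale", level = 0.95)
ci
```

If the interval excludes zero, the data are consistent with an association between infant gender and daily milk production at the 5% level.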

(c) Propose an appropriate multiple linear regression model to assess the relationship between daily milk production and infant gender that allows for possible confounders. Clearly explain why you chose each of the explanatory variables included in your multiple linear regression model. Note that fitting is carried out in part (e); here, present only the notation.
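As an illustration of the notation part (c) asks for, one possible form is shown below. The choice of covariates is an assumption made purely for the example (it borrows the predictors named in Question 2), and treating maternal health as a single indicator variable is likewise an assumption.

```latex
Y_i = \beta_0 + \beta_1\,\mathrm{male}_i + \beta_2\,\mathrm{birthweight}_i
      + \beta_3\,\mathrm{BMI}_i + \beta_4\,\mathrm{health}_i + \varepsilon_i,
\qquad \varepsilon_i \overset{\text{iid}}{\sim} N(0, \sigma^2), \quad i = 1, \dots, n
```

Here $Y_i$ is the daily milk production of mother $i$, $\mathrm{male}_i$ equals 1 for a male infant and 0 otherwise, and $\beta_1$ is the gender difference in mean production with the other covariates held fixed.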

(d) Assess the explanatory variables you chose in part (c) for collinearity. (Note: You will need to search for an appropriate R function to assess collinearity based on the measure discussed in the lecture. You may find several options are available, some of which are much easier to use than others. You should report the function you have used as well as the R package it is found in.) Should any of your proposed explanatory variables be removed?
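The brief does not say which collinearity measure was covered in the lecture. Assuming it is the variance inflation factor (VIF), the sketch below computes it by hand in base R so the definition is visible: regress each predictor on the others and take 1 / (1 - R²). The predictors are simulated and their names are hypothetical. The `vif()` function in the `car` package is one widely used packaged option.

```r
# Simulated predictors; weight_gain is built to be strongly correlated with bmi
set.seed(3)
n <- 100
X <- data.frame(
  birthweight = rnorm(n, mean = 3.4, sd = 0.5),
  bmi         = rnorm(n, mean = 25, sd = 4)
)
X$weight_gain <- 0.8 * X$bmi + rnorm(n, sd = 1)

# VIF for predictor v: regress v on all other predictors, then 1 / (1 - R^2)
vif_manual <- sapply(names(X), function(v) {
  aux_formula <- reformulate(setdiff(names(X), v), response = v)
  r2 <- summary(lm(aux_formula, data = X))$r.squared
  1 / (1 - r2)
})
round(vif_manual, 2)
```

Values near 1 indicate little collinearity. Common rules of thumb treat values above about 5 or 10 as a warning sign, in which case dropping or combining one of the correlated predictors is worth considering.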

(e) Based on parts (c) and (d), fit an appropriate multiple linear regression model to assess the relationship between milk production and infant gender. Provide an interpretation of the model supported with confidence intervals.
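A sketch of fitting and summarising a multiple regression of this kind is given below. The covariates (`birthweight`, `bmi`) and all data are simulated assumptions, not the model the assignment expects.

```r
# Simulated stand-in data with hypothetical column names
set.seed(4)
n <- 120
dat <- data.frame(
  gender      = factor(sample(c("female", "male"), n, replace = TRUE)),
  birthweight = rnorm(n, mean = 3.4, sd = 0.5),
  bmi         = rnorm(n, mean = 25, sd = 4)
)
dat$production <- 400 + 50 * (dat$gender == "male") +
  80 * dat$birthweight + 2 * dat$bmi + rnorm(n, sd = 90)

# Gender effect adjusted for the other covariates
fit <- lm(production ~ gender + birthweight + bmi, data = dat)

# Point estimates side by side with 95% confidence intervals
ci_table <- cbind(estimate = coef(fit), confint(fit, level = 0.95))
round(ci_table, 2)
```

Each row is interpreted with the other predictors held fixed. For example, the `gendermale` row is the estimated difference in mean production between male and female infants of the same birth weight and maternal BMI.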

Question 2.
Consider the task of attempting to predict daily milk production.

(a) Consider candidate linear models where daily milk production is the response variable and predictors are given by:
• Baby gender
• Birth weight
• Maternal body mass index
• Maternal health
Using 100 bootstrap replicates and 100 repetitions of ten-fold cross-validation, produce bootstrap .632+ method and cross-validation estimators of MSE and RMSE for the four candidate models given by appropriate simple linear regressions of milk production on each of the identified predictors, and present a table of these estimators side-by-side.
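The two estimators can be sketched in base R as follows, shown for a single candidate model. Everything here is an assumption for illustration: the data are simulated, and `boot632plus()` and `repeated_cv()` are hypothetical helper names, not functions from any package. The .632+ calculation follows the usual Efron and Tibshirani definition with squared-error loss.

```r
set.seed(5)
n <- 100
dat <- data.frame(birthweight = rnorm(n, mean = 3.4, sd = 0.5))  # simulated
dat$production <- 300 + 130 * dat$birthweight + rnorm(n, sd = 90)
form <- production ~ birthweight

# Bootstrap .632+ estimate of MSE (hypothetical helper)
boot632plus <- function(form, dat, B = 100) {
  n <- nrow(dat)
  y <- dat[[all.vars(form)[1]]]
  full_fit <- lm(form, data = dat)
  pred_full <- predict(full_fit, dat)
  err_app <- mean((y - pred_full)^2)            # apparent (training) error
  gamma <- mean(outer(y, pred_full, "-")^2)     # no-information error
  oob_sum <- numeric(n)
  oob_cnt <- numeric(n)
  for (b in seq_len(B)) {
    idx <- sample(n, replace = TRUE)
    out <- setdiff(seq_len(n), idx)             # observations left out of replicate b
    fit_b <- lm(form, data = dat[idx, , drop = FALSE])
    pred_b <- predict(fit_b, dat[out, , drop = FALSE])
    oob_sum[out] <- oob_sum[out] + (y[out] - pred_b)^2
    oob_cnt[out] <- oob_cnt[out] + 1
  }
  err1 <- mean((oob_sum / oob_cnt)[oob_cnt > 0])  # leave-one-out bootstrap error
  err1 <- min(err1, gamma)
  # relative overfitting rate, set to 0 when there is no sign of overfitting
  R <- if (err1 > err_app && gamma > err_app) {
    (err1 - err_app) / (gamma - err_app)
  } else 0
  w <- 0.632 / (1 - 0.368 * R)
  (1 - w) * err_app + w * err1
}

# Repeated K-fold cross-validation estimate of MSE (hypothetical helper)
repeated_cv <- function(form, dat, K = 10, reps = 100) {
  n <- nrow(dat)
  y <- dat[[all.vars(form)[1]]]
  mean(replicate(reps, {
    fold <- sample(rep(seq_len(K), length.out = n))  # random fold labels
    sq <- numeric(n)
    for (k in seq_len(K)) {
      test <- fold == k
      fit_k <- lm(form, data = dat[!test, , drop = FALSE])
      sq[test] <- (y[test] - predict(fit_k, dat[test, , drop = FALSE]))^2
    }
    mean(sq)
  }))
}

mse <- c(boot632plus = boot632plus(form, dat), cv = repeated_cv(form, dat))
results <- rbind(MSE = mse, RMSE = sqrt(mse))
round(results, 1)
```

To build the side-by-side table for all four candidates, the same two helpers could be applied to a list of four formulas, one per predictor, and the results bound together row by row.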

(b) Using the most important predictor identified in the previous part, consider all possible models that include this predictor as well as some combination of the other three predictors. (In other words, the candidate models considered for this part will all include the best predictor, and they will consider all possible combinations of this predictor with one or more of the other three predictors.) For these seven candidate models as well as the best model identified in the previous part, which is the best for prediction purposes as based on cross validation? (For this part, use 100 repetitions of ten-fold cross-validation. You do not need to consider bootstrap prediction.) (5 marks)
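The eight candidate models (the best single predictor alone, plus its seven combinations with the other three) can be generated programmatically rather than typed out. The sketch below assumes, purely for illustration, that `birthweight` was the best predictor in part (a); the data are simulated, the column names are hypothetical, and `repeated_cv()` is a hypothetical helper.

```r
set.seed(6)
n <- 100
dat <- data.frame(                       # simulated stand-in data
  gender      = factor(sample(c("female", "male"), n, replace = TRUE)),
  birthweight = rnorm(n, mean = 3.4, sd = 0.5),
  bmi         = rnorm(n, mean = 25, sd = 4),
  health      = factor(sample(c("good", "poor"), n, replace = TRUE))
)
dat$production <- 300 + 130 * dat$birthweight +
  40 * (dat$gender == "male") + rnorm(n, sd = 90)

best   <- "birthweight"                  # assumed winner of part (a)
others <- setdiff(c("gender", "birthweight", "bmi", "health"), best)

# All non-empty subsets of the other three predictors: 3 + 3 + 1 = 7
subsets <- unlist(
  lapply(seq_along(others), function(m) combn(others, m, simplify = FALSE)),
  recursive = FALSE
)
candidates <- c(list(character(0)), subsets)  # best predictor alone, then 7 extensions
formulas <- lapply(candidates, function(s) {
  reformulate(c(best, s), response = "production")
})
names(formulas) <- sapply(formulas, function(f) paste(deparse(f), collapse = ""))

# Repeated ten-fold cross-validation estimate of MSE (hypothetical helper)
repeated_cv <- function(form, dat, K = 10, reps = 100) {
  n <- nrow(dat)
  y <- dat[[all.vars(form)[1]]]
  mean(replicate(reps, {
    fold <- sample(rep(seq_len(K), length.out = n))
    sq <- numeric(n)
    for (k in seq_len(K)) {
      test <- fold == k
      fit_k <- lm(form, data = dat[!test, , drop = FALSE])
      sq[test] <- (y[test] - predict(fit_k, dat[test, , drop = FALSE]))^2
    }
    mean(sq)
  }))
}

cv_mse <- sapply(formulas, repeated_cv, dat = dat)
sort(round(sqrt(cv_mse), 1))             # cross-validated RMSE, smallest first
best_model <- names(which.min(cv_mse))
best_model
```

Generating the formulas with `combn()` keeps the script short and makes it easy to rerun with a different best predictor or an extra candidate variable.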

(c) Describe some enhancements to this analysis (you can even suggest further data to collect, other suitable research questions, or techniques you think would be suitably applied in this area of research). (4 marks)

Question 3.
In your own words describe the .632+ method and an advantage of using this method.
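For reference, the standard statement of the .632+ estimator (Efron and Tibshirani) is given below; the notation is the usual textbook one and may differ from the lecture's.

```latex
\widehat{\mathrm{Err}}^{(.632+)} = (1 - \hat{w})\,\overline{\mathrm{err}} + \hat{w}\,\widehat{\mathrm{Err}}^{(1)},
\qquad
\hat{w} = \frac{0.632}{1 - 0.368\,\hat{R}},
\qquad
\hat{R} = \frac{\widehat{\mathrm{Err}}^{(1)} - \overline{\mathrm{err}}}{\hat{\gamma} - \overline{\mathrm{err}}}
```

Here $\overline{\mathrm{err}}$ is the apparent (training) error, $\widehat{\mathrm{Err}}^{(1)}$ is the leave-one-out bootstrap error computed only on observations left out of each bootstrap sample, $\hat{\gamma}$ is the no-information error rate, and $\hat{R}$ is the relative overfitting rate, restricted to $[0, 1]$. When $\hat{R} = 0$ the weight is 0.632 and the estimator reduces to the ordinary .632 estimator; as overfitting grows, the weight moves towards 1, which corrects the downward bias the .632 estimator shows for heavily overfit models.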

Question 4.
Report presentation marks
These marks are allocated based on:
• structure, clarity, and tidiness of presented solutions/answers,
• correctness in spelling and grammar.

Question 5.
Coding marks
When submitted, this script file should have a name given by Assignment 2 SURNAME.R, where SURNAME is replaced by your surname. Your R script will be marked based on:

Readability of code: This includes the use of informative commenting to make it clear what blocks of code are meant to do, descriptive variable names, and appropriate use of spacing to separate blocks of code meant to perform different functions.

Accuracy of code: This includes the correct specification of functions to produce the results reported in your assignment and whether I am able to run your entire script file without producing any errors. It is important that you verify that your code runs error-free from start to finish before submitting.

Efficiency: This includes writing a script that uses minimal lines of code, is easily adapted to new datasets or slight modifications to the existing dataset, and runs quickly.

Attachment:- Data Analytics.rar




