Description of the data and the statistical analyses

Reference no: EM132673930

MAS223 Applied Statistics - Murdoch University

The MAS223/ICT513 project includes a thorough data analysis related to your choice of one of three research topics. For each topic, several primary research questions are broached. It is possible that, in some cases, you may not be able to answer the questions directly with the data provided, but you may be able to answer questions that are similar to the posed questions that still maintain the focus of the original research questions.

While this is a data analysis project and not a research project (i.e., you are not expected to do an in-depth investigation of the particular topic), you should imagine that this report is being written for publication, so the style of the report should reflect that degree of formality and not be written as an assignment. Additionally, if considering linear regression models, although it is important that you use diagnostic plots to determine an appropriate model, these do not need to be included in the report. Rather you can simply make mention of any transformations made to ensure greater compliance with the assumptions of linear regression. Finally, R code and summary output should not be pasted into the document, but instead relevant results should be presented in nicely formatted tables. R code is to be submitted in a separate script file and should replicate the results presented in your report.

To assist in your writing of the report, several reports following a similar format are presented under the "Projects" link on the unit webpage. Additionally, a Word document with a template for the report is provided. Note that you are not required to use this template.

The report must not exceed 10 pages in length (font size must not be smaller than 12pt, margins and line spacing should not be adjusted), and it is recommended that the report include the following sections:

1. Introduction: Provides background to the research, clearly lays out the research questions of interest, and motivates why these questions are being investigated. (Note that this section does not need to be long.)

2. Methods and Analysis: Provides a description of the data and the statistical analyses that will be performed. If a linear regression, principal component analysis, or linear discriminant analysis is to be carried out, this section should provide an explanation of and motivation for the variables that are included in the model. This section should also include descriptive statistics (statistics, tables, graphs) that are useful in describing the data and providing a glimpse of what you might expect from your statistical analyses. A good deal of thought should go into your descriptive statistics, as these must clearly show some relevance to your questions of interest, and you must explain what you can derive from these.

3. Results: Provides a thorough description of the results of the analyses you described in the previous section. Include tables with relevant output. If analyses are carried out that involve the estimation of parameters, this should include an interpretation of the parameters for the variables of interest. Any issues with significant violations of the requirements/assumptions needed to perform the analyses carried out must be addressed. Finally, remember that even if a result is statistically significant, this does not always mean that it is practically significant. (This is why interpretation of "effects" matters for linear regression models. Does the anticipated change in the response variable have substantive implications?)

4. Discussion: Concisely summarises your conclusions to the research questions of interest as well as any supplementary analyses carried out. This section also should include a brief description of the limitations of your analyses as well as other research questions that may be worth exploring in response to any exploratory analyses you have carried out.

If in doubt, you should discuss with the unit coordinator.

1. Remotely Detecting At-Risk Sheep

The live export industry is a roughly $2 billion AUD per year industry that continues to grow with increasing demand from both the Middle East and Asia. Adverse effects on the health of livestock (particularly death) before and during sea voyages are not only a commercial concern but also an animal welfare concern for the live export industry. Prior to export, livestock are commonly transferred to feedlots in close proximity to the port of departure and observed over a period of time to assess their suitability for live export. Given that pre-embarkation feedlots may contain thousands of livestock, effective means of monitoring the animals are important in ensuring that animals unsuitable for live export are identified as early as possible. Currently employed methods are time consuming and require frequent handling of sheep, which both causes stress to the sheep and increases their exposure to Salmonella and a variety of contact-based diseases.

For this project, you are asked to consider data produced as part of a study into the use of radio frequency identification devices (RFIDs) on sheep at a particular pre-embarkation feedlot. Previous studies have highlighted inappetence (i.e. loss of appetite) prior to export as a significant risk factor for death during the sea voyage, and the study in question examined the use of RFIDs on individual sheep and readers at feeding and water troughs to measure the length of time spent feeding and drinking. This could potentially be used to quickly identify inappetent or sick sheep, allowing for their early removal from feedlots to address their welfare.

As part of the study, sheep were kept on the pre-embarkation feedlot for anywhere from 6 to 31 days. Prior to entering the feedlot, sheep were subjected to body condition scoring (BCS), weighed (WEIGHT), and assessed in terms of both their sex (SEX) and whether or not they were shorn (SHORN). For each day that a sheep was on the lot, a measurement of feeding time (FEED TIME) and drinking time (WATER TIME) was obtained using the RFID. Of the 8,206 sheep included in this study, 76 (or 0.93%) died at the feedlot prior to export, and the length of time on the feedlot (DIED AFTER DAYS) and cause of death (CAUSE) were recorded.

Although it is of interest to know how many of the sheep that were exported survived the sea voyage, information was only provided for sheep survival for their time on the feedlot and not for a live export voyage.

There are two main results that would be of interest to sheep handlers at pre-embarkation feedlots:

(a) Given that RFIDs allow for the easy collection of data for however many days sheep are on the feedlot, sheep farmers are interested in how they might produce predictive models that include the first two to six days of data. Produce predictive models that minimise loss based on an estimate of false negatives being 250 times more costly than false positives. You might consider exploring whether you can find an appropriate means to reduce the number of variables under consideration for modelling purposes without sacrificing predictive performance too much. The resulting models should then be applied to the dataset to provide a comparison of how many sheep would be correctly classified as at-risk and not-at-risk and how many false positives and negatives would result, along with the corresponding cost. The report should include enough detail for future sheep to be classified as at-risk or not-at-risk based on the best performing model.

(b) The sheep arrive in groups. Consider the ability of the RFID data to explain the difference between the groups. You also have the proportion of surviving sheep in each group, so use this information to provide some hypotheses about the impact of the background of the sheep (group) on their behaviour in the feedlot and on their survival. You can treat this data as a pilot study and assume that the farmers are going to carry out further research in this area.
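The 250:1 cost ratio in (a) can be made concrete with a short sketch. It is written in Python purely for illustration (the project itself asks for R), and it assumes you already have a predicted probability of death for each sheep from whatever model you fit. The function names and the toy numbers in the usage note are hypothetical. If the predicted probabilities are well calibrated, expected cost is minimised by flagging a sheep whenever its probability exceeds cost_fp / (cost_fp + cost_fn), which is 1/251 under the stated ratio.

```python
def confusion_counts(probabilities, died, threshold):
    """Count TP, FP, TN and FN when sheep with probability >= threshold are flagged at-risk."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for p, d in zip(probabilities, died):
        flagged = p >= threshold
        if flagged and d:
            counts["TP"] += 1
        elif flagged and not d:
            counts["FP"] += 1
        elif not flagged and d:
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    return counts


def total_cost(counts, cost_fp=1.0, cost_fn=250.0):
    """Total misclassification cost; the 250:1 ratio comes from the project brief."""
    return counts["FP"] * cost_fp + counts["FN"] * cost_fn


def cost_minimising_threshold(cost_fp=1.0, cost_fn=250.0):
    """Flag when p * cost_fn > (1 - p) * cost_fp, i.e. p > cost_fp / (cost_fp + cost_fn).

    Assumption: the probabilities are calibrated; otherwise compare costs over a grid
    of thresholds instead of relying on this formula.
    """
    return cost_fp / (cost_fp + cost_fn)
```

For example, `total_cost(confusion_counts([0.001, 0.01, 0.2], [0, 0, 1], cost_minimising_threshold()))` scores three made-up sheep. Comparing this cost across candidate models and thresholds is one way to fill the comparison table the brief asks for.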

A full list of variables available for analysis is provided in Table 1. Some things to note:
Two sheep died due to causes unrelated to inappetence or handling (one died due to "bloat" [i.e. overeating], and one died due to "trauma" [specifically trampling]) and should be excluded from analyses.
Body condition score (BCS) is a measure related to body fat. Too little and too much body fat can both place a sheep at increased risk of death, so any relationship between BCS and risk of death is unlikely to be linear. If used, it may be helpful to consider a recode of this variable that considers "good" and "bad" BCS.
It is important to consider a range of prior probabilities of assignment to the at-risk and not-at-risk groups.
Note: There is little uniformity in terms of measurements taken on sheep prior to entering a pre-embarkation feedlot, so some herds will be weighed, subjected to body condition scoring, etc., whereas others will not. A model that incorporates such measurements may potentially be better at identifying at-risk sheep, but it would not be applicable to those sheep that do not have such measurements taken.
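The exclusion and recoding notes above can be sketched as follows. This is an illustration in Python rather than the R the project requires, and it rests on two assumptions: the "good" BCS band is taken to be the 2.5-3.5 range that Table 1 describes as optimal (a wider band such as 2-4 is equally defensible), and the cause-of-death labels are assumed to be stored as the words "bloat" and "trauma", which may not match the spelling in Sheep.csv.

```python
# Assumed labels for the two deaths the brief says to exclude; check the actual CAUSE values.
EXCLUDED_CAUSES = {"bloat", "trauma"}


def recode_bcs(bcs, low=2.5, high=3.5):
    """Recode a body condition score as "good" inside [low, high] and "bad" otherwise."""
    return "good" if low <= bcs <= high else "bad"


def keep_for_analysis(record):
    """Return False for sheep whose recorded cause of death is unrelated to inappetence or handling."""
    cause = (record.get("CAUSE") or "").strip().lower()
    return cause not in EXCLUDED_CAUSES
```

Keeping the cut-offs as parameters makes it easy to report how sensitive the results are to the choice of "good" band.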

Variable           Description

ID                 Unique sheep identifier
GROUP              Group to which sheep belonged
SEX                Sheep sex (0 = "Ram", 1 = "Wether" [i.e. castrated ram], 2 = "Ewe")
BCS                Sheep body condition score (ranging from 0 [no body fat] to 5 [too fat])
                   at time of entry into feedlot. Body condition scores in the range of
                   2.5-3.5 are optimal, with values below 2 or greater than 4 signalling
                   potential negative health ramifications
WEIGHT             Sheep weight (in kg) at time of entry into feedlot
SHORN              Sheep was shorn prior to entry into feedlot? (0 = "Not shorn", 1 = "Shorn")
FEED TIME1         Feeding time (in seconds) on 1st day in feedlot
WATER TIME1        Time spent at watering trough (in seconds) on 1st day in feedlot
FEED TIME2         Feeding time (in seconds) on 2nd day in feedlot
WATER TIME2        Time spent at watering trough (in seconds) on 2nd day in feedlot
. . .              . . . (days 3 to 30)
FEED TIME31        Feeding time (in seconds) on 31st day in feedlot
WATER TIME31       Time spent at watering trough (in seconds) on 31st day in feedlot
DEAD               Did the sheep die during its stay at the feedlot? (0 = "No", 1 = "Yes")
DIED AFTER DAYS    Sheep died after <NUMBER OF DAYS> days
CAUSE              Cause of death

Table 1: Descriptions of variables contained in the dataset Sheep.csv.
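Because research question (a) calls for models built on the first two to six days of RFID data, it can help to generate the relevant column names programmatically. The sketch below is in Python for illustration only. It uses the variable names as printed in Table 1; the `separator` argument is an assumption, since the header in Sheep.csv may use an underscore or no separator at all.

```python
def first_days_columns(n_days, separator=" "):
    """List the feeding and watering columns for days 1..n_days, in day order."""
    if not 1 <= n_days <= 31:
        raise ValueError("n_days must be between 1 and 31")
    columns = []
    for day in range(1, n_days + 1):
        columns.append(f"FEED{separator}TIME{day}")
        columns.append(f"WATER{separator}TIME{day}")
    return columns
```

Calling `first_days_columns(k)` for k from 2 to 6 gives the five nested predictor sets whose models can then be compared on cost.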

2. Herbagut

Herbagut® is a polyherbal blend of 14 ingredients that includes herbal extracts such as Murraya koenigii, Glycyrrhiza glabra, Piper longum, Alpinia galanga, Centella asiatica, Curcuma longa, and Zingiber officinale. In a 28-day, randomised, double-blind, placebo-controlled study on 50 Indian adults presenting with self-reported unsatisfactory bowel habits, its administration was associated with improvements in bowel movements, and reductions in abdominal pain, constipation, diarrhoea, indigestion, and reflux (Lopresti et al., 2018).

As a result of these promising findings, another study was carried out with the purpose of examining the safety and efficacy of Herbagut® in a population of adults living in Australia experiencing self-reported digestive complaints. Eligible and consenting participants were randomly assigned to one of two groups (Herbagut® or placebo). All participants were instructed to take two capsules (i.e., 800 mg of Herbagut® or placebo) 1 hour before bedtime with 250 mL of water for 8 weeks.

At baseline and week 8, participants collected a stool sample at home after a morning bowel motion. Intestinal microbiota gene sequencing was undertaken by the Australian Genome Research Facility (AGRF). The sequencing identifies each organism in a sample. The organisms are classified at every taxonomic rank (kingdom, phylum, class, order, family, genus, species). All of the organisms are bacteria.

The data provided is from the gene sequencing and has been presented at the class level. Each of the columns relating to the microbiome data has the column name in the format kingdom class (e.g. Bacteria Actinobacteria). The phylum name has been removed from the column names for brevity. The values represent the quantity of organisms at the class level. Additionally, a participant ID, time (before or after) and group (Herbagut® or placebo) have been provided.

The two main results of interest to the manufacturers of Herbagut® are:

(a) Consider just the microbiome data and use it to profile the microbiome at the class level. We are interested in carrying out dimension reduction and using the results of this reduction to better understand the drivers of variability and potential groups at the class level. If standardisation is used, this should be justified in the methods along with other technical details.

(b) The patients are in two groups (Herbagut® and placebo). Consider the ability of the microbiome data to explain the difference between the samples from participants who took the placebo and those who took the Herbagut® capsules. Also consider whether the microbiome data could be used to estimate which of the two groups a participant was from.
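To illustrate what the standardisation choice in (a) involves, here is a minimal sketch of a principal component analysis on standardised columns. It is in Python and uses only the standard library so that each step is visible; it is not the method the unit prescribes, and in the project itself a built-in routine in R would normally be used. The helper names are hypothetical. Standardising each class to unit variance means the analysis works on the correlation matrix, so highly abundant classes do not dominate simply because their counts are larger.

```python
import math


def standardise(rows):
    """Centre each column and scale it to unit sample standard deviation."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1)) for j in range(p)]
    if any(sd == 0 for sd in sds):
        raise ValueError("a constant column cannot be standardised; drop it first")
    return [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]


def covariance(rows):
    """Sample covariance matrix; on standardised data this is the correlation matrix."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(p)] for i in range(p)]


def first_component(matrix, iterations=500):
    """Leading eigenvalue and unit eigenvector of a symmetric matrix by power iteration."""
    p = len(matrix)
    # Uneven starting vector, so it is unlikely to be orthogonal to the leading eigenvector.
    v = [1.0 / (j + 1) for j in range(p)]
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    value = sum(v[i] * sum(matrix[i][j] * v[j] for j in range(p)) for i in range(p))
    return value, v


def proportion_explained(value, matrix):
    """Share of total variance (the trace) carried by one component."""
    return value / sum(matrix[i][i] for i in range(len(matrix)))
```

Running `first_component(covariance(rows))` on the raw counts and again on `covariance(standardise(rows))` shows directly how much the loadings change, which is the kind of evidence the methods section can use to justify the choice.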

Variable                     Description

ID time                      Participant ID and time point (B for before, A for after)
Bacteria Actinobacteria      Measures of Actinobacteria from the microbiome study
. . .                        . . . (Other classes)
Bacteria Verrucomicrobiae    Measures of Verrucomicrobiae from the microbiome study
Time                         The time point (B for before, A for after)
Group                        Herbagut or Placebo

Table 2: Descriptions of variables contained in the dataset herbagut class.csv.

3. Red Portuguese "Vinho Verde" Wine

Vinho Verde traditionally comes from Northern Portugal and can be lower in alcohol than some other varieties of wine. Vinho Verde is a method of producing wine that can result in red, white or rosé wine. It is a young wine, typically being released three to six months after the grapes are harvested. It may be slightly sparkling due to the natural fermentation process or the addition of artificial carbonation.

Red Vinho Verde is harder to come by as the climate in Northern Portugal favours grapes that lead to white wine. The reds that are made tend to be dark, acidic and lower in alcohol than other reds.

During or after fermentation the yeast naturally produces some acetic acid, but usually this quantity is undetectable on the palate. However, if the wine is exposed to oxygen then some of the ethanol (i.e. alcohol) will be converted to acetic acid by the bacteria which are also present in the wine. This is highly undesirable, as too much acetic acid impacts the taste of the wine and can turn wine into vinegar.

Testing for acetic acid can cost more than other testing procedures. Therefore, it may be of interest to identify the presence of high levels of acetic acid from other cheaper chemical tests.

We have data from 1,263 wines from four subregions in Northern Portugal.

The research questions associated with this data are:
(a) Using the other chemical testing results, but not the quality of the wine or the region it is from, is it possible to identify which wines have higher levels of acetic acid? In answering this question you should consider the ability of each chemical test to predict the acetic acid levels, noting that doing a suite of tests comes with an increased cost which may be comparable to the cost of testing the acetic acid levels. It is highly important that you determine a statistical analysis plan prior to executing it, otherwise too many approaches may be considered.

(b) Using the chemical testing results and the quality of the wine, but not the region, create some clusters of the wines. What chemical testing attributes does each group have? What is the range of quality of wine in each group? Are the groups at all representative of the region?
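For question (a), one simple way to consider the ability of each chemical test on its own is to regress acetic acid (volatile acidity) on each test separately and compare the proportion of variance explained. The sketch below is in Python for illustration only, the function names are hypothetical, and it assumes each test is supplied as a plain list of numbers. In-sample R-squared from a simple linear regression is only a screening device; it ignores non-linear relationships and says nothing about combinations of tests.

```python
def r_squared(x, y):
    """R-squared of the simple linear regression of y on x (the squared Pearson correlation)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy ** 2 / (sxx * syy)


def rank_predictors(columns, response):
    """Rank candidate tests by single-predictor R-squared, best first.

    columns maps a test name to its list of measurements; response is the acetic acid
    measurement for the same wines in the same order.
    """
    scores = {name: r_squared(values, response) for name, values in columns.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

A ranking like this fits naturally into a pre-specified analysis plan: decide in advance how many of the top-ranked tests will be carried forward, given that a full suite of tests may cost as much as measuring acetic acid directly.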

Variable                 Description

Fixed acidity            Most acids involved with wine are fixed or non-volatile (do not
                         evaporate readily)
Volatile acidity         The amount of acetic acid in wine, which at too high of levels can
                         lead to an unpleasant, vinegar taste
Citric acid              Found in small quantities, citric acid can add 'freshness' and
                         flavour to wines
Residual sugar           The amount of sugar remaining after fermentation stops; it's rare to
                         find wines with less than 1 gram/litre, and wines with greater than
                         45 grams/litre are considered sweet
Chlorides                The amount of salt in the wine
Free sulphur dioxide     The free form of SO2 exists in equilibrium between molecular SO2 (as
                         a dissolved gas) and bisulfite ion; it prevents microbial growth and
                         the oxidation of wine
Total sulphur dioxide    Amount of free and bound forms of SO2; in low concentrations, SO2 is
                         mostly undetectable in wine, but at free SO2 concentrations over
                         50 ppm, SO2 becomes evident in the nose and taste of wine
Density                  The density of wine is close to that of water, depending on the
                         percent alcohol and sugar content
pH                       Describes how acidic or basic a wine is on a scale from 0 (very
                         acidic) to 14 (very basic); most wines are between 3-4 on the pH scale
Sulphates                A wine additive which can contribute to sulphur dioxide gas (SO2)
                         levels
Alcohol                  The percent alcohol content of the wine
Quality                  A measure of the quality of the wine
Region                   The sub region of Northern Portugal (A, B, C, D)

Table 3: Descriptions of variables contained in the dataset wineQualityReds.csv.
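The variables in Table 3 are on very different scales (pH sits around 3-4 while sulphur dioxide is measured in ppm), which matters for the clustering in question (b). The sketch below shows one possible approach, standardising the columns and then running a basic k-means. It is in Python for illustration only, the function names are hypothetical, and k-means is just one of several clustering methods that could be chosen. Starting from the first k rows is a simplification made here for reproducibility; in practice several random starts are usually compared.

```python
import math


def standardise(rows):
    """Centre each column and scale it to unit sample standard deviation."""
    n = len(rows)
    columns = list(zip(*rows))
    means = [sum(col) / n for col in columns]
    sds = [math.sqrt(sum((v - m) ** 2 for v in col) / (n - 1)) for col, m in zip(columns, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


def kmeans(rows, k, iterations=100):
    """Basic k-means with squared Euclidean distance; returns (labels, centres)."""
    centres = [list(row) for row in rows[:k]]  # simplification: first k rows as starting centres
    labels = None
    for _ in range(iterations):
        # Assignment step: each row joins its nearest centre.
        new_labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centres[c])))
            for row in rows
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centres are too
        labels = new_labels
        # Update step: each centre moves to the mean of its members.
        for c in range(k):
            members = [row for row, label in zip(rows, labels) if label == c]
            if members:
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centres
```

Once labels are available, cross-tabulating them against Region and summarising Quality within each cluster addresses the remaining parts of question (b).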
