Produce a scatterplot of the data and overlay a contour plot

Assignment Help Applied Statistics

Reference no: EM132375108

STAT 601 Homework

Answer every question in each problem and include a discussion of how your results address the question.

Submit your .rmd file with the knitted PDF (or knitted Word Document saved as a PDF). If you are having trouble with .rmd, let us know and we will help you, but both the .rmd and the PDF are required.

This file can be used as a skeleton document for your code/write up. Please follow the instructions found under Content for Formatting and Guidelines. No code should be in your PDF write-up unless stated otherwise.

Please do the following problems from the textbook (R Handbook), as stated.

1. The galaxies data from MASS contains the velocities of 82 galaxies from six well-separated conic sections of space (Postman et al., 1986, Roeder, 1990). The data are intended to shed light on whether or not the observable universe contains superclusters of galaxies surrounded by large voids. The evidence for the existence of superclusters would be the multimodality of the distribution of velocities.

a) Construct histograms using the following functions:

-hist() and ggplot()+geom_histogram()

-truehist() and ggplot()+geom_histogram() (pay attention to the y-axis!)

-qplot()

Comment on the shape and distribution of the variable based on the three plots. (Hint: Also play around with binning)

b) Create a new variable loggalaxies = log(galaxies). Construct histograms using the functions in part a) and comment on the shape and differences.

c) Construct kernel density estimates using two different choices of kernel functions and three choices of bandwidth (one that is too large and "oversmooths," one that is too small and "undersmooths," and one that appears appropriate.) Therefore you should have six different kernel density estimates plots. Discuss your results. You can use the log scale or original scale for the variable.

d) What is your conclusion about the possible existence of superclusters of galaxies? How many superclusters (1, 2, 3, ...)?

e) Fit a model to the data using model-based clustering (Mclust). How many clusters did it find? Does this match your answer to (d) above? Report the parameter estimates and BIC of the best model.
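As a starting point for part (c), the six kernel density estimates can be sketched with base R's density(). This is only a minimal sketch: the choice of the Gaussian and triangular kernels, and the multipliers 0.25 and 4 applied to the default rule-of-thumb bandwidth from bw.nrd0(), are illustrative assumptions and not values given in the assignment.

```r
# Sketch: kernel density estimates for the galaxies velocities.
library(MASS)  # provides the galaxies data

x <- galaxies

# Two kernels and three bandwidths. The multipliers are assumptions chosen
# only to make the under- and oversmoothing visible.
kernels <- c("gaussian", "triangular")
bw_ref  <- bw.nrd0(x)  # default rule-of-thumb bandwidth used by density()
bws     <- c(undersmooth = 0.25 * bw_ref,
             reference   = bw_ref,
             oversmooth  = 4 * bw_ref)

# One density() fit per kernel/bandwidth combination (2 x 3 = 6 estimates).
fits <- list()
for (k in kernels) {
  for (b in names(bws)) {
    fits[[paste(k, b, sep = "_")]] <- density(x, bw = bws[[b]], kernel = k)
  }
}

# Draw the six estimates in a 2 x 3 grid, with a rug of the raw velocities.
op <- par(mfrow = c(2, 3))
for (nm in names(fits)) {
  plot(fits[[nm]], main = nm, xlab = "velocity")
  rug(x)
}
par(op)
```

Whether the reference bandwidth "appears appropriate" is a judgment to make from the plots; the same loop can be rerun on log(galaxies) if the log scale is preferred.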

2. The birthdeathrates data from HSAUR3 gives the birth and death rates for 69 countries (from Hartigan, 1975).

a) Produce a scatterplot of the data and overlay a contour plot of the estimated bivariate density.

b) Does the plot give you any interesting insights into the possible structure of the data?

c) Construct the perspective plot (persp() in R; ggplot2 is not required for this question).

d) Model-based clustering (Mclust). Provide plots summarizing your fit (BIC, classification, uncertainty, and density).

e) Discuss the results (structure of data, outliers, etc.). Write a discussion in the context of the problem.
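Parts (a) and (c) can be approached along the following lines. This is a hedged sketch, not a prescribed solution: it assumes the bivariate density is estimated with MASS::kde2d() and its default bandwidths, and plot_density_contours() is a hypothetical helper introduced here for illustration. The viewing angles passed to persp() are arbitrary.

```r
# Sketch: scatterplot with density contours, then a perspective plot.
library(MASS)  # kde2d() for the bivariate kernel density estimate

# Hypothetical helper (not part of any package): draws the scatterplot,
# overlays contours of the kde2d() estimate, and returns the estimate.
plot_density_contours <- function(x, y, n = 50, ...) {
  dens <- kde2d(x, y, n = n)          # list with grid vectors x, y and matrix z
  plot(x, y, ...)
  contour(dens$x, dens$y, dens$z, add = TRUE)
  invisible(dens)
}

# The data live in HSAUR3, so run only if that package is installed.
if (requireNamespace("HSAUR3", quietly = TRUE)) {
  data("birthdeathrates", package = "HSAUR3")
  dens <- plot_density_contours(birthdeathrates$birth, birthdeathrates$death,
                                xlab = "birth rate", ylab = "death rate")
  # Perspective plot of the same estimate; theta and phi are arbitrary choices.
  persp(dens$x, dens$y, dens$z, theta = 30, phi = 30,
        xlab = "birth rate", ylab = "death rate", zlab = "density")
}
```

Returning the estimate from the helper lets the same grid feed both the contour overlay and persp(), so the two plots describe one density.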

3. A sex difference in the age of onset of schizophrenia was noted by Kraepelin (1919). Subsequent epidemiological studies of the disorder have consistently shown an earlier onset in men than in women. One model that has been suggested to explain this observed difference is known as the subtype model which postulates two types of schizophrenia, one characterized by early onset, typical symptoms and poor premorbid competence; and the other by late onset, atypical symptoms and good premorbid competence. The early onset type is assumed to be largely a disorder of men and the late onset largely a disorder of women. Fit finite mixtures of normal densities separately to the onset data for men and women given in the schizophrenia data from HSAUR3. See if you can produce some evidence for or against the subtype model.
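One possible way to fit the mixtures separately by sex is sketched below. It assumes the mclust package is used for the normal mixtures (the problem itself does not name a package) and that the schizophrenia data frame has the columns age and gender, as in HSAUR3. The number of components is left to Mclust's default BIC search.

```r
# Sketch: separate normal mixture fits for each gender with Mclust().
# Runs only if both mclust and HSAUR3 are installed.
fits <- list()
if (requireNamespace("mclust", quietly = TRUE) &&
    requireNamespace("HSAUR3", quietly = TRUE)) {
  library(mclust)
  data("schizophrenia", package = "HSAUR3")

  for (g in unique(as.character(schizophrenia$gender))) {
    age <- schizophrenia$age[schizophrenia$gender == g]
    fit <- Mclust(age)                       # BIC chooses components and variance model
    fits[[g]] <- fit
    cat("\nGender:", g, "\n")
    print(summary(fit, parameters = TRUE))   # mixing proportions, means, variances
    plot(fit, what = "BIC")                  # BIC across candidate models
  }
}
```

Comparing the estimated means and mixing proportions between the two fits is one way to look for evidence for or against the subtype model.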

For any question that asks for plots or graphs, produce the plot as the question asks and also produce the same plot using the corresponding commands in the ggplot2 library. (So if the question asks for one plot, your results should have two plots: one produced using the given R function and one produced from the ggplot2 equivalent.) This doesn't apply to questions that don't specifically ask for a plot; however, I would still encourage you to produce both.
