Provide the output showing the eigenvalues and interpret

Reference no: EM132310935

Assignment

Question 1:

The data file ‘iris.txt’ contains data measuring four features of iris flowers. One hundred plants across three species were measured for the variables Sepal Length, Sepal Width, Petal Length and Petal Width. Provide R code, output and written interpretation for all analyses.

(a) Produce and interpret pair-wise scatter plots for all four of the flower feature variables, distinguishing between species using colour.
(b) Training and test sets should be used with a 60/40 split and a seed value of 1125 in your code. Use the table function in R to provide the number of flowers in each species for both the training and test sets that you have constructed.
(c) How would increasing the training/test split to 80/20 potentially affect your results? (Do not perform this analysis.)
(d) Perform a discriminant function analysis (DFA) using the training set. Explain why only two discriminant functions (DFs) are calculated. Provide output, definition and interpretation (in the context of the data and method) for:
• the prior probabilities
• the trace values
• the weightings on LD1 and LD2
(e) Based on the DFA, predict species membership for the test set and create and interpret a table showing observed vs predicted for the test set. Create an x-y plot of the two DFs grouped by the original species labels and another by the predicted species labels. Indicate on the 2nd plot the flowers that were misclassified.
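
The 60/40 split and per-species counts in part (b) can be sketched as follows. This is a minimal illustration in Python, not the R code the assignment asks for. The species labels and their counts are hypothetical placeholders (the real ones come from iris.txt), and a Python generator seeded with 1125 will not reproduce the draw that the same seed gives in R.

```python
import random
from collections import Counter

# Hypothetical species labels for 100 plants; real labels come from iris.txt.
species = [1] * 34 + [2] * 33 + [3] * 33

rng = random.Random(1125)  # seed from the assignment; the draw differs from R's
indices = list(range(len(species)))
rng.shuffle(indices)

n_train = len(species) * 60 // 100  # 60/40 split
train_idx = sorted(indices[:n_train])
test_idx = sorted(indices[n_train:])

# Counterpart of R's table(): number of flowers per species in each set.
train_counts = Counter(species[i] for i in train_idx)
test_counts = Counter(species[i] for i in test_idx)

print("training:", dict(sorted(train_counts.items())))
print("test:    ", dict(sorted(test_counts.items())))
```

Because the split is a simple random draw, the species proportions in the two sets will generally differ slightly from those in the full data set.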

Question 2:

The data file ‘butterflies.txt’ contains the butterfly data from Table 1.3 in the textbook by Manly (2005). Sixteen colonies of butterflies were sampled, and the data set contains information on 4 environmental variables and 4 gene frequencies. As described in Manly (2005), the frequencies for the 0.40 and 0.60 genes have been combined to form a new variable labelled ‘0.4+0.6’. Assume multivariate normality (MVN).

(a) Based on standardised variables produce and comment on 3 separate pairwise correlation matrices: 1) correlation between the 4 gene frequency variables; 2) correlation between the 4 environmental variables; 3) correlation between the 4 gene frequency variables and the 4 environmental variables. Do these correlation matrices suggest that canonical correlation would be an appropriate form of analysis and why?

(b) Perform a canonical correlation on this data set for the standardised variables X1 to X4 (Alt, annualprec, maxtemp, mintemp) and Y1 to Y4 (X0.4_0.6, X0.80, X1.00, X1.16) as defined on page 149 of Manly (2005). Provide appropriate output, definitions and interpretations for:
• canonical correlations (also explain why canonical correlations become successively weaker but do not add up to one).
• chi-square test of significance and Rao's F approximation significance test
• redundancy coefficients for the variance in the Y set of variables explained by the variance in the X set.
[Note: ‘appropriate’ requires you to select the parts of the output from your analysis that address each dot-point; do not include all R output.]

(c) Provide the equations that describe the first canonical function using your analysis solution. Interpret the canonical loadings and the value of the analysis overall.

(d) Provide the output showing the eigenvalues and interpret. Explain the relationship between eigenvalues and canonical correlations.

(e) Why is canonical correlation an appropriate technique for this analysis and not multiple regression or MANOVA?

(f) What are the limitations associated with canonical correlation analysis?
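
The link between eigenvalues and canonical correlations raised in part (d) can be illustrated with a small sketch. It assumes the convention in which the eigenvalues of Rxx^-1 Rxy Ryy^-1 Ryx are the squared canonical correlations (some software reports a different quantity under the same name). The correlation blocks below are hypothetical values for two X and two Y variables, not the butterfly data, and the code is Python rather than the R the assignment requires.

```python
import math


def inv2(m):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul(p, q):
    """Product of two 2x2 matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


# Hypothetical correlation blocks (assumed values, for illustration only).
r_xx = [[1.0, 0.5], [0.5, 1.0]]
r_yy = [[1.0, 0.3], [0.3, 1.0]]
r_xy = [[0.6, 0.2], [0.1, 0.4]]
r_yx = [[r_xy[j][i] for j in range(2)] for i in range(2)]  # transpose of r_xy

# M = Rxx^-1 Rxy Ryy^-1 Ryx
m = matmul(matmul(inv2(r_xx), r_xy), matmul(inv2(r_yy), r_yx))

# Eigenvalues of a 2x2 matrix from its trace and determinant.
trace = m[0][0] + m[1][1]
det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
root = math.sqrt(trace * trace - 4.0 * det)
eigenvalues = [(trace + root) / 2.0, (trace - root) / 2.0]

# Each canonical correlation is the square root of its eigenvalue.
canonical_correlations = [math.sqrt(ev) for ev in eigenvalues]

for ev, r in zip(eigenvalues, canonical_correlations):
    print(f"eigenvalue = {ev:.4f}, canonical correlation = {r:.4f}")
```

Each canonical correlation is bounded by 1 on its own, and the eigenvalues are extracted in decreasing order, so nothing constrains the correlations to sum to one.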

Question 3:

Use the ‘iris_sub.txt’ data file (Caution: not the same file as used in Question 1). Provide R code, output and written interpretation for all analyses.

(a) In R produce a table of sample sizes per species in the dataset. Comment.

(b) Standardise the data and perform a cluster analysis based on Euclidean distances and nearest-neighbour (single) linkage. Plot a dendrogram based on this cluster analysis (label the tips of the dendrogram branches by species). Indicate on the dendrogram where the tree should be cut to produce 3 clusters and describe the cluster membership.

(c) Repeat the analysis in part (b) using Euclidean distance and group average linkage, and then again using Manhattan distance and group average linkage. For the clustering based on Manhattan distance, produce the cutree group membership for 3 clusters. Discuss which cluster analysis produces the ‘best’ result, specifically commenting on the choice of 3 clusters and the membership of each cluster given the true species designation.
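
The procedure in part (b), standardising and then clustering with nearest-neighbour linkage until three clusters remain, can be sketched as below. This is a minimal Python illustration rather than the R workflow the assignment asks for; the nine two-variable observations are hypothetical and are not taken from iris_sub.txt, and the function names are illustrative.

```python
import math
from statistics import mean, stdev


def standardise(rows):
    """Centre each column and divide by its sample standard deviation."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [stdev(c) for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def single_linkage(points, k, dist=euclidean):
    """Merge the two closest clusters until k clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: cluster distance is the smallest pairwise distance.
                d = min(dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Hypothetical measurements (two variables, nine plants).
data = [[1.0, 1.1], [1.2, 0.9], [0.9, 1.0],
        [5.0, 5.2], [5.1, 4.9], [4.8, 5.0],
        [9.0, 1.0], [9.2, 1.2], [8.9, 0.8]]

groups = single_linkage(standardise(data), k=3)
print(groups)
```

Passing dist=manhattan swaps the distance measure; group average linkage, as used in part (c), would replace the minimum pairwise distance with the mean pairwise distance between the two clusters.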

Question 4:

Use the same ‘iris_sub.txt’ data file as in Question 3. Provide R code, output and written interpretation for all analyses.

(a) Produce a metric 2D MDS ordination plot based on Euclidean distances for the four measurement variables (SEPALLEN, SEPALWID, PETALLEN and PETALWID), using the SPECIES number as labels in the ordination space. Include in your interpretation of the MDS ordination an interpretation of the goodness of fit (GOF) output from the MDS analysis. What happens to the GOF if another dimension is added to the analysis?

(b) Reproduce the ordination plot with the species numbers also coloured by species, i.e. red for species 1, blue for species 2 and dark green for species 3. Hint: Use colors() to find the names of colours to use in code.

(c) Reproduce the ordination plot with the species identified by 3 different symbols of your choice. Include a legend on your plot. Hint: Use ?pch to find codes for symbols.

(d) Compare the metric MDS ordination to the cluster analyses performed in Question 3. Comment on the similarities and differences between the methods and compare the results.

(e) Is it possible to determine which variables are most influential on the x ordination axis? Explain.

(f) Rerun the ordination with row labels (plant id) as the labels for objects in the ordination space. Briefly describe the association between Plants 27, 39, 51 and 75.
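
The goodness-of-fit question in part (a) can be illustrated with a short sketch. It assumes the GOF is defined as the sum of the first k eigenvalues divided by the sum of the absolute values of all eigenvalues, which is one common definition for metric MDS. The eigenvalues are hypothetical, not results from iris_sub.txt, and the code is Python rather than the R the assignment requires.

```python
def goodness_of_fit(eigenvalues, k):
    """Share of the total absolute eigenvalue mass captured by the first k axes."""
    ordered = sorted(eigenvalues, reverse=True)
    return sum(ordered[:k]) / sum(abs(ev) for ev in ordered)


# Hypothetical eigenvalues of the double-centred squared-distance matrix.
eigenvalues = [12.0, 5.0, 2.0, 1.0]

gof_2d = goodness_of_fit(eigenvalues, 2)
gof_3d = goodness_of_fit(eigenvalues, 3)

print(f"GOF with 2 dimensions: {gof_2d:.3f}")
print(f"GOF with 3 dimensions: {gof_3d:.3f}")
```

With Euclidean distances the eigenvalues are non-negative, so under this definition adding a dimension cannot lower the GOF.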

Question 5:
Write 100 to 300 words explaining whether any of these forms of analysis have helped your understanding of the data. Do not restate results.

Attachment:- Multivariate Analysis for High-Dimensional Data.zip

This paper is about classification and dimension reduction methods. It covers both supervised and unsupervised classification methods. With so many methods available, the researcher chose to discuss the following four: linear discriminant analysis, canonical correlation, cluster analysis, and ordination plots by way of multidimensional scaling. The paper is based on R programming, which is a prerequisite for users of this paper.

STA8005 - Multivariate Analysis for High-Dimensional Data - University of Southern Queensland

Reviews

len2310935

5/23/2019 10:20:52 PM

• Please note that referencing textbooks and other resources is not the goal of this assessment. This work requires students to demonstrate their understanding of the analysis and interpretation, not provide quotes from resources.
• When interpreting output, you are expected to do so in the context of the data and the method (i.e. ensure you comment on aspects of the method that affect your interpretation with respect to the variables and sample).
• A maximum of 10 marks will be deducted from your total marks for poor presentation.

• If you convert a Word document to PDF for submission, check that all symbols, equations etc. have converted correctly, i.e. proofread your work.
• If you do not use knitr to compile your submission, where asked to provide R code, paste relevant code within the assignment document and italicise (or otherwise highlight or distinguish from other content). Do not include code in an appendix.
• Do not include an appendix at all. Any work included in an appendix will not be marked.

• Submit only one file in PDF format to the link on the Study Desk.
• Assume that your report will be read by someone familiar with the data sets but with limited statistical knowledge. Fully explain plots, and when stating statistics or results explain what they mean statistically AND in the context of the data.
• Presentation should be neat, consistent, spell-checked and proofread. All questions should be clearly labelled, and all answers should clearly and concisely address the questions.

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd