Explain your understanding of eigen vectors and eigen values

Reference no: EM132314017

Part A Objective:

The purpose of this project is to provide you with an opportunity to demonstrate an advanced level of synthesis, understanding and communication of the concepts, statistical methods and practical analyses within R that you have learnt throughout this course.

Please remember that STA8005 is a postgraduate-level course which requires that students demonstrate an advanced level of knowledge, skills, reasoning and problem-solving. Also, this project is a significant assessment item worth 50% of your final grade. As such, you should expect to find it challenging and expect to spend considerable time working on it. I encourage you to start as soon as possible. You do not need to have completed all the course work and topics to make a start on becoming familiar with the data.

The Tasks:

Task 1: The client would like to know the number of rivers in the sample after cleaning. They also require the number of rivers measured in each season and in each river size category, together with appropriate summary statistics/plots for each of the 8 chemical variables individually.

Action: Clean the data as you think necessary and then provide a frequency table of the number of rivers measured in each season, and another for each river size. Determine appropriate summary statistics for each of the 8 chemical variables and the best way to present this information. Interpret interesting aspects of this data summary.

They would also like to know what the relationships are between the combination of river size & velocity based on the chemical variables. Which groups are most similar, and which are most different?

Action: First, create a new variable called 'river_size_vel' with categories that are combinations of the 3 River_Size and 3 River_Vel categories, and provide a frequency table of the number of rivers in each new category. To show the multivariate relationships among the categories of the new 'river_size_vel' variable, present the dendrogram and the MDS plot that best represent the relationships. As part of your interpretation, explain what types of distances and clustering methods you have used and why.
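The combination step described above can be sketched as follows. This is only an illustration in Python, whereas the project itself must be done in R; the rows and the category labels ("small", "high", and so on) are made-up assumptions, not values from river.csv.

```python
from collections import Counter

# Hypothetical rows of (River_Size, River_Vel). These labels are assumptions
# for illustration only and are not taken from river.csv.
rivers = [
    ("small", "high"),
    ("small", "high"),
    ("medium", "low"),
    ("large", "medium"),
    ("medium", "low"),
    ("small", "medium"),
]

# Combine the two categorical variables into a single label per river.
river_size_vel = [f"{size}_{vel}" for size, vel in rivers]

# Frequency table of the combined categories.
freq = Counter(river_size_vel)
for category, count in sorted(freq.items()):
    print(f"{category}: {count}")
```

With 3 sizes and 3 velocities there are at most 9 combined categories; combinations that never occur in the data simply do not appear in the table.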

Task 2: The client would like to know if there are significant differences among the four seasons in terms of average river health as indicated by all of the chemical and algae variables.

Action: Select the best method (from those covered in this course only) to explore this question, perform the analysis and interpret. Include in your answer appropriate p-values for all significance tests performed.

Task 3: The client would like to know if the season can be predicted based on all chemical and algae variables. The client is only interested in data related to Autumn, Spring and Summer.

Action: Select the best method (from those covered in this course only) to explore this question, perform the analysis, explain all relevant details of your process and interpret the results.

Instructions:

• You will submit ONE pdf file for your project, which will include Part A of your report (addressing Tasks 1, 2 and 3) and Part B. All sections of Part A should be clearly labelled. You will also submit ONE R script file with all analysis code for Part A clearly labelled and commented.
• You should not include any R code in your pdf report submission.
• For each of Tasks 1, 2 and 3 your report should be no more than 2-3 pages (i.e. no more than 9 pages total). Any additional pages for Part A will not be marked. This means you will need to be very concise and clear in the information you provide in your final report.
• Do not include an Appendix.
• Things to include for each of Tasks 1, 2 and 3:
o Was any data cleaning necessary (removal of cases with missing data, outliers, etc.)?
o Did you need to subset, rearrange or summarise the data in order to complete your analysis?
o What analysis was performed and what specific choices were made in the analysis process that would be necessary to know for the analysis to be repeated?
o What are the important aspects of the results to convey and did the analysis successfully address the aims of the company management?
o Are there any caveats/limitations you would place on the results or suggestions for future analysis?
o Do the results for each Task relate to previous Tasks?
• To help succinctly convey results you can include tables or figures of results within the page limit, but you must also explain them in text. You can also use dot-points to itemise presentation of information if that helps.
• Do not perform any transformations on any of the variables to help normalise them for any analysis. For the purpose of this exercise assume that the data meets univariate and MVN requirements.
• Your R script file should reference only one input data file - river.csv, provided to you on the Study Desk.
• All sub-setting and data manipulation must be done in R - do not change the data file river.csv before importing into R.
• Do not be misled by the fact that you will submit no more than 6 - 9 pages for Part A - these analyses will require a time-consuming trial-and-error approach in order for you to ensure all data has been cleaned, subset and/or restructured correctly for the analysis, and then time for you to consider all options and choose the final analyses that you think will address each task best.
• To correctly address the tasks, you will need to spend time ensuring you have the correct R code. You have been given examples of all R code needed to correctly clean, subset and/or reorganise your data and perform the necessary analyses. Remember that there are many helpful websites, findable via Google, to help you solve R coding problems.

Part B:

Include your responses to these Part B questions at the end of your Part A pdf submission (i.e. only one pdf file is to be submitted for this whole project assessment item).

Question 1:

Recreate and complete the table below by indicating which features are relevant to each method.

Feature                                            | MANOVA | PCA | FA | ()FA | (CA | CA | MDS
Eigen analysis                                     |        |     |    |      |     |    |
Distance matrix                                    |        |     |    |      |     |    |
Data/Dimension reduction                           |        |     |    |      |     |    |
Classification                                     |        |     |    |      |     |    |
Can be used to identify group structure/clusters   |        |     |    |      |     |    |
Need independent a priori categorical variable(s)  |        |     |    |      |     |    |
Ordination method                                  |        |     |    |      |     |    |


Question 2:

Construct, by hand, a simple nearest-neighbour dendrogram from the distance matrix below. Do not produce the dendrogram in R. Use the distances to 'sketch' the relationships.

    1          2          3          4
2   1.912370
3   5.382450   7.120542
4   3.385996   5.059430   2.138709
5   1.512238   3.190303   4.575420   2.910661
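
As a way to check a hand-drawn sketch, the nearest-neighbour (single linkage) merging rule can be written out in a few lines. This is a minimal Python sketch, not the course's R code; the function names are hypothetical, and the only data used are the distances from the matrix above.

```python
# Lower-triangle distances from the question, keyed by (i, j) with i < j.
dist = {
    (1, 2): 1.912370,
    (1, 3): 5.382450, (2, 3): 7.120542,
    (1, 4): 3.385996, (2, 4): 5.059430, (3, 4): 2.138709,
    (1, 5): 1.512238, (2, 5): 3.190303, (3, 5): 4.575420, (4, 5): 2.910661,
}

def point_distance(a, b):
    """Look up the distance between two individuals."""
    return dist[(min(a, b), max(a, b))]

def cluster_distance(c1, c2):
    """Nearest neighbour: smallest distance between any two members."""
    return min(point_distance(a, b) for a in c1 for b in c2)

def single_linkage(points):
    """Return the merges as (left members, right members, height)."""
    clusters = [frozenset([p]) for p in points]
    merges = []
    while len(clusters) > 1:
        # Find the closest pair among the current clusters.
        pairs = [(cluster_distance(c1, c2), i, j)
                 for i, c1 in enumerate(clusters)
                 for j, c2 in enumerate(clusters) if i < j]
        height, i, j = min(pairs)
        merges.append((sorted(clusters[i]), sorted(clusters[j]), height))
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges

merges = single_linkage([1, 2, 3, 4, 5])
for left, right, height in merges:
    print(left, "+", right, "at height", height)
```

Each printed line corresponds to one join in the dendrogram, with the height giving where the two branches meet. Under single linkage, the distance between two groups is always the smallest of the original pairwise distances between their members.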

Question 3:

Calculate by hand the Euclidean distance between individuals 1 and 2 for variables X1 and X2. Show all working.

 

     X1      X2
1   -0.46   -0.46
2   -1.41   -1.79
3    1.78    1.48
4    0.60    0.55
5    0.13    0.31
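
The working for a Euclidean distance follows the pattern: difference in each variable, square, sum, square root. A small Python sketch of those steps, using the values for individuals 1 and 2 from the table above (variable names are illustrative), can be used to check a hand calculation:

```python
import math

# Values for individuals 1 and 2 from the table in the question.
x1_a, x2_a = -0.46, -0.46   # individual 1
x1_b, x2_b = -1.41, -1.79   # individual 2

# Step 1: difference on each variable.
d1 = x1_a - x1_b
d2 = x2_a - x2_b

# Step 2: square the differences and add them.
sum_sq = d1 ** 2 + d2 ** 2

# Step 3: take the square root of the sum.
distance = math.sqrt(sum_sq)

print(f"differences: {d1:.2f}, {d2:.2f}")
print(f"sum of squares: {sum_sq:.4f}")
print(f"Euclidean distance: {distance:.4f}")
```

In symbols this is d = sqrt((X1_1 - X1_2)^2 + (X2_1 - X2_2)^2), where the subscript denotes the individual.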

Question 4: What are some limitations or disadvantages of multivariate methods generally? (no more than 300 words)

Question 5: Explain your understanding of eigen vectors and eigen values (your answer must be in your own words and will be checked using a plagiarism checker) (no more than 300 words)

Question 6: Based on the Parallel Analysis table below, how many factors would you interpret? Explain your answer.

Factor   Actual eigen value   95th percentile
1        2.45                 1.99
2        1.98                 1.89
3        1.13                 1.14
4        1.02                 1.08
5        0.89                 1.03
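
One common way to read a parallel analysis table is to keep factors, in order, for as long as the actual eigenvalue exceeds the 95th percentile from the simulated data, and to stop at the first factor that fails. The Python sketch below applies that rule to the table above; it assumes this is the decision rule taught in the course, and the function name is hypothetical.

```python
# Columns of the parallel analysis table from the question.
actual = [2.45, 1.98, 1.13, 1.02, 0.89]
percentile_95 = [1.99, 1.89, 1.14, 1.08, 1.03]

def factors_to_retain(actual_values, thresholds):
    """Count leading factors whose eigenvalue exceeds the threshold.

    Assumed rule: stop at the first factor whose actual eigenvalue does
    not exceed the 95th percentile of the random-data eigenvalues.
    """
    count = 0
    for value, threshold in zip(actual_values, thresholds):
        if value <= threshold:
            break
        count += 1
    return count

n_factors = factors_to_retain(actual, percentile_95)
print("Factors to interpret under this rule:", n_factors)
```

The written answer still needs the reasoning behind the count, in particular how close the first failing factor is to its threshold.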

Attachment:- Multivariate Analysis for High-Dimensional Data.rar


• Please read this document fully and carefully.
• You should complete lecture and tutorial examples before completing this project.
• There are two parts to this assessment.
• Part A is the project analysis of data as described in detail on pages 2-4 of this document. Your submission will be no more than 6-9 pages for Part A.
• Part B is the completion of a set of 4 questions on pages 5-6 of this document.
• Part A and B should be submitted together in one pdf file to the link provided on the StudyDesk.
• Additionally, you will submit one R script file for your work in Part A.
• Your pdf file and your R file should be named “your name STA8005 project.pdf” and “your name STA8005 project.R” respectively. You will lose presentation marks if you do not follow this naming structure.


All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd