What does an agglomeration schedule tell us in general


Reference no: EM131439769

Question 1: Cluster Analysis

The SPSS file "metropolitan areas.sav" contains a data set taken from "Cities - Life in the World's 100 largest metropolitan areas, Population Crisis Committee, Washington, 1990". The data set includes information on the following variables:

Population = population in millions

Murders = number of murders per year per 100,000 people

Food = percentage of income spent on food

Pproom = average number of persons living in one room

Water = % of homes with access to water and electricity

Telephone = number of telephones per 100 people

School = % of children completing education to age 18 years

Infant death = infant deaths/100 live births

Noise = ambient noise level on scale 1 (quietest) to 10 (noisiest)

Traffic = traffic flow: average mph of traffic in rush hour

area = area code: 1 = USA, Canada, Europe, Japan, Australia

In order to reduce the complexity of the data, I have conducted a cluster analysis.
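Because these variables sit on very different scales (population in millions, murders per 100,000 people, a 1 to 10 noise scale), distance-based clustering is usually run on standardized values so that no single variable dominates the distances. A minimal sketch of z-score standardization, assuming each variable is standardized with its sample mean and sample standard deviation; the values are hypothetical, not taken from "metropolitan areas.sav".

```python
from statistics import mean, stdev

def standardize(column):
    """Convert raw values to z-scores using the sample mean and sample SD."""
    m, s = mean(column), stdev(column)
    return [(x - m) / s for x in column]

# Hypothetical Murders values for three cities (per 100,000 people)
murders = [2.0, 4.0, 6.0]
print(standardize(murders))  # [-1.0, 0.0, 1.0]
```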

a) What does an agglomeration schedule tell us in general? Provide a brief hypothetical example (using the Metropolitan Areas case), outlining the circumstances in which we might be interested in interpreting the agglomeration schedule.
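An agglomeration schedule lists, stage by stage, which two clusters were merged and the distance (coefficient) at which they were joined; a sudden large increase in the coefficient signals that two dissimilar clusters are being forced together. The sketch below builds such a schedule for five hypothetical cities described by two standardized variables. It assumes between-groups (average) linkage on squared Euclidean distance, which is one common choice and may not be the method used in the original analysis; clusters are named by their lowest case number.

```python
from itertools import combinations

def sq_euclid(a, b):
    """Squared Euclidean distance between two cases."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def agglomeration_schedule(points):
    """Return (stage, cluster_a, cluster_b, coefficient) rows for
    between-groups (average) linkage; the merged cluster keeps the lower label."""
    clusters = {i + 1: [i] for i in range(len(points))}
    schedule = []
    for stage in range(1, len(points)):
        best = None
        for a, b in combinations(sorted(clusters), 2):
            # Average distance over all cross-cluster pairs of cases
            pairs = [(i, j) for i in clusters[a] for j in clusters[b]]
            d = sum(sq_euclid(points[i], points[j]) for i, j in pairs) / len(pairs)
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        clusters[a].extend(clusters.pop(b))
        schedule.append((stage, a, b, round(d, 3)))
    return schedule

# Hypothetical standardized (Murders, Telephone) values for five cities
cities = [(-1.0, 1.2), (-0.9, 1.0), (0.8, -0.7), (1.0, -0.9), (0.1, 0.2)]
for row in agglomeration_schedule(cities):
    print(row)
# (1, 1, 2, 0.05)
# (2, 3, 4, 0.08)
# (3, 3, 5, 1.66)
# (4, 1, 3, 5.352)
```

In this toy case the coefficient jumps from 1.66 at stage 3 to 5.352 at stage 4, the kind of break that would lead us to stop before the final merge and keep two clusters.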

b) When performing the hierarchical cluster analysis, I decided to select a 4 cluster solution. Would you have chosen the same number of clusters? What are the criteria for making this decision?
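A common rule of thumb (the "elbow" criterion) is to stop merging just before the largest increase in the agglomeration coefficient; it should be weighed against the dendrogram, the cluster sizes, and whether the resulting clusters can be interpreted. A minimal sketch of that rule; the coefficients are hypothetical, not from the actual output.

```python
def suggest_k(coefficients, n_cases):
    """Elbow heuristic: stop before the largest jump between successive
    agglomeration coefficients. After stage s, n_cases - s clusters remain."""
    jumps = [b - a for a, b in zip(coefficients, coefficients[1:])]
    stop_stage = jumps.index(max(jumps)) + 1  # last stage before the biggest jump
    return n_cases - stop_stage

# Hypothetical coefficients for a schedule with 5 cases (4 stages)
print(suggest_k([0.05, 0.08, 1.66, 5.352], n_cases=5))  # 2
```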

c) Please briefly summarize the key findings from the K-Means cluster solution. Do you believe it is a good solution? How would you label the clusters? What could be done to try improving the cluster solution?
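When summarizing a K-Means solution, the ANOVA table that SPSS can print alongside the final cluster centers gives one F statistic per variable; larger F values point to the variables that separate the clusters most. These F tests are descriptive only, because the clusters were chosen to maximize between-cluster differences. A minimal sketch of the one-way ANOVA F for a single variable, using hypothetical values split by two clusters:

```python
def anova_f(groups):
    """One-way ANOVA F statistic for one variable, given its values split by cluster."""
    values = [x for g in groups for x in g]
    n, k = len(values), len(groups)
    grand = sum(values) / n
    # Between-cluster and within-cluster sums of squares
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical Telephone values (per 100 people) for cases in two clusters
print(anova_f([[1, 2, 3], [7, 8, 9]]))  # 54.0
```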

d) As you can see in the dialog box for the K-Means cluster analysis, I did not specify any initial cluster means before performing the analysis. Why does it normally make sense to predetermine these values? What kinds of cluster means would make sense here as an input to the K-means cluster model?
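Supplying initial centers, for example the cluster means from the hierarchical solution, lets K-Means start from a sensible partition instead of from arbitrary cases, which tends to give more stable and reproducible clusters. A minimal sketch of Lloyd's K-Means algorithm that takes the starting centers as an argument; the points and starting centers are hypothetical standardized values.

```python
def sq_euclid(a, b):
    """Squared Euclidean distance between two cases."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, initial_centers, max_iter=100):
    """Lloyd's K-Means starting from user-supplied centers."""
    centers = [list(c) for c in initial_centers]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest current center
        labels = [min(range(len(centers)), key=lambda k: sq_euclid(p, centers[k]))
                  for p in points]
        # Update step: each center moves to the mean of its members
        new_centers = []
        for k, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == k]
            new_centers.append([sum(col) / len(members) for col in zip(*members)]
                               if members else old)
        if new_centers == centers:  # converged: centers no longer move
            break
        centers = new_centers
    return labels, centers

# Hypothetical cases and starting centers (e.g. taken from hierarchical cluster means)
print(kmeans([(0, 0), (0, 1), (5, 5), (5, 6)], [(0, 0), (5, 5)]))
# ([0, 0, 1, 1], [[0.0, 0.5], [5.0, 5.5]])
```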

e) Imagine that we obtain data from additional cities that are not currently included in our data set. How can I assign these new observations to one of the clusters identified in our previous analysis?
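One way to classify new cities is to keep the final cluster centers fixed and assign each new city to the nearest center, after standardizing its values with the means and standard deviations of the original sample (assuming the clustering was run on z-scores). A minimal sketch; the function name, means, SDs and centers are hypothetical.

```python
def assign_new_city(raw_values, means, sds, centers):
    """Standardize a new city's raw values with the ORIGINAL sample's means and SDs,
    then return the index of the nearest final cluster center."""
    z = [(v - m) / s for v, m, s in zip(raw_values, means, sds)]
    dists = [sum((a - b) ** 2 for a, b in zip(z, c)) for c in centers]
    return dists.index(min(dists))

# Hypothetical: two variables, original means/SDs, and two final centers in z-units
means, sds = [10.0, 50.0], [5.0, 20.0]
centers = [[-1.0, 1.0], [1.0, -1.0]]
print(assign_new_city([5.0, 70.0], means, sds, centers))   # 0
print(assign_new_city([15.0, 30.0], means, sds, centers))  # 1
```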

Question 2: Logistic Regression

A study was done to examine the characteristics of MBA graduates from four top US business schools. From the study, a subset of 100 students was selected. The data sample includes information on each student's profile with respect to

1. Grade Point Average (GPA)

2. GMAT Score

3. College Major

a. Humanities/Social Science (binary: 1=yes, 0=no)

b. Maths/Engineering (binary: 1=yes, 0=no)

c. Business (binary: 1=yes, 0=no)

4. Gender (1=Male, 2=Female)

5. Work Experience (1=1 year, 2=2 years,...,6=more than 6 years)
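If every student has exactly one of the three majors, the three binary indicators always sum to 1 and duplicate the model's constant, so only two of them can enter the logistic regression, with the omitted major acting as the reference category. A quick check of that condition, assuming one major per student; the rows and function name are hypothetical.

```python
def majors_are_exhaustive(rows):
    """True if each (humanities, maths_eng, business) triple contains exactly one 1."""
    return all(sum(r) == 1 for r in rows)

# Hypothetical rows: (Humanities/Social Science, Maths/Engineering, Business)
students = [(1, 0, 0), (0, 0, 1), (0, 1, 0)]
print(majors_are_exhaustive(students))  # True -> enter only two of the three dummies
```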

One of the business schools (variable name: School_B), which is located on the East Coast, has analyzed the data in order to better understand the profile of its MBA students in comparison to students at other top schools. In particular, a logistic regression analysis was performed using a binary variable (attendance=1; non-attendance=0) to predict the probability that a student in the survey attended School_B (instead of one of the other three schools).

The following screenshots display the steps taken when performing the logistic regression analysis in SPSS. The SPSS output report can be found in a separate file called appendix 2.

a) Based on the SPSS output provided in Appendix 2, is this a good model for predicting whether MBA students in the sample attended School_B? Please justify your answer from a statistical point of view by assessing model fit and overall model significance.
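Overall model significance is judged by the likelihood-ratio chi-square, the drop in -2 log-likelihood (-2LL) from the constant-only model to the full model, tested with degrees of freedom equal to the number of estimated predictor coefficients. The Cox & Snell and Nagelkerke pseudo-R² values are derived from the same two -2LL values. A sketch of these formulas, using hypothetical -2LL values rather than the figures in Appendix 2:

```python
import math

def fit_summary(m2ll_null, m2ll_model, n):
    """Likelihood-ratio chi-square and pseudo R-squared values from -2 log-likelihoods."""
    lr_chi2 = m2ll_null - m2ll_model
    cox_snell = 1 - math.exp((m2ll_model - m2ll_null) / n)
    max_cox_snell = 1 - math.exp(-m2ll_null / n)  # Cox & Snell cannot reach 1
    nagelkerke = cox_snell / max_cox_snell        # rescaled to a 0-1 range
    return lr_chi2, cox_snell, nagelkerke

# Hypothetical: 100 students, half of them attendees, so the constant-only
# model has -2LL = -2 * 100 * ln(0.5); the full model's -2LL is assumed to be 100
null_m2ll = -2 * 100 * math.log(0.5)
chi2, cs, nk = fit_summary(null_m2ll, 100.0, n=100)
print(round(chi2, 3), round(cs, 3), round(nk, 3))  # 38.629 0.32 0.427
```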

b) According to the output report, the significance level of the Hosmer-Lemeshow test is p=0.713. What does this mean? Is this good or bad news?
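The Hosmer-Lemeshow test sorts cases by predicted probability, splits them into groups (usually about ten), and compares observed with expected numbers of attendees and non-attendees in each group using a chi-square statistic with (groups - 2) degrees of freedom. Its null hypothesis is that the model fits, so a large p-value such as 0.713 gives no evidence of poor fit, which is good news. A simplified sketch with hypothetical data; SPSS's handling of tied probabilities when forming groups may differ.

```python
def hosmer_lemeshow(y, p, groups=10):
    """Hosmer-Lemeshow chi-square and its degrees of freedom (groups - 2).
    Cases are sorted by predicted probability and split into roughly equal groups."""
    pairs = sorted(zip(p, y))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        observed = sum(yi for _, yi in chunk)  # observed attendees in the group
        expected = sum(pi for pi, _ in chunk)  # expected attendees = sum of probabilities
        # (O - E)^2 / E for the 1s, plus the matching term for the 0s
        stat += (observed - expected) ** 2 / expected
        stat += (observed - expected) ** 2 / (size - expected)
    return stat, groups - 2

# Hypothetical outcomes and predicted probabilities for six students, three groups
y_demo = [0, 0, 1, 0, 1, 1]
p_demo = [0.2, 0.2, 0.5, 0.5, 0.8, 0.8]
stat, df = hosmer_lemeshow(y_demo, p_demo, groups=3)
print(round(stat, 3), df)  # 1.0 1
```

The statistic would then be compared with a chi-square distribution on df degrees of freedom (for instance with `scipy.stats.chi2.sf`) to obtain the p-value that SPSS reports.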

c) What types of students does School B attract? What are the most important predictors for attendance of School B?
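The sign of each coefficient B (equivalently, whether Exp(B) is above or below 1) shows which student characteristics raise or lower the odds of attending School_B, and the Wald statistic, (B / SE)², is a rough guide to which predictors matter most. A minimal sketch that ranks predictors by Wald and flags those above 3.841, the 5% critical value of a chi-square with 1 degree of freedom. The predictor names and (B, SE) pairs are placeholders, not values from Appendix 2.

```python
def rank_by_wald(coefs, critical=3.841):
    """Rank predictors by Wald = (B / SE)^2 and flag those above the
    chi-square(1) critical value for alpha = 0.05."""
    wald = {name: (b / se) ** 2 for name, (b, se) in coefs.items()}
    ranked = sorted(wald, key=wald.get, reverse=True)
    return [(name, round(wald[name], 3), wald[name] > critical) for name in ranked]

# Placeholder (B, SE) pairs for hypothetical predictors
coefs = {"predictor_a": (3.126, 1.10), "predictor_b": (0.010, 0.006), "predictor_c": (-0.80, 0.55)}
for row in rank_by_wald(coefs):
    print(row)
```

Wald statistics can be unreliable when coefficients are very large, so comparing -2LL with and without a predictor is a useful cross-check.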

d) In the output report you can see that GPA is a significant predictor of attendance at School B. Moreover, the exponentiated unstandardized slope coefficient for GPA is Exp(B)=22.794. What does this mean?
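Exp(B) is e raised to the logistic coefficient B, i.e. an odds ratio: holding the other predictors constant, each one-point increase in GPA multiplies the odds of attending School_B by 22.794. Because GPA spans only a few points, a smaller step is often easier to interpret. A sketch converting the odds ratio into probabilities for a hypothetical student with a 20% baseline chance of attending:

```python
import math

exp_b = 22.794        # Exp(B) for GPA, as given in the question
b = math.log(exp_b)   # the logit coefficient B itself, about 3.126

def updated_probability(p0, odds_ratio):
    """Apply an odds ratio to a baseline probability and convert back to a probability."""
    odds = p0 / (1 - p0) * odds_ratio
    return odds / (1 + odds)

# Hypothetical baseline probability of 0.20
print(round(updated_probability(0.20, exp_b), 3))         # one GPA point higher: 0.851
print(round(updated_probability(0.20, exp_b ** 0.1), 3))  # 0.1 GPA points higher: 0.255
```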

e) According to the classification plot at the end of the SPSS output report, does the model seem to be better at predicting "attendance" or "non-attendance" at School B? Would you say that 0.5 is a reasonable cut-off value as a classification threshold?
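The cut-off controls the trade-off between correctly classifying attendees (sensitivity) and non-attendees (specificity). When attendees are the minority, 0.5 often favors predicting non-attendance, and a lower cut-off, for example near the sample proportion of attendees, may balance the two rates better. A minimal sketch of a classification table at two cut-offs, using hypothetical outcomes and probabilities; classifying ties at the cut-off as 1 is an assumption.

```python
def classification_rates(y, p, cutoff=0.5):
    """Share correctly classified among attendees (1s), non-attendees (0s), and overall.
    A case is predicted as 1 when p >= cutoff."""
    pred = [1 if pi >= cutoff else 0 for pi in p]
    ones = [pr for yi, pr in zip(y, pred) if yi == 1]
    zeros = [pr for yi, pr in zip(y, pred) if yi == 0]
    sensitivity = sum(ones) / len(ones)
    specificity = sum(1 - pr for pr in zeros) / len(zeros)
    overall = sum(yi == pr for yi, pr in zip(y, pred)) / len(y)
    return sensitivity, specificity, overall

# Hypothetical: 2 attendees and 4 non-attendees
y = [1, 1, 0, 0, 0, 0]
p = [0.6, 0.4, 0.3, 0.45, 0.2, 0.1]
for cut in (0.5, 0.35):
    print(cut, classification_rates(y, p, cut))
```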

Assignment Files -

https://www.dropbox.com/s/szbkh90yj0f8kk6/Assignment%20Files.zip?dl=0



Reviews

len1439769

3/25/2017 2:06:31 AM

I need help with the attached homework. Most importantly, you need to follow the instructions exactly. Please read the instructions below carefully: The exam paper consists of two sections, each of which needs to be answered. For your answers, please use the space provided. All questions are equally weighted and must be answered. Please make explicit any assumptions underlying your answers, interpret your results, and justify your answers, conclusions and recommendations.
