Create a coded numeric version of the regime variable

Reference no: EM132695253

STAT6001 Data Wrangling and Visualisation - University of Newcastle

Section A - BigFive Personality test

The file 'A2A BigFive' contains over 1 million responses to an online personality test hosted by OpenPsychometrics. The dataset only contains responses from participants who consented to their data being used for research. It is provided in two formats: a Stata dataset file or a tab-delimited text file. You may import either file.
According to a particular model of personality there are five domains: Extraversion, Emotional stability, Agreeableness, Conscientiousness, and Openness. Each domain consists of 10 items (questions) scored on a Likert-type scale:
1 - Disagree
2 - Slightly disagree
3 - Neutral
4 - Slightly agree
5 - Agree

The score for each domain is calculated as the sum of the 10 items, with 'negative' questions reverse scored. The Extraversion scale is shown below; the reverse-scored items (marked in red in the original) are the even-numbered ones: EXT2, EXT4, EXT6, EXT8, and EXT10. For example, if a person answered 2 for EXT2, it contributes 4 points to the sum.
EXT1 I am the life of the party.
EXT2 I don't talk a lot.
EXT3 I feel comfortable around people.
EXT4 I keep in the background.
EXT5 I start conversations.
EXT6 I have little to say.
EXT7 I talk to a lot of different people at parties.
EXT8 I don't like to draw attention to myself.
EXT9 I don't mind being the center of attention.
EXT10 I am quiet around strangers.
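
The reverse-scoring rule can be sketched in a few lines of Python. This is only an illustration, not the required solution: the function name `recode_item` is hypothetical, the set of reversed items is taken from Question 1(c), and treating a response of 0 as missing also follows Question 1(c).

```python
from typing import Optional

# Item numbers that Question 1(c) lists as negatively worded
REVERSED_ITEMS = {2, 4, 6, 8, 10}


def recode_item(item_number: int, response: int) -> Optional[int]:
    """Return the scored value for one Extraversion item, or None if missing."""
    if response == 0:            # a response of 0 is treated as missing
        return None
    if item_number in REVERSED_ITEMS:
        return 6 - response      # maps 1, 2, 3, 4, 5 onto 5, 4, 3, 2, 1
    return response


print(recode_item(2, 2))   # 4, matching the EXT2 example in the text
print(recode_item(1, 2))   # 2, EXT1 is not reversed
print(recode_item(4, 0))   # None
```

Subtracting from 6 works because the scale runs from 1 to 5; a scale with different end points would need a different constant.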

Question 1
a) Generate a frequency table of EXT1 to check for valid responses. How are missing data indicated in this dataset?

b) Are there any duplicate rows in this dataset? Are there any rows that should be removed?

c) Generate numeric versions of the variables EXT1 to EXT10. Make sure the negative items (2, 4, 6, 8, 10) are reverse scored and that items with value 0 are set to missing.

Calculate the domain score for Extraversion as the sum of all non-missing items. Show your code.

d) Now calculate two alternate versions of Extraversion:
1) the sum of all items, provided none of the items are missing; and
2) the sum of all items, with missings imputed as 3's.
Show descriptive statistics for the three different versions of Extraversion and comment on any differences in the mean, range, and number of missings.
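
The three versions of the score differ only in how a missing item is handled. The sketch below assumes the ten items have already been recoded, with `None` standing for missing; the function names and the test row are made up, and returning missing when every item is missing is an assumption rather than something the brief specifies.

```python
from typing import List, Optional

Items = List[Optional[int]]   # ten recoded items, None = missing


def sum_non_missing(items: Items) -> Optional[int]:
    """Version from part (c): add up whatever items are present."""
    present = [v for v in items if v is not None]
    return sum(present) if present else None   # assumption: all missing -> missing


def sum_complete_only(items: Items) -> Optional[int]:
    """Version (1): a score only when no item is missing."""
    if any(v is None for v in items):
        return None
    return sum(v for v in items if v is not None)


def sum_imputed(items: Items, fill: int = 3) -> int:
    """Version (2): replace each missing item with the neutral value 3."""
    return sum(fill if v is None else v for v in items)


row = [4, 2, None, 5, 3, None, 1, 4, 2, 5]   # made-up respondent, two items missing
print(sum_non_missing(row))     # 26
print(sum_complete_only(row))   # None
print(sum_imputed(row))         # 32
```

For a respondent with missing items the first version is pulled towards lower totals, the second drops the respondent, and the third moves the total towards the scale midpoint, which is the kind of difference the descriptive statistics should reveal.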

e) The following algorithm is provided by OpenPsychometrics for scoring the domains:

Are there any problems with this method? Use test data provided in the table below to compare this method versus our previous calculations from parts (c) and (d) for the Extraversion domain.

f) Calculate the other four domain scores (for complete item scales). Create a plot showing the normal density curves for all five domains. Make sure the legend identifies the domain labels.

g) Calculate the pairwise correlations for the five domain scores from part (f). Comment on the magnitude and sign of the correlations.
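
As a rough illustration of what "pairwise correlations" involves, the sketch below computes the Pearson coefficient for every pair of columns using only the Python standard library. The column labels EST and AGR and all of the values are invented for the example; a statistics package would normally do this in one command.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length lists (neither may be constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def pairwise_correlations(scores: Dict[str, List[float]]) -> Dict[Tuple[str, str], float]:
    """One coefficient for each unordered pair of columns."""
    return {(a, b): pearson(scores[a], scores[b]) for a, b in combinations(scores, 2)}


# Made-up domain scores for four respondents (labels are hypothetical)
toy = {"EXT": [10, 20, 30, 40], "EST": [12, 18, 33, 41], "AGR": [40, 30, 20, 10]}
for pair, r in pairwise_correlations(toy).items():
    print(pair, round(r, 3))
```

Five domains give ten distinct pairs; the sign shows the direction of each association and the absolute value its strength.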

h) Create a scatterplot of emotional stability score vs extraversion score using 1000 randomly selected observations from the data set. In addition, create a histogram of emotional stability filtered to rows with extraversion=30 and a histogram of extraversion filtered to rows with emotional stability=30.

Comment on the shape of these histograms and the joint distribution of extraversion and emotional stability.
Hint: use these graphs:
• A scatter plot with a random selection of 1000 rows from the dataset
• A histogram of emotional stability filtered to rows with extraversion=30
• A histogram of extraversion filtered to rows with emotional stability=30
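
The data preparation behind these three graphs could look roughly like the sketch below. The rows are randomly generated stand-ins for the scored dataset, the field names `ext` and `est` are hypothetical, and the seed is arbitrary; the plotting itself is left to whichever package is being used.

```python
import random
from collections import Counter

rng = random.Random(6001)   # fixed seed so the same rows are drawn each run

# Made-up rows standing in for the scored dataset (scores range from 10 to 50)
rows = [{"ext": rng.randint(10, 50), "est": rng.randint(10, 50)} for _ in range(20000)]

# 1000 rows drawn without replacement for the scatter plot
sample = rng.sample(rows, 1000)

# Conditional slices for the two histograms
est_given_ext30 = [r["est"] for r in rows if r["ext"] == 30]
ext_given_est30 = [r["ext"] for r in rows if r["est"] == 30]

# Counts per score value, i.e. the bar heights of the first histogram
bar_heights = Counter(est_given_ext30)
print(len(sample), len(est_given_ext30), len(ext_given_est30))
```

Sampling before plotting keeps the scatter plot readable; with over a million points, most of the structure would otherwise be hidden by overplotting.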

Section B - Human Freedom Index

In this section you will explore, visualise, and interpret global data on democracy and human freedom using two measurement scales: the Human Freedom Index, designed by the Cato Institute, a policy research organisation based in the United States, and the Democracy Index, designed by the analysis division of The Economist Group. The datasets for this question are:
• Human Freedom Index 2008-2017 (Source: Cato Institute): "A2B human-freedom-index-2019.csv"
• Democracy Index 2019 (Source: The Economist via Wikipedia): "A2B Democracy index.csv"
• Global population data 1960-2019 (Source: World Bank): "A2B WorldBank population data.csv"

Human Freedom Index (Cato Institute)

There are 33 measures of personal freedom and 43 measures of economic freedom, each scored on a scale of 0 to 10, with higher scores indicating more freedom. There are two subindices, personal freedom and economic freedom, each the average of its related items. The overall Human Freedom Index score is the average of the two subindices, personal freedom (pf_score) and economic freedom (ef_score). This dataset contains annual data for the years 2008-2017 for up to 162 countries in long format. Countries are identified by country name and three-letter ISO code.

Democracy Index (The Economist)

There are four measures of civil and political freedom and an overall measure of democracy called the Democracy Index (score). The score is categorised into four regime types (regime). This dataset contains 2019 data for 167 countries. Countries are identified by country name.

Population

The third dataset contains annual global population data for the years 1960-2019, sourced from the World Bank. Data are available for up to 264 countries or regions. Countries are identified by country name and three-letter ISO code.

Question 1
a) Create a new variable for the overall human freedom index (HFI), which is calculated as the average of the scores for personal freedom (pf_score) and economic freedom (ef_score). Label and format all three variables. Show your code.

b) Calculate all pairwise correlations for the variables pf_score, ef_score, and HFI (filter the dataset to the most recent year). Comment on the correlation between HFI and the other two scores.

c) Create an appropriate plot to show the association between personal freedom and economic freedom (restricted to the most recent year). Describe the relationship.

d) Calculate the five-number summary for the annual global HFI. Also include the 'nmiss' statistic. What does 'nmiss' represent?
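
A minimal sketch of a per-year five-number summary follows, on the assumption that 'nmiss' counts the observations whose value is missing. The records are made up, and the quartiles use the "inclusive" method of Python's `statistics.quantiles`; other packages define quartiles slightly differently, so small discrepancies are to be expected.

```python
from collections import defaultdict
from statistics import quantiles
from typing import Dict, List, Optional


def five_number_summary(values: List[Optional[float]]) -> Dict[str, float]:
    """Min, quartiles and max of the non-missing values, plus n and nmiss."""
    present = sorted(v for v in values if v is not None)
    q1, q2, q3 = quantiles(present, n=4, method="inclusive")
    return {
        "min": present[0], "q1": q1, "median": q2, "q3": q3, "max": present[-1],
        "n": len(present),
        "nmiss": len(values) - len(present),   # assumption: count of missing values
    }


# Made-up (year, HFI) records; None marks a country with no score that year
records = [(2016, 6.0), (2016, 6.5), (2016, 7.0), (2016, 7.5), (2016, 8.0), (2016, None),
           (2017, 5.0), (2017, 7.0), (2017, 9.0), (2017, None), (2017, None)]

by_year = defaultdict(list)
for year, hfi in records:
    by_year[year].append(hfi)

summary = {year: five_number_summary(vals) for year, vals in sorted(by_year.items())}
for year, stats in summary.items():
    print(year, stats)
```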

e) Create a line plot showing the global average HFI over time. Comment on whether there is a trend in the data. If there is a trend, what might explain it?

f) Create a line plot showing the global average over time as a series for each of the three scores (pf_score, ef_score, HFI). Only include countries that have data for 2008.

Question 2
a) Calculate a variable that measures the change in HFI from 2008 to 2017. Create a list of the Top 5 and Bottom 5 countries that have had the most and least change in HFI. Show their HFI scores for 2008 and 2017 and the change.
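
One possible reading of this task is sketched below with invented country names and scores. It assumes that "change" means the signed difference (2017 minus 2008), so the Top 5 are the largest increases and the Bottom 5 the largest decreases, and that countries without a 2008 score are left out.

```python
# Made-up HFI scores; "L" has no 2008 value and therefore no change score
hfi_2008 = {"A": 7.0, "B": 6.5, "C": 8.0, "D": 5.5, "E": 6.0, "F": 7.5,
            "G": 8.5, "H": 6.8, "I": 7.2, "J": 5.0, "K": 6.2}
hfi_2017 = {"A": 7.4, "B": 6.0, "C": 8.1, "D": 6.5, "E": 5.2, "F": 7.5,
            "G": 8.3, "H": 7.5, "I": 6.9, "J": 5.9, "K": 6.1, "L": 7.0}

# Signed change for countries present in both years
change = {c: round(hfi_2017[c] - hfi_2008[c], 2) for c in hfi_2008 if c in hfi_2017}

# Rank from largest increase to largest decrease
ranked = sorted(change.items(), key=lambda kv: kv[1], reverse=True)
top5, bottom5 = ranked[:5], ranked[-5:]

for label, group in (("Top 5", top5), ("Bottom 5", bottom5)):
    print(label)
    for country, delta in group:
        print(f"  {country}: 2008={hfi_2008[country]} 2017={hfi_2017[country]} change={delta:+.2f}")
```

If "most and least change" is instead read as the absolute size of the change, the sort key would be `abs(kv[1])`.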

b) Collapse the region variable into fewer categories using the table below. The new variable should be a coded numeric variable with a custom format.

Asia and Oceania: Caucasus & Central Asia, East Asia, Oceania, South Asia
Europe: Eastern Europe, Western Europe
The Americas: Latin America & the Caribbean, North America
Africa/Middle East: Middle East & North Africa, SubSaharan Africa
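
The collapse is a many-to-one lookup followed by a label table, which is what a coded numeric variable with a custom format amounts to. In the sketch below the numeric codes 1 to 4 are an arbitrary choice, and the exact region spellings in the data file are an assumption that should be checked against a frequency table first.

```python
from typing import Optional

# Original region -> numeric code of the collapsed category (codes are an assumption)
REGION_GROUP_CODE = {
    "Caucasus & Central Asia": 1, "East Asia": 1, "Oceania": 1, "South Asia": 1,
    "Eastern Europe": 2, "Western Europe": 2,
    "Latin America & the Caribbean": 3, "North America": 3,
    "Middle East & North Africa": 4,
    "SubSaharan Africa": 4, "Sub-Saharan Africa": 4,   # both spellings, in case the file differs
}

# The "custom format": numeric code -> display label
REGION_GROUP_LABEL = {1: "Asia and Oceania", 2: "Europe", 3: "The Americas", 4: "Africa/Middle East"}


def collapse_region(region: str) -> Optional[int]:
    """Return the collapsed code, or None to flag an unexpected spelling."""
    return REGION_GROUP_CODE.get(region.strip())


for region in ["Oceania", "Western Europe", "North America ", "Atlantis"]:
    code = collapse_region(region)
    print(region, "->", code, REGION_GROUP_LABEL.get(code, "unmatched"))
```

Returning `None` for an unknown spelling, rather than raising an error, makes it easy to list every value that failed to map.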

c) Compare the distribution of change in HFI for the four regions with an appropriate graph and summary statistics.

Question 3

a) Merge the two datasets. The base dataset is the HFI data filtered to the year 2017; the second dataset is the Democracy Index .csv file. The match variable is the string variable for country name. Don't forget to check for non-matches and document any necessary cleaning.

Show a frequency table of regime for the countries that have HFI data.
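
Merging on a string key usually fails for a handful of rows because of spelling, case, or stray spaces. The sketch below shows one way to find and document those non-matches; every country name, score, and rename in it is invented for illustration and is not taken from the real files.

```python
from collections import Counter


def normalise(name: str) -> str:
    """Lower-case and collapse whitespace so trivial differences do not block a match."""
    return " ".join(name.casefold().split())


# Manual fixes, written down so the cleaning is documented (hypothetical example)
RENAMES = {"gamma rep.": "gamma republic"}


def match_key(name: str) -> str:
    key = normalise(name)
    return RENAMES.get(key, key)


# Made-up stand-ins for the two datasets
hfi_rows = {"Alphaland": 7.1, "Betastan ": 6.3, "Gamma Rep.": 5.0}
demo_rows = {"alphaland": "Full democracy", "Betastan": "Hybrid regime",
             "Gamma Republic": "Authoritarian", "Deltaria": "Flawed democracy"}

hfi_keyed = {match_key(n): v for n, v in hfi_rows.items()}
demo_keyed = {match_key(n): v for n, v in demo_rows.items()}

matched = {k: (hfi_keyed[k], demo_keyed[k]) for k in hfi_keyed.keys() & demo_keyed.keys()}
only_hfi = sorted(hfi_keyed.keys() - demo_keyed.keys())     # HFI rows with no regime
only_demo = sorted(demo_keyed.keys() - hfi_keyed.keys())    # regimes with no HFI row

print("unmatched in HFI:", only_hfi)
print("unmatched in Democracy Index:", only_demo)
print(Counter(regime for _, regime in matched.values()))    # frequency table of regime
```

Inspecting both unmatched lists before trusting the merge is the key step; each entry in `RENAMES` then records a cleaning decision.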

b) Create a coded numeric version of the regime variable with a custom format in the following order: Full democracy, Flawed democracy, Hybrid regime, Authoritarian.

Create a boxplot showing the distribution of HFI by regime.
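
Coding the regime as a number is what lets tables and boxplots appear in the requested order rather than alphabetically. A minimal sketch, assuming codes 1 to 4 in the order given in part (b) and using made-up HFI values:

```python
from collections import defaultdict

REGIME_ORDER = ["Full democracy", "Flawed democracy", "Hybrid regime", "Authoritarian"]

# String -> numeric code (1..4), and the reverse lookup that acts as the custom format
REGIME_CODE = {label: code for code, label in enumerate(REGIME_ORDER, start=1)}
REGIME_LABEL = {code: label for label, code in REGIME_CODE.items()}

# Made-up (regime, HFI) pairs standing in for the merged dataset
rows = [("Hybrid regime", 6.1), ("Full democracy", 8.4), ("Authoritarian", 5.2),
        ("Flawed democracy", 7.3), ("Full democracy", 8.0)]

# Group HFI by code; sorting the codes yields the boxes in the intended order
hfi_by_code = defaultdict(list)
for regime, hfi in rows:
    hfi_by_code[REGIME_CODE[regime]].append(hfi)

for code in sorted(hfi_by_code):
    print(code, REGIME_LABEL[code], hfi_by_code[code])
```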

c) Create a contingency table of region by regime. What proportion of countries in each region are Full democracies?

d) Merge the 2017 population data from the WorldBank population dataset into the combined HFI/democracy index dataset. What proportion of the world's population live in Full democracies?
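
The proportion asked for here is weighted by population rather than by number of countries. The sketch below uses invented rows and assumes the denominator is the total population of the countries in the merged dataset, which may fall short of the true world total if some countries failed to match.

```python
# Made-up (country, regime, 2017 population) rows standing in for the merged data
rows = [
    ("Alphaland", "Full democracy", 5_000_000),
    ("Betastan", "Flawed democracy", 20_000_000),
    ("Gammaria", "Authoritarian", 70_000_000),
    ("Deltaria", "Full democracy", 5_000_000),
]

total_pop = sum(pop for _, _, pop in rows)
full_pop = sum(pop for _, regime, pop in rows if regime == "Full democracy")

share_by_population = full_pop / total_pop
share_by_country = sum(1 for _, regime, _ in rows if regime == "Full democracy") / len(rows)

print(f"{share_by_population:.1%} of people, {share_by_country:.1%} of countries")
```

In this toy data half of the countries are Full democracies but they hold only a tenth of the population, which is why the two proportions should not be confused.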

e) Create a bubble scatterplot of Democracy Index (score) versus 2017 Human Freedom Index (hfi) with:
• Bubble size determined by 2017 population
• Bubble colour determined by region (either of the four-level or ten-level versions)
• Bubble labels for a limited selection of countries including (at least): Venezuela, Syria, Egypt, India, China, Hong Kong, Norway, New Zealand. Depending on how many countries you decide to label, use either the ISO_code or the country name.

What can we interpret about levels of democracy and human freedom from the relative positioning of New Zealand and Hong Kong?

f) One theory is that more populous countries have fewer freedoms than less populous countries because of the increased difficulty in governing a larger population. Create a scatterplot of Human Freedom Index (hfi) versus the logarithm of population, labelling the same countries as you did in part e).

Comment on the relationship between population and Human Freedom Index. Does the data support this theory?

Attachment:- Data Wrangling and Visualisation.rar
