Business/Finance Data Analysis Assignment
As in Assignment 1, this assessment item requires you to perform some data analysis in Excel and write up a short report presenting and interpreting your results. Please take this Word document and insert your answers to each question below the relevant dot point. Again, marks will be allocated both to the technical side (i.e. getting the correct answers) and to the quality of the interpretations, with an emphasis on the exposition.
Q1 - Serology Tests for Covid
Mapping the rates of infection of Covid-19 is a critical task for public health professionals. Geographical locations that have had high fractions of their populations infected may be close to herd immunity, while places that have had very few infections may see surges in the future. Your task in this question is to perform some analysis using a simulated serological dataset on diagnosing exposure to Covid. The data are available in the file covid.xls.
Suppose you take a random sample of 381 individuals from a town in the USA, and you find that 88 individuals return positive antibody results. Calculate the sample proportion for positive antibodies, the standard error of this proportion, and provide a 90% confidence interval for the true population proportion.
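The assignment asks for these calculations in Excel; as a cross-check, the arithmetic can be sketched in Python. This is a minimal sketch assuming the usual normal-approximation (Wald) interval, which the brief does not name explicitly. The function name `proportion_ci` is hypothetical.

```python
import math
from statistics import NormalDist

def proportion_ci(successes, n, confidence=0.90):
    """Return (p_hat, standard_error, (lower, upper)).

    Assumes the normal-approximation (Wald) interval: p_hat +/- z * se.
    """
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    # Two-sided critical value, about 1.645 for a 90% interval.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return p_hat, se, (p_hat - z * se, p_hat + z * se)

# Figures from the question: 88 positives out of 381 sampled.
p_hat, se, (lower, upper) = proportion_ci(88, 381)
print(f"p_hat={p_hat:.4f}  se={se:.4f}  90% CI=({lower:.4f}, {upper:.4f})")
```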
Calculations such as those performed above require normality assumptions that are only approximations. Is your assumption of normality appropriate in this instance? Why or why not?
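One common rule of thumb for this question is that the normal approximation is reasonable when both the number of positives and the number of negatives are at least 10. The sketch below applies that check. The threshold of 10 is an assumption (some textbooks use 5), and `normal_approx_reasonable` is a hypothetical helper name.

```python
def normal_approx_reasonable(successes, n, threshold=10):
    """Rule-of-thumb check: both successes and failures reach the threshold."""
    failures = n - successes
    return successes >= threshold and failures >= threshold

# 88 positives and 381 - 88 = 293 negatives, from the question above.
print(normal_approx_reasonable(88, 381))
```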
Briefly discuss what happens to this interval if (i) the sample size were to increase, and (ii) the confidence level were changed from 90% to 95%. Provide some intuition for your answers.
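To build intuition for both effects, the half-width (margin of error) of the interval can be tabulated for different sample sizes and confidence levels. This is a sketch under the same normal-approximation assumption. The larger sample size of 1524 (four times 381) is hypothetical and holds the sample proportion fixed purely for illustration.

```python
import math
from statistics import NormalDist

def margin_of_error(p_hat, n, confidence):
    """Half-width of the normal-approximation interval for a proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

p_hat = 88 / 381  # sample proportion from the question
for n in (381, 1524):  # 1524 is a hypothetical larger sample
    for confidence in (0.90, 0.95):
        moe = margin_of_error(p_hat, n, confidence)
        print(f"n={n:5d}  confidence={confidence:.2f}  margin={moe:.4f}")
```

Because the standard error scales with one over the square root of the sample size, quadrupling the sample halves the margin, while a higher confidence level uses a larger critical value and widens it.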
To model herd immunity, epidemiologists use the formula $H = 1 - 1/R_0$. Here $H$ lies between zero and one and is the fraction of the population that needs to be immune, and $R_0$ is the base reproductive rate (the average number of transmissions per infection at the start of an epidemic). Suppose an epidemiologist produces a confidence interval for $R_0$ of $2.0 \pm 0.5$. Calculate the corresponding interval for $H$. Is your town near the herd immunity threshold?
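Because H increases as R_0 increases, the endpoints of the R_0 interval map directly onto the endpoints of the H interval. The sketch below carries out that mapping and compares the result with the sample proportion from the earlier question. Treating a positive antibody result as immunity is a simplifying assumption, and the function name is hypothetical.

```python
def herd_immunity_threshold(r0):
    """H = 1 - 1/R_0, the fraction of the population that must be immune."""
    return 1 - 1 / r0

# Interval for R_0 given in the question: 2.0 +/- 0.5.
r0_low, r0_high = 2.0 - 0.5, 2.0 + 0.5

# H is increasing in R_0, so interval endpoints map to interval endpoints.
h_low = herd_immunity_threshold(r0_low)
h_high = herd_immunity_threshold(r0_high)

sample_proportion = 88 / 381  # share with positive antibodies in the sample
print(f"H interval: ({h_low:.3f}, {h_high:.3f})")
print(f"sample proportion: {sample_proportion:.3f}")
print("below lower bound of H:", sample_proportion < h_low)
```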
In order for your analysis to be generalizable to a larger population, it is important to ensure your sample resembles that population. Suppose that, for the town you are analysing, the average age is known to be 43, the average income is $66,000 per year, and the average years of formal education is 14.2. How closely does your sample match these population parameters? Would you have any reservations about your sample not being appropriately representative? Discuss.
Formally test the hypothesis (using a t-test) that the average age is 43 using your data. Present the null and alternative hypotheses, the test statistic, a critical value and a conclusion. Briefly discuss whether or not this test indicates your data may be unrepresentative.
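The mechanics of the one-sample t-test can be sketched as follows. The ages listed are hypothetical placeholders, not values from covid.xls, and the critical value is supplied by the caller (for example from a t table or Excel's T.INV.2T) rather than computed here. The value 2.365 is the two-sided 5% critical value for 7 degrees of freedom, matching the eight placeholder observations.

```python
import math
from statistics import mean, stdev

def one_sample_t(values, mu0, critical_value):
    """Two-sided one-sample t-test of H0: mean = mu0 against H1: mean != mu0.

    Returns (t_statistic, reject_h0). The critical value must match the
    chosen significance level and len(values) - 1 degrees of freedom.
    """
    n = len(values)
    se = stdev(values) / math.sqrt(n)  # sample std dev (n - 1 denominator)
    t_stat = (mean(values) - mu0) / se
    return t_stat, abs(t_stat) > critical_value

# Hypothetical ages for illustration only; use the age column of the data file.
ages = [38, 45, 52, 41, 47, 39, 50, 44]
t_stat, reject = one_sample_t(ages, mu0=43, critical_value=2.365)
print(f"t = {t_stat:.3f}, reject H0: {reject}")
```

With the full sample of 381 the degrees of freedom are 380, so the appropriate critical value is close to the standard normal one.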
Q2 - Income Inequality in Australia
Economists are often interested in the distribution of income - the relative fractions of poor, middle-income and rich individuals. The file inequlity.xls has data on 1000 randomly drawn Australian incomes taken at 5-year intervals since 1980. You are to analyse these data to draw some inferences about the state of economic inequality in Australia, and the changes that have occurred over the last four decades.
The coefficient of variation, $\mathrm{CV} = \hat{\sigma}/\bar{x}$, is a useful metric for studying inequality as it is insensitive to units of measurement. Calculate the coefficient of variation for Australian incomes in each wave of the data, and plot these values on a line chart. What has been the trend in inequality since 1980?
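As a small illustration of the coefficient of variation and its unit invariance, the sketch below computes it for a tiny hypothetical income list and again after rescaling the same incomes into different units. The numbers are placeholders, not values from the data file, and the sample standard deviation (n - 1 denominator) is an assumption about which estimator is intended.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Sample standard deviation divided by the sample mean."""
    return stdev(values) / mean(values)

# Hypothetical incomes in thousands of dollars, for illustration only.
incomes_thousands = [10, 20, 30]
incomes_dollars = [x * 1000 for x in incomes_thousands]

cv_thousands = coefficient_of_variation(incomes_thousands)
cv_dollars = coefficient_of_variation(incomes_dollars)
print(cv_thousands, cv_dollars)  # the two values agree up to rounding
```

Applying the same function to each wave's income column gives the series to plot.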
Suppose that the poverty line is $15,000 per year (approximately $300 per week). Calculate the poverty headcount rate (the fraction of the sample that is below this line) in each wave of the data and produce an analogous line chart for poverty. Discuss the trend over the last four decades. Hint: use either the "sort" command or the "countif" command in Excel to calculate the number of poor people in each wave, and divide this by 1000.
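The headcount calculation mirrors what "countif" does in Excel: count the incomes below the line and divide by the sample size. A minimal sketch, using hypothetical incomes and a hypothetical function name, and assuming "below" means strictly less than the poverty line:

```python
def poverty_headcount_rate(incomes, poverty_line=15_000):
    """Fraction of incomes strictly below the poverty line."""
    poor = sum(1 for income in incomes if income < poverty_line)
    return poor / len(incomes)

# Hypothetical incomes for illustration; each real wave has 1000 observations.
wave = [12_000, 15_000, 18_000, 9_000]
print(poverty_headcount_rate(wave))
```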
Do your trends for inequality and poverty match? If not, what would explain the difference? Briefly discuss. Conceptually, what is the difference between inequality and poverty?
Calculate a 90% confidence interval for the average annual income for Australia in 1985 (hint: use t=1.645) and report your result. Suppose you know that the true average income in 2015 is $58,000. Does this value lie within the interval of credible values that you calculated for 1985? What can you conclude about income growth in Australia as a result of your calculation?
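The interval for a mean follows the same pattern as the interval for a proportion: the sample mean plus or minus the critical value times the standard error. The sketch below uses the critical value of 1.645 from the hint. The income figures are hypothetical placeholders, not the 1985 wave, and `mean_ci` is a hypothetical name.

```python
import math
from statistics import mean, stdev

def mean_ci(values, critical_value=1.645):
    """Confidence interval for the mean: x_bar +/- critical_value * s / sqrt(n)."""
    x_bar = mean(values)
    se = stdev(values) / math.sqrt(len(values))
    return x_bar - critical_value * se, x_bar + critical_value * se

# Hypothetical incomes for illustration only; use the 1985 column of the data.
incomes_1985 = [21_000, 18_500, 30_000, 25_500, 16_000, 42_000]
lower, upper = mean_ci(incomes_1985)
print(f"90% CI: ({lower:,.0f}, {upper:,.0f})")
print("contains 58,000:", lower <= 58_000 <= upper)
```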
In some instances, data analytic techniques work better on distributions that are approximately normal. Produce a histogram for the distribution of income in 2015. Comment on the shape. Is it approximately normal? Generate a new variable in Excel equal to the log (ln) of income in 2015 and plot the distribution of this variable. Does the log transform make the variable appear closer to normally distributed?
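A numerical companion to the two histograms is the sample skewness, which is zero for a symmetric distribution and positive for a long right tail. Skewness is not requested by the assignment; it is used here only as an assumption-laden illustration of what the log transform does. The incomes are hypothetical, and `skewness` is a hypothetical helper based on population moments.

```python
import math

def skewness(values):
    """Moment-based skewness: third central moment over the cubed std dev."""
    n = len(values)
    x_bar = sum(values) / n
    m2 = sum((x - x_bar) ** 2 for x in values) / n
    m3 = sum((x - x_bar) ** 3 for x in values) / n
    return m3 / m2 ** 1.5

# Hypothetical right-skewed incomes for illustration only.
incomes = [20_000, 25_000, 30_000, 35_000, 40_000, 60_000, 120_000, 400_000]
log_incomes = [math.log(x) for x in incomes]  # natural log, as in Excel's LN

skew_raw = skewness(incomes)
skew_log = skewness(log_incomes)
print(f"raw: {skew_raw:.2f}  log: {skew_log:.2f}")
```

For this placeholder sample the log transform pulls in the right tail and lowers the skewness, although it does not remove it entirely.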