Advanced Business Data Analysis
In this assignment you will use statistical tests for non-normal data. You may use methods (non-parametric statistical tests) and tools (R, Excel, or SPSS) of your own choice, but please don't rely on one tool or method; variety is expected. It is not necessary to replicate any test you carry out, i.e. if you perform a test in R, you do not need to repeat it in SPSS and/or Excel. A data file (from the 2016 Census of Ireland) is suggested, though students are permitted to choose a different file if they wish (subject to approval by Dr O'Loughlin). Your task is to prepare a statistical report based on the data in the file.
The Central Statistics Office provides data on "Small Area Population Statistics" from the 2016 Census of Ireland.
For this assignment you will need two CSV files:
1. Small Areas (18,641)
2. Small areas OSI Boundaries
The first file contains raw data based on the 2016 Census of Ireland. The second file contains information such as location names and IDs. You should be able to combine both data sets into one using the GUID field. The Glossary file at the above site will also be useful.
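As an illustration of combining the two files on the GUID field, here is a minimal Python sketch using only the standard library. The two inline CSV snippets stand in for the real files, and every column name other than GUID (TOTAL_COMMUTERS, COUNTYNAME) is a hypothetical placeholder; check the Glossary file for the actual names.

```python
import csv
import io

# Tiny stand-ins for the two CSV files. All values are invented, and every
# column except GUID is a hypothetical placeholder name.
SMALL_AREAS_CSV = """GUID,TOTAL_COMMUTERS
A1,120
A2,85
A3,40
"""
BOUNDARIES_CSV = """GUID,COUNTYNAME
A1,Kerry
A2,Cork
"""

def merge_on_guid(stats_file, boundaries_file):
    """Left-join census rows to boundary rows using the shared GUID field."""
    boundaries = {row["GUID"]: row for row in csv.DictReader(boundaries_file)}
    merged = []
    for row in csv.DictReader(stats_file):
        match = boundaries.get(row["GUID"], {})
        # Keep the census row even when no boundary record matches, so that
        # unmatched GUIDs can be inspected rather than silently dropped.
        merged.append({**match, **row})
    return merged

rows = merge_on_guid(io.StringIO(SMALL_AREAS_CSV), io.StringIO(BOUNDARIES_CSV))
unmatched = [r["GUID"] for r in rows if "COUNTYNAME" not in r]
print(len(rows), "rows merged;", "unmatched GUIDs:", unmatched)
```

With the real files, the same function can be given file objects opened with open(path, newline="", encoding="utf-8-sig"). Listing the unmatched GUIDs is a quick check that the join behaved as expected before any statistics are run.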
The Small Areas CSV file has 18,641 records and 68 columns of data. You are not expected to use all the data in the file, and you may remove unused data if you wish. As the file contains a lot of data, please choose carefully what to report on; the choice is yours.
Some suggested reports:
• a comparison of methods of transport to work by County/Planning Region
• differences between methods of transport in urban vs rural areas
• a comparison of journey times to work by County/Planning Region
• a comparison of time leaving home to travel to work by County/Planning Region
• correlations between variables (these may also be tested)
Suggested statistical tests:
• Descriptive statistics for all data used
• Tests for normality, such as Q-Q plots and the Kolmogorov-Smirnov test (please note: the Shapiro-Wilk test does not work for sample sizes over 5,000; R's shapiro.test, for example, only accepts between 3 and 5,000 values)
• Mann-Whitney U Test/Wilcoxon Rank-Sum Test to compare two independent samples (e.g. travel times for Kerry vs Cork)
• Kruskal-Wallis H Test to compare three or more samples
• Post-hoc tests where appropriate
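To make the rank-based tests in the list above concrete, the following is a hedged Python sketch (standard library only) of the Mann-Whitney U test and the Kruskal-Wallis H test. It assumes large-sample approximations: a normal approximation without continuity correction for U, and a chi-square approximation with k - 1 degrees of freedom for H, both with the usual tie corrections. The journey times are invented for illustration, and the function names are this sketch's own. In practice the tests built into R or SPSS would normally be used.

```python
import math
from statistics import NormalDist

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of 1-based ranks i+1..j+1
        i = j + 1
    return ranks

def tie_term(values):
    """Sum of t^3 - t over groups of tied values (0 when there are no ties)."""
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    return sum(t**3 - t for t in counts.values())

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df."""
    y = x / 2
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    while a < df / 2:
        q += y**a * math.exp(-y) / math.gamma(a + 1)
        a += 1
    return q

def mann_whitney_u(x, y):
    """Return (U for x, two-sided p) using the normal approximation."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term(pooled) / (n * (n - 1)))
    z = (u1 - n1 * n2 / 2) / math.sqrt(var_u)
    return u1, 2 * (1 - NormalDist().cdf(abs(z)))

def kruskal_wallis(*groups):
    """Return (H, p) using the chi-square approximation with k - 1 df."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum**2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    h /= 1 - tie_term(pooled) / (n**3 - n)  # tie correction
    return h, chi2_sf(h, len(groups) - 1)

# Invented journey times in minutes, for illustration only (not census values).
kerry = [10, 15, 15, 20, 25, 30, 45]
cork = [20, 25, 30, 35, 40, 45, 60]
dublin = [30, 35, 40, 45, 50, 60, 75]
u_stat, p_mw = mann_whitney_u(kerry, cork)
h_stat, p_kw = kruskal_wallis(kerry, cork, dublin)
print(f"Mann-Whitney U = {u_stat:.1f}, p = {p_mw:.4f}")
print(f"Kruskal-Wallis H = {h_stat:.3f}, p = {p_kw:.4f}")
```

Both functions fail with a division by zero if every value is identical, and the approximations are unreliable for very small groups. If the Kruskal-Wallis test is significant, pairwise Mann-Whitney tests with a multiplicity correction such as Bonferroni are one common post-hoc choice.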
Suggested visual representations of data:
• Q-Q/P-P plots
• Residuals
• Box plots
• Frequency Distributions/Histograms
• Scatter plots
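As a sketch of what lies behind a normal Q-Q plot and the Kolmogorov-Smirnov statistic, the Python below (standard library only) computes the plotted points and the distance D against a normal distribution fitted to the sample. The sample values are invented, the function names are this sketch's own, and the plotting position (i - 0.5)/n is one common convention among several. Because the mean and standard deviation are estimated from the same sample, standard Kolmogorov-Smirnov p-values would be conservative, so only D is returned here.

```python
from statistics import NormalDist, mean, stdev

def normal_qq_points(sample):
    """Return (theoretical, observed) quantile pairs for a normal Q-Q plot."""
    n = len(sample)
    fitted = NormalDist(mean(sample), stdev(sample))
    observed = sorted(sample)
    # Plotting positions (i - 0.5) / n keep probabilities strictly inside (0, 1).
    theoretical = [fitted.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, observed))

def ks_statistic(sample):
    """Largest gap between the empirical CDF and the fitted normal CDF."""
    n = len(sample)
    fitted = NormalDist(mean(sample), stdev(sample))
    d = 0.0
    for i, x in enumerate(sorted(sample), start=1):
        f = fitted.cdf(x)
        # Check the gap just after and just before each step of the empirical CDF.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

# Invented, right-skewed journey times in minutes (illustration only).
times = [5, 7, 8, 10, 12, 15, 20, 30, 45, 90]
points = normal_qq_points(times)
d_stat = ks_statistic(times)
for theo, obs in points:
    print(f"{theo:8.2f} {obs:6.1f}")
print(f"KS distance D = {d_stat:.3f}")
```

Points that curve away from the line y = x, as a right-skewed sample would, suggest non-normality and support the choice of a non-parametric test.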
Be aware that this is a statistical report and that null/alternative hypotheses, justification of levels of significance, correct reporting of results, and explanations of results are expected (see the 8 Simple Rules document in Moodle). Please also explain and justify any statistical test used. State clearly any assumptions made.
The word count should be no less than 2,000 and no more than 2,500 words. This does not include data, code, tables, diagrams/charts, bibliography, tables of contents, quotations, or appendices. Please indicate the word count on your cover page. Submit only one report document (Word or PDF); support files such as R code, SPSS outputs, and Excel files are not required.