Reference no: EM132360680
Statistics and Data Analysis Assignment - Excel Report
OVERVIEW OF THE ASSIGNMENT - This assignment will test your ability to collect, summarise and present data using Microsoft Excel and/or other approved tools. It will also test your ability to interpret the output produced by the software in order to solve business problems.
You will need to use the dataset provided, collect a dataset of your own, and produce numerical and graphical summaries of both. You will need to submit an Excel file that follows the requirements explained below.
TASK DESCRIPTION - There are two datasets involved in this assignment: Dataset 1 and Dataset 2, detailed below.
Dataset 1: You will receive an email containing a dataset that is allocated specifically to you. This dataset is an edited version of the Google Play Store Apps dataset provided by Lavanya Gupta, which is available from Kaggle under the Creative Commons Attribution 3.0 Unported License. The number of cases has been reduced from the original dataset and all NaN values have been removed.
Dataset 2: You will need to collect a dataset via a survey to answer the question given in Section 6 below. You will need to collect data from international students from 3 to 4 different countries of origin, with at least 5 students per country.
Both datasets should be saved in an Excel file (see Submission Requirement on the next page). All data processing should be performed in Excel or StatKey. Specific instructions on which tool to use for each section will be given during tutorials.
Your tasks are to describe each dataset in Section 1 and to answer the research questions given in Sections 2 to 6, using Dataset 1 or Dataset 2 as indicated in each section.
Section 1: Description of the Data
a. Dataset 1: Give a short but clear description of this dataset. Is it primary or secondary data? What are the cases? What are the variables and their types?
b. Dataset 2: Explain how you collected the data and discuss its limitations (e.g. whether your sample is biased). Is it primary or secondary data? What are the variables and their types?
Section 2: Are most Google Play apps free?
Using Dataset 1, describe the proportion of apps that are free. You need to provide both a numerical summary and a graphical display that clearly shows the proportion of free apps.
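The numerical summary for this section amounts to a count and a proportion. The assignment itself must be done in Excel or StatKey (in Excel, the COUNTIF function can produce the same counts); the Python sketch below only makes the calculation explicit. The labels "Free" and "Paid" and the sample values are assumptions made up for illustration, not values taken from Dataset 1.

```python
from collections import Counter

def free_proportion(types):
    """Return (number of free apps, total apps, proportion free).

    `types` is a list of labels; the label "Free" is an assumed
    coding for free apps and may differ in your allocated dataset.
    """
    counts = Counter(types)
    total = sum(counts.values())
    free = counts.get("Free", 0)
    return free, total, free / total

# Made-up labels for illustration only; not taken from Dataset 1.
sample_types = ["Free", "Free", "Paid", "Free", "Paid", "Free", "Free", "Free"]

free, total, proportion = free_proportion(sample_types)
print(f"{free} of {total} apps are free ({proportion:.1%})")
```

A pie chart or a two-bar chart of the same two counts would serve as the graphical display.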
Section 3: What is the price distribution of paid apps after an iteration of outlier removal?
Using Dataset 1, perform one iteration of outlier detection on the price of paid apps using the method described in the lecture notes. After removing those outliers, describe the price distribution of the remaining paid apps using both a numerical summary and a graphical display that shows the remaining outliers, if any.
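The lecture notes define the outlier method to use, and they are not reproduced here. As a sketch only, the code below assumes the common 1.5 x IQR rule: a value is flagged when it lies below Q1 - 1.5 x IQR or above Q3 + 1.5 x IQR. The quartiles use Python's "inclusive" method, which is believed to match Excel's QUARTILE.INC; if the lecture notes use a different rule or quartile definition, follow the notes. The function name and the prices are hypothetical.

```python
from statistics import quantiles

def remove_outliers_once(values, k=1.5):
    """Apply one pass of the (assumed) 1.5 x IQR rule.

    Returns (kept values, removed values, (lower fence, upper fence)).
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    kept = [v for v in values if low <= v <= high]
    removed = [v for v in values if v < low or v > high]
    return kept, removed, (low, high)

# Made-up prices for illustration only; not taken from Dataset 1.
prices = [0.99, 0.99, 1.49, 1.99, 2.99, 2.99, 3.99, 4.99, 79.99]

kept, removed, fences = remove_outliers_once(prices)
print("removed:", removed)
print("fences:", fences)

# Removing values changes the quartiles, so the same rule applied to the
# kept values can flag new points. These are the "remaining outliers".
_, remaining, _ = remove_outliers_once(kept)
print("remaining outliers:", remaining)
```

Because the fences are recomputed from the reduced data, a box plot drawn after one iteration may still show outliers, which is why the task asks for a display that shows them.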
Section 4: Is there a difference in prices among paid apps from the categories Communication, Games, and Tools?
Using Dataset 1, describe the price distribution of paid apps in each of the categories Communication, Games and Tools. You need to provide both a numerical summary and a graphical display that shows the outliers, if any.
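Comparing the three categories comes down to computing the same summary statistics separately for each group. The sketch below is one way to express that in Python, purely to clarify the calculation; the assignment itself is to be completed in Excel or StatKey. The function name, the category labels as spelled here, and the prices are assumptions made up for illustration.

```python
from collections import defaultdict
from statistics import mean, quantiles

def summarise_by_group(rows):
    """Return a five-number summary plus mean for each category.

    `rows` is a list of (category, price) pairs. Each category needs
    at least two prices for the quartiles to be defined.
    """
    groups = defaultdict(list)
    for category, price in rows:
        groups[category].append(price)

    summary = {}
    for category, prices in groups.items():
        q1, med, q3 = quantiles(prices, n=4, method="inclusive")
        summary[category] = {
            "n": len(prices),
            "min": min(prices),
            "q1": q1,
            "median": med,
            "q3": q3,
            "max": max(prices),
            "mean": mean(prices),
        }
    return summary

# Made-up (category, price) pairs for illustration only.
rows = [
    ("Communication", 0.99), ("Communication", 2.99), ("Communication", 4.99),
    ("Games", 0.99), ("Games", 1.99), ("Games", 2.99), ("Games", 6.99),
    ("Tools", 1.49), ("Tools", 2.49), ("Tools", 3.49),
]

for category, stats in summarise_by_group(rows).items():
    print(category, stats)
```

Side-by-side box plots, one per category on a shared price axis, are a natural graphical counterpart because they show medians, spread and outliers together.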
Section 5: Is there any relationship between Rating and Review?
Using Dataset 1, describe the relationship between the rating of an app and the number of reviews it receives. You need to provide both a numerical summary and a graphical display.
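For two quantitative variables, one common numerical summary is the Pearson correlation coefficient, which Excel provides through the CORREL function. Whether this is the summary expected here depends on the lecture notes, so treat the sketch below as an assumption. It computes the coefficient from its definition; the function name and the sample ratings and review counts are made up for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Made-up values for illustration only; not taken from Dataset 1.
ratings = [3.5, 4.0, 4.2, 4.5, 4.7]
reviews = [120, 800, 1500, 9000, 25000]

r = pearson_r(ratings, reviews)
print(f"r = {r:.3f}")
```

The coefficient measures only linear association, so it should be read alongside a scatter plot of rating against number of reviews; review counts are often heavily skewed, and a few very popular apps can dominate the value.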
Section 6: Do international students from different countries tend to use different communication apps?
Using Dataset 2, describe the relationship between a student's country of origin and the main communication app the student uses (e.g. WhatsApp, Fb Messenger, WeChat, LINE, Viber, etc). You need to provide both a numerical summary and a graphical display.
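Both variables here are categorical, so a natural numerical summary is a two-way table of counts, with row percentages to make countries with different sample sizes comparable. The sketch below illustrates that idea in Python; in Excel, a PivotTable produces the same table. The function names, the placeholder country labels and the responses are all hypothetical and are not survey results.

```python
from collections import Counter

def cross_tab(pairs):
    """Build a two-way table of counts from (country, app) pairs."""
    counts = Counter(pairs)
    countries = sorted({country for country, _ in pairs})
    apps = sorted({app for _, app in pairs})
    return {c: {a: counts[(c, a)] for a in apps} for c in countries}

def row_percentages(table):
    """Convert each row of counts into proportions of that row's total."""
    result = {}
    for country, cells in table.items():
        total = sum(cells.values())
        result[country] = {app: n / total for app, n in cells.items()}
    return result

# Hypothetical responses for illustration only; not real survey data.
responses = (
    [("Country A", "WhatsApp")] * 4 + [("Country A", "WeChat")] * 1
    + [("Country B", "WhatsApp")] * 1 + [("Country B", "WeChat")] * 4
)

table = cross_tab(responses)
for country, shares in row_percentages(table).items():
    print(country, table[country], shares)
```

A clustered or stacked bar chart of the same table, with one bar group per country, would serve as the graphical display.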
Attachment:- Statistics and Data Analysis Assignment File.rar