Data Acquisition and Management
Assessment - Sampling and data mining project
Learning outcome 1: Create analysis-ready data sets by applying and exploring basic validation, preprocessing, filtering and cleaning techniques
Learning outcome 2: Evaluate and apply data mining software
Assessment Description
Business Problem: Airbnb is a U.S. company that provides an online marketplace for short-term and/or holiday accommodation. Airbnb collects large volumes of data to gain insight into its clients and associated customers, for example review scores, host acceptance rates, 'superhosts', popular accommodation types and the density of listings in particular locations.
Data sets: We have obtained data on Airbnb listings in Melbourne covering a variety of variables. The sampled datasets, the original data and the data dictionary will be available from Week 4; see the sections below.
Assessment Instructions
Analysis and Report
Use Microsoft Excel, Power BI or Tableau.
Recall the sampling methods below that you have learnt about in lectures.
A data dictionary file and four datasets (as .csv files) containing sample data generated using quota, systematic, simple random and stratified sampling will be available from Week 4; see section c. below. You will also need to access the original population dataset, cleansed_listings_dec_18.csv, from its source; see sections a. and e. below.
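The four sampling methods can be illustrated with a short Python sketch. It is not part of the required workflow (the report itself should be produced in Excel, Power BI or Tableau), and it runs on a small synthetic population rather than on cleansed_listings_dec_18.csv; the fields listing_id, room_type and price, the room-type labels and all function names are hypothetical. It shows how each method selects rows: simple random sampling gives every listing an equal chance, systematic sampling takes every k-th listing after a random start, stratified sampling draws a random sample within each room type in proportion to its size, and quota sampling fills a fixed number per room type in whatever order rows are encountered, which is what makes it a non-probability method.

```python
import random
from collections import defaultdict

# Hypothetical toy population of (listing_id, room_type, price) tuples.
# Illustrative only: these rows are NOT read from cleansed_listings_dec_18.csv.
ROOM_TYPES = ["Entire home/apt", "Private room", "Shared room"]


def make_population(n=1000, seed=1):
    rng = random.Random(seed)
    return [(i, rng.choice(ROOM_TYPES), rng.randint(40, 400)) for i in range(n)]


def simple_random(pop, k, rng):
    """Every listing has the same chance of selection; no listing is picked twice."""
    return rng.sample(pop, k)


def systematic(pop, k, rng):
    """Every step-th listing after a random start inside the first interval."""
    step = len(pop) // k
    start = rng.randrange(step)
    return [pop[start + j * step] for j in range(k)]


def stratified(pop, k, rng, key=lambda row: row[1]):
    """Proportional allocation: a simple random sample inside each stratum.

    Rounding each stratum's share means the total can differ slightly from k.
    """
    strata = defaultdict(list)
    for row in pop:
        strata[key(row)].append(row)
    sample = []
    for rows in strata.values():
        n_h = round(k * len(rows) / len(pop))
        sample.extend(rng.sample(rows, n_h))
    return sample


def quota(pop, quotas, key=lambda row: row[1]):
    """Non-probability: keep rows in file order until each group's quota is full."""
    counts = defaultdict(int)
    sample = []
    for row in pop:
        group = key(row)
        if counts[group] < quotas.get(group, 0):
            sample.append(row)
            counts[group] += 1
    return sample


rng = random.Random(42)
population = make_population()
samples = {
    "simple random": simple_random(population, 100, rng),
    "systematic": systematic(population, 100, rng),
    "stratified": stratified(population, 100, rng),
    "quota": quota(population, {"Entire home/apt": 50, "Private room": 40, "Shared room": 10}),
}
for name, rows in samples.items():
    print(f"{name:14s} n={len(rows)}")
```

Because quota sampling takes rows in file order, it inherits any ordering in the source data (for example, listings sorted by date or suburb); systematic sampling can likewise be distorted if the file has a repeating pattern that lines up with the sampling interval. These are the kinds of limitations worth weighing when comparing the methods.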
Create a report that includes your responses to the following questions:
a. Access the data file cleansed_listings_dec_18.csv by following the link provided on MyKBS under the Assessment 1 tab. You will initially download a zip folder from the Melbourne Airbnb Open Data project on Kaggle. Extract all the files in the folder, then open cleansed_listings_dec_18.csv. Browse the columns and comment on which variables appear most useful for gaining insight into current listings. Document this in your report. (150 words)
b. List an advantage, a possible disadvantage and the limitations of each of the four sampling methods. (150 words)
c. Access the sampled datasets on MyKBS. Choose several variables, as in part (a), then create summary statistics for each of those variables in each of the sampled datasets. Make sure that the selected variables are the same for all four datasets, and document them in your report. (300 words)
d. Interpret and compare the summary statistics across all four sample datasets. What conclusions can you draw from the comparison? Document your findings in your report. (500 words)
e. Repeat part (c) for the original dataset cleansed_listings_dec_18.csv. Using statistical examples, explain which sampling method's summary statistics (across all chosen variables) were nearest in value to those of the original dataset.
Explain the variations in your report and include the supporting data. Explain possible ethical issues that could arise from the use of sampled data.
Briefly evaluate the software that you used to produce the summaries. (500 words)
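Comparing sample summaries with the population can also be sketched in Python, again only as an illustration: in Excel the same figures come from AVERAGE, MEDIAN, STDEV.S, MIN and MAX. The sketch uses a synthetic list of hypothetical nightly prices (not values from cleansed_listings_dec_18.csv), computes the same summary statistics for the population and for each sample, and ranks the samples by their mean absolute percentage difference from the population values. That closeness measure is an assumption, one simple choice among several, and the helper names summarise and closeness are hypothetical.

```python
import random
import statistics


def summarise(values):
    """Summary statistics of the kind the report asks for."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample SD, like Excel's STDEV.S
        "min": min(values),
        "max": max(values),
    }


def closeness(sample_stats, pop_stats, keys=("mean", "median", "stdev")):
    """Mean absolute percentage difference from the population (assumes non-zero population values)."""
    return statistics.mean(
        abs(sample_stats[k] - pop_stats[k]) / abs(pop_stats[k]) * 100 for k in keys
    )


rng = random.Random(7)
# Hypothetical nightly prices for a synthetic population of 2000 listings.
population = [rng.randint(40, 400) for _ in range(2000)]
samples = {
    "simple random": rng.sample(population, 200),
    "systematic": population[3::10],  # every 10th listing from a fixed start
    "200 cheapest": sorted(population)[:200],  # deliberately biased, for contrast
}

pop_stats = summarise(population)
ranking = sorted(samples, key=lambda name: closeness(summarise(samples[name]), pop_stats))
for name in ranking:
    print(f"{name:14s} {closeness(summarise(samples[name]), pop_stats):6.2f}% from population")
```

The deliberately biased "200 cheapest" sample shows what a poor match looks like: its mean, median and standard deviation all sit far below the population values, whereas the two probability samples usually land within a few percent.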