Reference no: EM132483568
MIS772 - Predictive Analytics - Deakin University
Classification
AirbnbAI has approached you to develop a RapidMiner process for determining whether a New York City rental accommodation is economically viable, i.e. whether it receives more than one review per month:
• The number of reviews per month (n)
• Economic viability ≡ n > 1 (this is what we want to predict)
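The brief asks for a RapidMiner process, so the following Python is only an illustration of the labelling rule above, not part of the required solution. It is a minimal sketch that assumes a listing with no reviews yet has no recorded value for n (represented here as None) and is therefore left unlabelled. The function name and the sample values are hypothetical.

```python
from typing import Optional

def viability_label(reviews_per_month: Optional[float]) -> Optional[bool]:
    """Return True when n > 1, False when n <= 1, None when n is unknown."""
    if reviews_per_month is None:
        return None  # assumption: no reviews yet means the label is unknown
    return reviews_per_month > 1

# Hypothetical values of n, not taken from the AirbnbAI sample.
sample = [0.4, 1.0, 1.01, 3.2, None]
labels = [viability_label(n) for n in sample]
print(labels)  # [False, False, True, True, None]
```

Note that the inequality is strict: a listing with exactly one review per month does not count as viable under the definition above.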
AirbnbAI provided you with a sample of 49,000 listings of rentals, 20% of which have not received any reviews as yet. The listings include the following information:
• Property id, name, type, price
• Id and name of the property host
• Property geo-location and its neighbourhood classification
• How many nights a dwelling is rented per year
• Minimum nights stay
• How many rooms are being rented in a building
• The number of occupants allowed in a rental
• Whether the listing is licensed or not
AirbnbAI would like you to use RapidMiner to generate some insights into the rental listings. The following questions are of interest:
A) Which neighbourhoods have the most attractive Airbnb rentals?
B) What kind of rentals attract the majority of reviews?
C) Which of the rentals that have no reviews are likely economically viable?
AirbnbAI wants you to use RapidMiner to clean up and explore the provided data, then develop and evaluate a classifier that predicts the rentals' long-term viability while minimising misclassifications.
Partial Submission (Questions A, B - marked with the final submission)
Exec Problem: Define your problem in business terms and, in doing so, answer questions A and B. Cross-reference other report sections for support.
Data Exploration: Visualise the characteristics of the selected attributes. Use the visualisations to support your answers to questions A and B.
Final Submission (Question C)
Exec Solution: Describe your solution in business terms and, in doing so, answer question C. Cross-reference other report sections for support.
Data Preparation: Deal with duplicates, bad values and missing values. Transform the selected attributes or create new ones as needed. Use appropriate analysis and data visualisation to investigate relationships between attributes. Interpret the results.
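The preparation steps are to be carried out in RapidMiner; purely to illustrate the order of operations, here is a minimal Python sketch. The field names, the sample records, the rule that a non-positive price is a bad value, and the choice of median imputation are all assumptions made for the example, not details given in the brief.

```python
from statistics import median

# Hypothetical records; field names are assumptions, not the real column names.
listings = [
    {"id": 1, "price": 120.0, "minimum_nights": 2},
    {"id": 1, "price": 120.0, "minimum_nights": 2},   # duplicate id
    {"id": 2, "price": None, "minimum_nights": 1},    # missing price
    {"id": 3, "price": -5.0, "minimum_nights": 3},    # bad (negative) price
    {"id": 4, "price": 80.0, "minimum_nights": 30},
]

def prepare(rows):
    # 1. Duplicates: keep the first record seen for each id (copied, so the
    #    input list is left unchanged).
    seen, unique = set(), []
    for row in rows:
        if row["id"] not in seen:
            seen.add(row["id"])
            unique.append(dict(row))
    # 2. Bad values: treat non-positive prices as missing (assumed rule).
    for row in unique:
        if row["price"] is not None and row["price"] <= 0:
            row["price"] = None
    # 3. Missing values: impute with the median of the valid prices.
    valid = [r["price"] for r in unique if r["price"] is not None]
    fill = median(valid)
    for row in unique:
        if row["price"] is None:
            row["price"] = fill
    return unique

cleaned = prepare(listings)
print([(r["id"], r["price"]) for r in cleaned])
# [(1, 120.0), (2, 100.0), (3, 100.0), (4, 80.0)]
```

Whether to impute or to drop a record is a judgement call that the report would need to justify for each attribute.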
Model: Create and explain one or two classification models, i.e. k-NN and Decision Tree, to address question C. Explain and justify your models' properties. Investigate and deal with the class imbalance.
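To make the k-NN idea and one possible response to class imbalance concrete, here is a minimal Python sketch. It is not the RapidMiner process the brief asks for. Random oversampling is just one common option among several, and the function names, feature values and class counts are hypothetical.

```python
import random
from collections import Counter
from math import dist

def oversample(rows, labels, seed=0):
    """Randomly duplicate minority-class rows until all classes are equal in size."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels

def knn_predict(train_rows, train_labels, query, k=3):
    """Majority vote among the k training rows closest to query (Euclidean)."""
    ranked = sorted(zip(train_rows, train_labels),
                    key=lambda pair: dist(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical, already-normalised features for eight listings:
# six not viable (False) and two viable (True).
rows = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.3), (0.3, 0.25),
        (0.2, 0.3), (0.25, 0.15), (0.8, 0.9), (0.9, 0.8)]
labels = [False] * 6 + [True] * 2

balanced_rows, balanced_labels = oversample(rows, labels)
print(Counter(balanced_labels))  # both classes now have 6 rows
print(knn_predict(balanced_rows, balanced_labels, (0.85, 0.85)))  # True
```

Any rebalancing should be applied to the training data only, so that the test data keeps the original class proportions. Because k-NN relies on distances, the attributes are assumed to have been normalised beforehand.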
Evaluation: Evaluate the model using hold-out validation or cross-validation. Include a separate test set. Compare the performance of the different models and select the best. Qualify how much we can trust the answer to question C.
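The two evaluation ideas named above can be sketched in a few lines of Python. This is an illustration only, since the brief expects the evaluation to be done in RapidMiner. The fold assignment below is a simple interleaving without shuffling or stratification, and the actual and predicted labels are hypothetical.

```python
def k_fold_indices(n, k):
    """Yield (train_idx, test_idx) pairs; every index is tested exactly once."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx

def evaluate(actual, predicted):
    """Confusion-matrix counts and basic metrics for a boolean target."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)
    fp = sum(1 for a, p in pairs if not a and p)
    fn = sum(1 for a, p in pairs if a and not p)
    tn = sum(1 for a, p in pairs if not a and not p)
    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else None,
        "recall": tp / (tp + fn) if tp + fn else None,
    }

# Hypothetical test-set labels: three viable listings out of ten.
actual = [True, True, True] + [False] * 7
predicted = [True, False, False] + [False] * 6 + [True]

metrics = evaluate(actual, predicted)
print(metrics["accuracy"], metrics["precision"], metrics["recall"])

splits = list(k_fold_indices(10, 5))
print(splits[0])  # ([1, 6, 2, 7, 3, 8, 4, 9], [0, 5])
```

In this made-up example the accuracy is 0.7 even though only one of the three viable listings is found, which is why precision and recall are worth reporting alongside accuracy when the classes are imbalanced.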
Attachment: Predictive Analytics.rar