Reference no: EM132373838, Length: 1000 words
Predictive Analytics Individual Assignment - Text, Clustering & Estimation
After receiving your initial analysis of the Zomato data, the Bangalore Food Assist (BFA) asked you to find out:
A. What are the groups of similar restaurants based on their customer feedback? (text only)
B. What is the difference in reviews of restaurants serving different meal types? (text + structured)
Then, using both structured data and text, develop a new estimation model to help owners of restaurants that are already established but not yet listed on the Zomato site answer this question:
C. What would be the expected rating of their restaurant on the Zomato web site? (mixed attributes)
BFA has independently collected customer reviews and information on the best-liked meals for such restaurants (deploy.csv).
A data set previously given to you (train.csv & test.csv) includes similar information about the restaurants on the Zomato site.
BFA insists that your model's estimates be highly accurate, as measured by MAE, RMSE and correlation.
It is also important that your solution can be seamlessly deployed on the BFA system and can assist both individuals and groups of customers.
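The three measures named above can be illustrated outside RapidMiner with a minimal Python sketch. The rating values are made up for illustration, and "correlation" is assumed here to mean the Pearson coefficient between actual and predicted ratings.

```python
import math

def mae(actual, predicted):
    """Mean absolute error: average size of the prediction errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error: penalises large errors more than MAE does."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def pearson(actual, predicted):
    """Pearson correlation between actual and predicted values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)

# Hypothetical ratings, not taken from train.csv or test.csv.
actual = [4.1, 3.8, 4.5, 3.2]
predicted = [4.0, 3.9, 4.2, 3.5]

print(mae(actual, predicted), rmse(actual, predicted), pearson(actual, predicted))
```

Lower MAE and RMSE and a higher correlation indicate better estimates; RMSE is never smaller than MAE on the same data.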
Tasks and Deliverables -
Part LP3 -
Exec: Briefly define a business problem.
Clusters: Perform cluster and segmentation analysis of review texts, using predominantly text data. Describe emerging relationships in data.
Anomalies: Identify and deal with anomalies.
Visualise clusters and anomalies using PCA and interpret the results. Answer question (A) and optionally (B).
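The text-clustering step can be sketched in plain Python to show the underlying idea: turn each review into a TF-IDF vector, then group the vectors with k-means. This is only an illustration under stated assumptions, not the RapidMiner process the brief asks for. The four reviews are invented, the IDF formula is one common smoothed variant, the centroids are seeded with the first k vectors to keep the run deterministic, and the PCA projection is omitted.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return the sorted vocabulary and one unit-length TF-IDF vector per document."""
    tokens = [d.lower().split() for d in docs]
    vocab = sorted({t for doc in tokens for t in doc})
    n = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(t for doc in tokens for t in set(doc))
    idf = {t: math.log(n / df[t]) + 1.0 for t in vocab}  # assumed smoothing
    vectors = []
    for doc in tokens:
        counts = Counter(doc)
        vec = [counts[t] / len(doc) * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])
    return vocab, vectors

def kmeans(vectors, k, iterations=10):
    """Plain k-means with squared Euclidean distance; returns a cluster label per vector."""
    centroids = [list(v) for v in vectors[:k]]  # deterministic seeding (assumption)
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: attach each vector to its nearest centroid.
        for i, v in enumerate(vectors):
            dists = [sum((a - b) ** 2 for a, b in zip(v, c)) for c in centroids]
            labels[i] = dists.index(min(dists))
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical review texts, not taken from the Zomato data.
reviews = [
    "tasty biryani tasty kebab",
    "slow delivery cold food",
    "tasty kebab great biryani",
    "cold food slow rude delivery",
]
vocab, vectors = tfidf(reviews)
labels = kmeans(vectors, k=2)
print(labels)
```

Reviews that share vocabulary end up in the same cluster; a review that sits far from every centroid would be a candidate anomaly.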
Part LP4 -
Exec: Create a problem definition and write a brief spec of its possible solution.
Model: Create the following estimation models: (M1) random forests or gradient-boosted trees (GBTs), (M2) regression and (M3) neural networks. Ensure your solution includes a mix of structured and text data. Describe the properties of the operators used. Optionally use custom ensembles (self-study).
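One simple form a custom ensemble could take is a weighted average of the individual models' predictions. The sketch below assumes each model is a callable that returns a rating; the three stand-in functions and the column name `votes` are hypothetical placeholders, not trained M1, M2 or M3 models.

```python
def ensemble_predict(models, weights, features):
    """Weighted average of the predictions of several models for one record."""
    total = sum(weights)
    return sum(w * m(features) for m, w in zip(models, weights)) / total

# Hypothetical stand-ins for trained estimators (M1, M2, M3).
def m1(features):
    return 4.0

def m2(features):
    return 3.0

def m3(features):
    return 3.5

record = {"votes": 120}  # hypothetical structured attribute
print(ensemble_predict([m1, m2, m3], [1.0, 1.0, 1.0], record))
```

Equal weights give a plain average; weights could instead be chosen from each model's cross-validated error.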
Validate & Optimise: Optimise the models' performance to minimise overall error. Cross-validate and compare all models (including ensembles), using MAE, RMSE and correlation.
Visualise optimisation results. Optionally use grid optimisation.
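Cross-validation combined with grid optimisation can be sketched as follows: every candidate parameter value is scored by its average MAE over held-out folds, and the value with the lowest score is kept. The sketch uses a one-feature nearest-neighbour regressor as a stand-in model, and the (cost, rating) pairs are invented. Both are assumptions made for illustration and are not the models or data the brief requires.

```python
def knn_predict(train, x, k):
    """Average the ratings of the k training points whose feature is closest to x."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def cross_val_mae(data, k_neighbours, folds=3):
    """Mean absolute error over all held-out points in a k-fold split."""
    errors = []
    for f in range(folds):
        test = data[f::folds]  # every folds-th row, starting at f
        train = [row for i, row in enumerate(data) if i % folds != f]
        for x, y in test:
            errors.append(abs(knn_predict(train, x, k_neighbours) - y))
    return sum(errors) / len(errors)

def grid_search(data, candidates, folds=3):
    """Score every candidate value and return the best one with all scores."""
    scores = {k: cross_val_mae(data, k, folds) for k in candidates}
    best = min(scores, key=scores.get)
    return best, scores

# Hypothetical (average cost, rating) pairs, not taken from train.csv.
data = [(300, 3.6), (450, 3.9), (500, 4.0), (800, 4.3), (250, 3.4),
        (700, 4.2), (350, 3.7), (600, 4.1), (900, 4.4)]

best_k, scores = grid_search(data, candidates=[1, 3, 5])
print(best_k, scores)
```

The `scores` dictionary is what an optimisation plot would show: one error value per grid point, with the minimum marking the chosen setting.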
Solution: Answer question (C) and justify your answer. Create a deployment process. Apply this process to new data (deploy.csv).
Optionally demonstrate that your deployed process is capable of handling a single enquiry (data consisting of 1 restaurant).
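Handling a single enquiry usually needs no special case if the scoring step always works on a list of records, so one restaurant is simply a list of length one. The sketch below assumes a CSV layout with hypothetical columns `name` and `liked_meals`, and uses a placeholder function in place of a trained model.

```python
import csv
import io

def stand_in_model(row):
    """Hypothetical placeholder for a trained estimator; not a real rating formula."""
    return 3.0 + 0.1 * len(row["liked_meals"].split(";"))

def score_records(rows, model):
    """Apply the model to every row; behaves the same for one row or many."""
    return [{"name": r["name"], "expected_rating": round(model(r), 2)} for r in rows]

# A single-restaurant enquiry in the assumed CSV layout.
single = io.StringIO("name,liked_meals\nCafe A,dosa;idli;coffee\n")
results = score_records(list(csv.DictReader(single)), stand_in_model)
print(results)
```

The same `score_records` call would accept the rows of a full deployment file without changes.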
Extend: Conduct research and use novel data mining approaches.
Submit the solution as an .rmp file and the report in the format of the attached template.
Report (1000 words):
Executive summary (one page)
Create a Model(s) in RapidMiner (two pages / page 1)
Evaluate and Improve the Model(s) in RapidMiner (two pages / page 1)
Provide an Integrated Solution in RapidMiner (one page)
Further Research and Extensions in RapidMiner (one page)
Attachment: Predictive Analytics Assignment Files.rar