Implement a recommender system based on collaborative filtering

Reference no: EM132993099

ICT707 Data Science Practice - University of the Sunshine Coast

Part I - PySpark source code

Important Note:
- For reproducibility, your code must be self-contained. That is, it should not require any libraries beyond the PySpark environment we have used this semester. Package the data files with your code file.

- The data sets used in the lecture slides must not be used as the data set for the assignment. Doing so will result in a mark of 0 for the coding component.

In this component, you need to use Python 3 and PySpark to complete the following data analysis tasks:
1. Exploratory data analysis
2. Recommendation engine
3. Classification

You need to choose a dataset from Kaggle to complete these tasks.

Task I.1: Exploratory data analysis

This subtask requires you to explore your dataset by
• reporting its number of rows and columns,
• cleaning the data (handling missing values or duplicated records) if necessary,
• selecting 3 columns and drawing 1 plot (e.g. bar chart, histogram, boxplot) for each to summarise it.
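
To make the three checks above concrete, here is a minimal sketch in plain Python of what each step computes. It is an illustration only: the assignment itself requires PySpark, where DataFrame methods such as `count()`, `dropna()` and `dropDuplicates()` play the corresponding roles. The sample rows and the column names `user_id`, `item_id` and `rating` are invented stand-ins for a Kaggle CSV file.

```python
import csv
import io
from collections import Counter

# Hypothetical sample data standing in for a Kaggle CSV file.
RAW = """user_id,item_id,rating
1,10,4.0
1,11,
2,10,5.0
2,10,5.0
3,12,3.0
"""


def load_rows(text):
    """Parse CSV text into a header list and a list of row tuples."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    rows = [tuple(row) for row in reader]
    return header, rows


def clean(rows):
    """Drop rows with an empty field, then drop exact duplicates (order kept)."""
    complete = [r for r in rows if all(field.strip() for field in r)]
    return list(dict.fromkeys(complete))


header, rows = load_rows(RAW)
cleaned = clean(rows)

# Number of rows and columns before and after cleaning.
shape_before = (len(rows), len(header))
shape_after = (len(cleaned), len(header))

# Frequency table for one column: the numbers a bar chart would display.
rating_counts = Counter(r[header.index("rating")] for r in cleaned)

print("rows, columns before cleaning:", shape_before)
print("rows, columns after cleaning:", shape_after)
print("rating counts:", dict(rating_counts))
```

Reporting the shape both before and after cleaning shows how many records the cleaning step removed, which is worth stating in the report.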

Task I.2: Recommendation engine

This subtask requires you to implement a recommender system based on collaborative filtering with the Alternating Least Squares (ALS) algorithm. You need to include
• Model training and predictions
• Model evaluation using mean squared error (MSE)
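
The idea behind this subtask can be sketched in plain Python. The code below is a simplified illustration, not the required solution: it fits a rank-1 factorisation (one number per user and one per item) by alternating regularised least-squares updates, then computes MSE on a few held-out ratings. In the assignment you would use PySpark's `pyspark.ml.recommendation.ALS` (with parameters such as `rank`, `maxIter` and `regParam`) and `RegressionEvaluator` with `metricName="mse"`. The ratings, the function names and the parameter values here are all assumptions made for the example.

```python
# Hypothetical ratings keyed by (user, item); TEST is held out from training.
TRAIN = {(0, 0): 5.0, (0, 1): 3.0, (1, 0): 4.0,
         (1, 2): 1.0, (2, 1): 2.0, (2, 2): 1.0}
TEST = {(0, 2): 2.0, (1, 1): 3.0}


def mse(ratings, user_f, item_f):
    """Mean squared error between ratings and the predictions x_u * y_i."""
    sq = [(r - user_f[u] * item_f[i]) ** 2 for (u, i), r in ratings.items()]
    return sum(sq) / len(sq)


def als_rank1(train, n_iter=10, reg=0.1):
    """Rank-1 alternating least squares with L2 regularisation `reg`."""
    users = sorted({u for u, _ in train})
    items = sorted({i for _, i in train})
    user_f = dict.fromkeys(users, 1.0)
    item_f = dict.fromkeys(items, 1.0)
    history = [mse(train, user_f, item_f)]  # training MSE per iteration
    for _ in range(n_iter):
        # Step 1: hold item factors fixed, solve a ridge problem per user.
        for u in users:
            rated = [(i, r) for (uu, i), r in train.items() if uu == u]
            num = sum(item_f[i] * r for i, r in rated)
            den = reg + sum(item_f[i] ** 2 for i, _ in rated)
            user_f[u] = num / den
        # Step 2: hold user factors fixed, solve a ridge problem per item.
        for i in items:
            rated = [(u, r) for (u, ii), r in train.items() if ii == i]
            num = sum(user_f[u] * r for u, r in rated)
            den = reg + sum(user_f[u] ** 2 for u, _ in rated)
            item_f[i] = num / den
        history.append(mse(train, user_f, item_f))
    return user_f, item_f, history


user_f, item_f, history = als_rank1(TRAIN)
test_mse = mse(TEST, user_f, item_f)

print("training MSE before and after:", history[0], history[-1])
print("held-out MSE:", test_mse)
```

The held-out MSE is the figure to report, since training MSE alone says little about how well the model predicts unseen ratings. Every user and item in `TEST` also appears in `TRAIN`; a user or item seen only at test time has no learnt factor, which is the cold-start case that Spark's `coldStartStrategy` parameter addresses.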

Task I.3: Classification

This subtask requires you to implement a classification system using logistic regression. You need to include
• Logistic Regression model training
• Model evaluation
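
As a rough illustration of what training and evaluating such a model involves, the sketch below fits a logistic regression by batch gradient descent in plain Python and reports accuracy on held-out points. It is not the required solution: in the assignment you would use PySpark's `pyspark.ml.classification.LogisticRegression` (with parameters such as `maxIter`, `regParam` and `elasticNetParam`) together with an evaluator such as `BinaryClassificationEvaluator`. The one-feature data set, the learning rate and the number of epochs are assumptions chosen to keep the example small.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(features, labels, lr=0.1, epochs=200):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, n_features = len(features), len(features[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(features, labels):
            p = sigmoid(sum(w * xj for w, xj in zip(weights, x)) + bias)
            err = p - y  # derivative of the log loss w.r.t. the linear score
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(weights, bias, x):
    """Return class 1 if the predicted probability is at least 0.5."""
    score = sum(w * xj for w, xj in zip(weights, x)) + bias
    return 1 if sigmoid(score) >= 0.5 else 0


def accuracy(weights, bias, features, labels):
    """Fraction of examples whose predicted class matches the label."""
    hits = sum(predict(weights, bias, x) == y for x, y in zip(features, labels))
    return hits / len(labels)


# Hypothetical, already-centred single-feature data.
X_train = [[-2.0], [-1.5], [-1.0], [1.0], [1.5], [2.0]]
y_train = [0, 0, 0, 1, 1, 1]
X_test = [[-0.5], [0.8]]
y_test = [0, 1]

weights, bias = train_logistic(X_train, y_train)
train_acc = accuracy(weights, bias, X_train, y_train)
test_acc = accuracy(weights, bias, X_test, y_test)

print("weights:", weights, "bias:", bias)
print("train accuracy:", train_acc, "test accuracy:", test_acc)
```

Accuracy is only one option for the evaluation bullet; on imbalanced data, area under the ROC curve or precision and recall are usually more informative.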

Part II - Report

You are required to write a report with the following content:
• Provide a high-level survey on the advances of data science in the past 2 years.
• Compare the features of Spark version 2.4 that we used this semester and the new version 3.0.
• Explain your design and implementation of the machine learning parts in your code, including the following topics:
o Background of your selected data set
o For each task, which learning algorithm is used, what its key parameters are, and how you set them
o For each task, comments on and an evaluation of the learnt model

Your report should use the following template:

Table of Contents

1.0 Advancement of Data Science (500 words)

2.0 Comparison of Spark 2.4 and 3.0 (250 words)

3.0 Machine Learning Implementation (250 words)
3.1 Data set
3.2 Collaborative filtering
• Features of the model, key parameters and configuration
• Evaluation
3.3 Logistic regression
• Features of the model, key parameters and configuration
• Evaluation

References

Attachment: Data Science Practice.rar
