Implement a recommender system on collaborative filtering

ICT707 - Data Science Practice, University of the Sunshine Coast

Assignment Task

This assignment consists of two deliverables:

• PySpark source code in Jupyter Notebook format. All Jupyter Notebook files and the data set file for this assignment must be placed in a folder named "Task 3- Your Name-Student Number". Zip the folder and upload it to Blackboard.

• A report. The report must be uploaded as a separate file.

Part I - PySpark source code

Important Note: For code reproduction, your code must be self-contained. That is, it must not require any libraries beyond the PySpark environment used in the workshops.

In this component, you need to use Python 3 and PySpark to complete the following data analysis tasks:
1. Exploratory data analysis
2. Recommendation engine
3. Classification
4. Clustering

You need to choose a dataset from Kaggle to complete these tasks. Remember to include the data set file in your source code submission.

Task I.1: Exploratory data analysis

This subtask requires you to explore your dataset by
• reporting its number of rows and columns,
• cleaning the data (handling missing values or duplicated records) if necessary, and
• summarising 3 columns with plots (e.g. bar chart, histogram, boxplot).
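The first two bullets can be sketched without dependencies using Python's standard `csv` module on a small hypothetical sample. The column names `user_id`, `item_id` and `rating` are assumptions for illustration, not part of the brief. In the PySpark notebook itself, the equivalent DataFrame calls would be `df.count()`, `len(df.columns)`, `df.dropna()` and `df.dropDuplicates()`. The frequency table at the end is the data a bar chart of one column would display.

```python
import csv
import io
from collections import Counter

# Hypothetical sample standing in for a Kaggle CSV file.
SAMPLE_CSV = """user_id,item_id,rating
1,10,4.0
1,11,
2,10,5.0
2,10,5.0
3,12,3.0
"""


def load_rows(text):
    """Parse CSV text into a header list and a list of data rows."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    rows = [row for row in reader if row]
    return header, rows


def clean(rows):
    """Drop rows with an empty field, then drop exact duplicates (keeping the first)."""
    seen = set()
    cleaned = []
    for row in rows:
        if any(field.strip() == "" for field in row):
            continue  # row has a missing value
        key = tuple(row)
        if key in seen:
            continue  # duplicated record
        seen.add(key)
        cleaned.append(row)
    return cleaned


header, rows = load_rows(SAMPLE_CSV)
print("rows:", len(rows), "columns:", len(header))
cleaned = clean(rows)
print("rows after cleaning:", len(cleaned))

# Frequency of each value in the 'rating' column: the input to a bar chart.
rating_counts = Counter(row[header.index("rating")] for row in cleaned)
print(sorted(rating_counts.items()))
```

Dropping every row that has a missing field is the bluntest cleaning rule; for some datasets, imputing values may be more appropriate.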

Task I.2: Recommendation engine

This subtask requires you to implement a recommender system based on collaborative filtering with the Alternating Least Squares (ALS) algorithm. You need to include
• Model training and predictions
• Model evaluation using mean squared error (MSE)
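The following is a minimal plain-Python sketch of the idea behind ALS, using a single latent factor (rank 1) on hypothetical ratings. It alternates between two steps: holding the item factors fixed while solving for each user factor, then the reverse. It then scores the fit with MSE. This is an illustration only. In PySpark, the model would come from `pyspark.ml.recommendation.ALS` and could be scored with `RegressionEvaluator(metricName="mse")` from `pyspark.ml.evaluation`. For brevity, the sketch computes MSE on the training ratings, whereas a real evaluation should use a held-out test split.

```python
from collections import defaultdict

# Hypothetical (user, item, rating) triples; a notebook would load these from the dataset.
RATINGS = [
    (0, 0, 5.0), (0, 1, 3.0), (0, 2, 4.0),
    (1, 0, 4.0), (1, 1, 2.0),
    (2, 1, 1.0), (2, 2, 2.0),
]


def als_rank1(ratings, n_iters=20, reg=0.1):
    """Rank-1 alternating least squares with L2 regularisation.

    Each half-step holds one side fixed and solves the resulting regularised
    least-squares problem exactly; for rank 1 the solution is a scalar ratio.
    """
    by_user = defaultdict(list)
    by_item = defaultdict(list)
    for u, i, r in ratings:
        by_user[u].append((i, r))
        by_item[i].append((u, r))
    user_f = {u: 1.0 for u in by_user}
    item_f = {i: 1.0 for i in by_item}
    for _ in range(n_iters):
        # Item factors fixed: solve for each user factor.
        for u, pairs in by_user.items():
            num = sum(r * item_f[i] for i, r in pairs)
            den = sum(item_f[i] ** 2 for i, _ in pairs) + reg
            user_f[u] = num / den
        # User factors fixed: solve for each item factor.
        for i, pairs in by_item.items():
            num = sum(r * user_f[u] for u, r in pairs)
            den = sum(user_f[u] ** 2 for u, _ in pairs) + reg
            item_f[i] = num / den
    return user_f, item_f


def mse(ratings, user_f, item_f):
    """Mean squared error between observed ratings and predicted ratings."""
    errors = [(r - user_f[u] * item_f[i]) ** 2 for u, i, r in ratings]
    return sum(errors) / len(errors)


user_f, item_f = als_rank1(RATINGS)
# User 1 never rated item 2, so this is a genuine recommendation score.
print("prediction for user 1, item 2:", round(user_f[1] * item_f[2], 2))
print("training MSE:", round(mse(RATINGS, user_f, item_f), 3))
```

Spark's ALS also scales the regularisation term by the number of ratings each user or item has, which this sketch omits.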

Task I.3: Classification

This subtask requires you to implement a classification system based on logistic regression using the LogisticRegressionWithLBFGS class. You need to include
• Logistic Regression model training
• Model evaluation
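The brief specifies `LogisticRegressionWithLBFGS`, which lives in `pyspark.mllib.classification` and is trained on an RDD of `LabeledPoint`s. The sketch below is a dependency-free illustration of what such a model learns. It fits a log-loss objective of the same form on hypothetical 2-D data, but it swaps L-BFGS for plain batch gradient descent. It then evaluates the result with training accuracy. The learning rate and epoch count are arbitrary choices for this toy data.

```python
import math

# Hypothetical training data: (features, label) pairs with labels 0 or 1.
DATA = [
    ([-2.0, -1.0], 0), ([-1.0, -2.0], 0), ([-1.5, -1.5], 0),
    ([2.0, 1.0], 1), ([1.0, 2.0], 1), ([1.5, 1.5], 1),
]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(data, lr=0.1, epochs=500):
    """Fit weights and a bias by batch gradient descent on the mean log-loss.

    Gradient descent stands in for L-BFGS here: both minimise a log-loss,
    but L-BFGS uses curvature estimates to take better-scaled steps.
    """
    n_features = len(data[0][0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in data:
            # Derivative of the log-loss w.r.t. the linear score is (p - y).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) - y
            for j in range(n_features):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wj - lr * gj / len(data) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(data)
    return w, b


def predict(w, b, x, threshold=0.5):
    """Return 1 if the predicted probability reaches the threshold, else 0."""
    p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
    return 1 if p >= threshold else 0


w, b = train_logistic(DATA)
accuracy = sum(predict(w, b, x) == y for x, y in DATA) / len(DATA)
print("weights:", [round(v, 3) for v in w], "bias:", round(b, 3))
print("training accuracy:", accuracy)
```

Training accuracy is used here only to keep the example short. A proper evaluation would score a held-out test split, for example with precision, recall or area under the ROC curve.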

Task I.4: Clustering

This subtask requires you to implement a clustering system based on K-means. You need to include
• Model training
• Model evaluation
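K-means training (Lloyd's algorithm) and one common evaluation measure, the within-cluster sum of squared errors (WSSSE), can be sketched in plain Python as follows. The points and initial centroids are hypothetical. In PySpark, the model would come from `pyspark.ml.clustering.KMeans`, and `pyspark.ml.evaluation.ClusteringEvaluator` reports the silhouette score by default.

```python
# Hypothetical 2-D points forming two loose groups.
POINTS = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]


def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm: assign each point to its nearest centroid, then move
    each centroid to the mean of its assigned points, until nothing changes."""
    centroids = list(centroids)
    clusters = []
    for _ in range(max_iter):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda k: dist2(p, centroids[k]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        new_centroids = [
            tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centroids[k]
            for k, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, clusters


def wssse(clusters, centroids):
    """Within-cluster sum of squared errors: lower means tighter clusters."""
    return sum(dist2(p, centroids[k]) for k, c in enumerate(clusters) for p in c)


centroids, clusters = kmeans(POINTS, [POINTS[0], POINTS[3]])
print("centroids:", centroids)
print("WSSSE:", round(wssse(clusters, centroids), 3))
```

Choosing k is part of the evaluation. WSSSE never increases as k grows, so it is usually plotted against k to look for an "elbow".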

Part II - Report

You are required to write a report explaining the theory underlying the key concepts behind the design and implementation of your code. You must also include all code, in .py format, in the appendices of the report. Note that the code does not count towards the word count.

Your report must follow this template:

Table of Contents

1.0 Introduction

Key System Concepts
• Machine learning pipelines - explain the key steps in a machine learning pipeline and how they were applied in your code.
• Collaborative filtering - explain the principles of collaborative filtering and how they were applied in your code.
• Logistic regression - explain the principles of logistic regression and how they were applied in your code.
• K-Means - explain the principles of K-means and how they were applied in your code.

4.0 Conclusion

References

Appendices

Report Format

Your report should be between 1000 and 1500 words.

The report MUST be formatted using the following guidelines:
• Title Page - Must not contain headers, footers, or page numbering. Include your name as the report's author.
• Header - Report title
• Footer - your name and the page number
• Paragraph text - 12 point Calibri single line spacing
• Headings - Arial in an appropriate type size
• Margins - 2.5cm on all margins

• Page numbering - use conventional numerals (1, 2, 3, 4, ...), starting at page 1 from the Introduction onwards.
• The report must be created as a single Microsoft Word document (version 2007 or later). No other format is acceptable; submitting in any other format will result in a deduction of marks.

Attachment:- Data Science Practice.rar




Make sure the assignment is written in PySpark. The appendix is a key part of the assignment, so prepare it carefully and ensure that all code is included in the appendix section. Your report should demonstrate a good understanding of machine learning.


Before submitting your code, ensure not only that it executes as required but also that it looks professional. You are expected to follow Python conventions for naming and indentation. All methods should be documented well enough that another programmer examining your code can readily tell what it is doing.


This assignment will take several weeks to complete and requires a good understanding of machine learning and PySpark. It is imperative that students take heed of the following points:
1. Ensure that you clearly understand the requirements for the assignment: what must be done and what the deliverables are.
2. If you do not understand any of the assignment requirements, please ASK your tutor.
3. Each time you work on any aspect of the assignment, reread the requirements to ensure that you clearly understand what is required.
4. Nearly all of the coding tasks have been practised in DataCamp. If you have any difficulty, redoing the DataCamp practice exercises is recommended.


Late submissions will be penalised according to the policy in the course outline. Note that Saturdays and Sundays are included in the count of days late. Requests for an extension MUST be made to the course coordinator before the submission date; requests made on or after the submission date will only be considered in exceptional circumstances. Extensions will only be granted in accordance with the official University guidelines.
