ICT707 Data Science Practice Assessment Task - Big Data Assignment, University of the Sunshine Coast, Australia
Goal: To demonstrate a comprehensive view of big data analysis, covering definitions and concepts, techniques, and the production of big-data solutions to business problems.
Product: Artefact-Technical and Scientific, and Written Piece.
Format: A computer program that uses big-data analysis techniques to solve a business problem, plus a report (1000 words) describing and justifying the design of that program.
This task is being used for measuring assurance of learning towards Association to Advance Collegiate Schools of Business (AACSB) accreditation. The following Program Learning Objectives will be assessed:
1: Problem Solving
Demonstrate critical and creative thinking to identify and solve complex business problems and arrive at innovative solutions.
Further details of this assessment will be given on Blackboard. This is an individual assessment.
Criteria:
- Presentation and organisation of report.
- Demonstrate critical analysis of the given problem and apply creative thinking and approaches to solve the problem.
- Application of relevant programming concepts.
- Accuracy of the program output.
- Adherence to the recommended programming styles.
Assignment Task -
This assignment consists of two deliverables:
- One code implementation: the code file in Jupyter Notebook format, together with the relevant data set files.
- A report.
Part I - PySpark source code
Important Note: For code reproduction, your code must be self-contained. That is, it should not require any libraries beyond the PySpark environment used in the workshops, and the data files must be packaged with your code file.
In this component, you need to use Python 3 and PySpark to complete the following data analysis tasks:
1. Exploratory data analysis
2. Recommendation engine
3. Classification
4. Clustering
You need to choose a dataset from Kaggle to complete these tasks. Remember to include the data set file in your source code submission.
Note: In your notebook, please use a Heading 1 Markdown cell to separate each subtask.
Task 1.1: Exploratory data analysis
This subtask requires you to explore your dataset by
- reporting its number of rows and columns,
- cleaning the data (handling missing values or duplicated records) if necessary, and
- selecting 3 columns and drawing 1 plot (e.g. bar chart, histogram, boxplot) for each to summarise it.
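The counting and cleaning steps above can be sketched without Spark on a small hypothetical CSV, so the logic is visible. In the notebook, the PySpark equivalents would be `df.count()` and `len(df.columns)` for the shape, `df.dropna()` and `df.dropDuplicates()` for cleaning, and `df.groupBy("genre").count()` for the table behind a bar chart. The column names and data below are assumptions for illustration only.

```python
import csv
import io
from collections import Counter

# Hypothetical toy data standing in for a Kaggle CSV file.
RAW = """id,genre,rating
1,Drama,4
2,Comedy,3
2,Comedy,3
3,Drama,
4,Action,5
5,Drama,2
"""

def load_rows(text):
    """Parse CSV text into a header list and a list of row tuples."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    rows = [tuple(r) for r in reader]
    return header, rows

def clean(rows):
    """Drop rows with any empty field, then drop exact duplicates (keeping the first)."""
    complete = [r for r in rows if all(field != "" for field in r)]
    seen, unique = set(), []
    for r in complete:
        if r not in seen:
            seen.add(r)
            unique.append(r)
    return unique

header, rows = load_rows(RAW)
shape = (len(rows), len(header))
cleaned = clean(rows)

# Frequency table for one categorical column: the data behind a bar chart.
genre_idx = header.index("genre")
genre_counts = Counter(r[genre_idx] for r in cleaned)

print("rows x columns:", shape)
print("rows after cleaning:", len(cleaned))
print("genre counts:", dict(genre_counts))
```

Here the row with a missing rating and the repeated `2,Comedy,3` record are removed, leaving four rows to summarise.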
Task 1.2: Recommendation engine
This subtask requires you to implement a recommender system based on collaborative filtering with the Alternating Least Squares (ALS) algorithm. You need to include
- Model training and predictions
- Model evaluation using mean squared error (MSE)
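As a framework-free illustration of what ALS computes, the sketch below fits a rank-1 model to a tiny hypothetical rating set and holds one rating out to evaluate with MSE. Rank 1 keeps each regularised least-squares solve to a scalar division; a higher-rank model solves a small k×k system per user and per item instead. In the notebook you would use PySpark's `pyspark.ml.recommendation.ALS` (parameters such as `rank`, `maxIter` and `regParam`) with `pyspark.ml.evaluation.RegressionEvaluator(metricName="mse")`. The data, the regularisation value `lam` and the iteration count are assumptions for illustration.

```python
# Observed ratings as (user, item, rating); rating (2, 1) is held out for testing.
# Hypothetical data generated from an exact rank-1 matrix u_u * v_i with
# u = [1, 2, 3] and v = [1, 2, 1].
TRAIN = [
    (0, 0, 1.0), (0, 1, 2.0), (0, 2, 1.0),
    (1, 0, 2.0), (1, 1, 4.0), (1, 2, 2.0),
    (2, 0, 3.0), (2, 2, 3.0),
]
TEST = [(2, 1, 6.0)]

def als_rank1(ratings, n_users, n_items, lam=0.01, iterations=50):
    """Rank-1 ALS: alternately solve the L2-regularised least-squares problem
    for every user factor (items fixed), then every item factor (users fixed)."""
    x = [1.0] * n_users  # user factors
    y = [1.0] * n_items  # item factors
    for _ in range(iterations):
        # x_u = sum_i r_ui * y_i / (sum_i y_i^2 + lam), over items rated by u
        num, den = [0.0] * n_users, [lam] * n_users
        for u, i, r in ratings:
            num[u] += r * y[i]
            den[u] += y[i] * y[i]
        x = [n / d for n, d in zip(num, den)]
        # y_i = sum_u r_ui * x_u / (sum_u x_u^2 + lam), over users who rated i
        num, den = [0.0] * n_items, [lam] * n_items
        for u, i, r in ratings:
            num[i] += r * x[u]
            den[i] += x[u] * x[u]
        y = [n / d for n, d in zip(num, den)]
    return x, y

def mse(ratings, x, y):
    """Mean squared error between observed ratings and predictions x_u * y_i."""
    return sum((r - x[u] * y[i]) ** 2 for u, i, r in ratings) / len(ratings)

x, y = als_rank1(TRAIN, n_users=3, n_items=3)
train_mse = mse(TRAIN, x, y)
test_mse = mse(TEST, x, y)
prediction = x[2] * y[1]
print(f"train MSE={train_mse:.5f}  test MSE={test_mse:.5f}  predicted r(2,1)={prediction:.3f}")
```

Because only observed ratings enter each sum, the model can still predict the held-out rating; a small regularisation term slightly shrinks predictions towards zero.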
Task 1.3: Classification
This subtask requires you to implement a classification system using logistic regression with the LogisticRegressionWithLBFGS class. You need to include
- Logistic Regression model training
- Model evaluation
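The brief asks for `LogisticRegressionWithLBFGS` from `pyspark.mllib.classification`, trained via `LogisticRegressionWithLBFGS.train(...)` on an RDD of `LabeledPoint`. The sketch below only illustrates the underlying model and an accuracy-based evaluation. To keep it short, it swaps L-BFGS for plain batch gradient descent; both minimise the same log-loss, but this is not the optimiser the assignment requires. The data, learning rate and iteration count are assumptions.

```python
import math

# Hypothetical, linearly separable toy data: (features, label).
DATA = [
    ((2.0, 1.0), 1), ((3.0, 2.0), 1), ((2.5, 3.0), 1),
    ((-2.0, -1.0), 0), ((-3.0, -2.0), 0), ((-2.5, -3.0), 0),
]

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logreg(data, lr=0.1, iterations=500):
    """Minimise mean log-loss with batch gradient descent (not L-BFGS)."""
    n_features = len(data[0][0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(iterations):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, label in data:
            # Gradient of log-loss w.r.t. the linear score is (p - label).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) - label
            for j in range(n_features):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wj - lr * g / len(data) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(data)
    return w, b

def predict(w, b, x, threshold=0.5):
    """Class 1 if the predicted probability reaches the threshold."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= threshold else 0

w, b = train_logreg(DATA)
accuracy = sum(predict(w, b, x) == label for x, label in DATA) / len(DATA)
print("weights:", w, "bias:", b, "accuracy:", accuracy)
```

On real data, evaluate on a held-out test set rather than on the training points, as done here for brevity.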
Task 1.4: Clustering
This subtask requires you to implement a clustering system with K-means. You need to include
- Model training
- Model evaluation
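The K-means training and evaluation steps can be illustrated with a small sketch of Lloyd's algorithm, evaluated by the within-set sum of squared errors (WSSSE). In the notebook you would use `pyspark.ml.clustering.KMeans`; `pyspark.ml.evaluation.ClusteringEvaluator` reports the silhouette score by default. Spark's default initialisation is k-means||, whereas this sketch uses a simpler deterministic farthest-first seeding, which is an assumption made to keep the example reproducible. The points are hypothetical.

```python
# Hypothetical 2-D points forming two well-separated groups.
POINTS = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
    (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0),
]

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def init_farthest_first(points, k):
    """Start at points[0], then repeatedly add the point farthest from its nearest chosen centre."""
    centres = [points[0]]
    while len(centres) < k:
        centres.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centres)))
    return centres

def assign(points, centres):
    """Index of the nearest centre for every point."""
    return [min(range(len(centres)), key=lambda j: sq_dist(p, centres[j])) for p in points]

def kmeans(points, k, iterations=20):
    """Lloyd's algorithm: assign points to the nearest centre, move centres to cluster means."""
    centres = init_farthest_first(points, k)
    for _ in range(iterations):
        labels = assign(points, centres)
        new_centres = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centres.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centres.append(centres[j])  # keep an empty cluster's centre in place
        if new_centres == centres:
            break
        centres = new_centres
    return centres, assign(points, centres)

def wssse(points, centres, labels):
    """Within-set sum of squared errors: the usual k-means cost."""
    return sum(sq_dist(p, centres[lab]) for p, lab in zip(points, labels))

centres, labels = kmeans(POINTS, k=2)
print("centres:", centres)
print("labels:", labels)
print("WSSSE:", wssse(POINTS, centres, labels))
```

WSSSE always falls as k grows, so it is usually compared across several k values (the "elbow" method) rather than read in isolation.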
Part II - Report
You are required to write a report to explain your design and implementation of the machine learning parts in your code, including the following topics:
- An introduction to, and explanation of, the ML algorithms and concepts used.
- The learning settings, such as how you prepared the training and test sets, what the key parameters are, and how you set them.
- Comments on, and evaluation of, the trained models.
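For the learning settings, a reproducible train/test split is the usual starting point. In PySpark this is typically `DataFrame.randomSplit([0.8, 0.2], seed=42)`, which samples rows independently, so the split sizes are only approximately 80/20. The stdlib sketch below shows the same idea with exact sizes; the 0.2 test fraction and seed 42 are assumptions, not requirements of the brief.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows with a fixed seed, then cut off the test set."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

rows = list(range(10))  # stand-in for dataset records
train, test = train_test_split(rows)
print(len(train), len(test))
```

Fixing the seed lets you report the same split, and therefore the same evaluation figures, every time the notebook is re-run.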
Your report should use the following template:
Table of Contents
1.0 Introduction
Explain the data set you've chosen, including its source URL. Demonstrate your exploratory data analysis in this section.
2.0 Machine learning implementation
2.1 Collaborative filtering
2.2 Logistic regression
2.3 K-Means
3.0 Conclusion
References
Your report should be about 1000 words, and no more than 1500 words. It must include at least 5 appropriate references, formatted using the Harvard method of referencing. Note that ALL references should come from journal articles, conference papers, technical papers, or recognised experts in the field.
Please follow the conventions detailed in: Summers, J. & Smith, B., 2014, Communication Skills Handbook, 4th Ed, Wiley, Australia.