Reference no: EM133009457
ITEC203 Introduction to Data Science and Machine Learning - Australian Catholic University
Assessment Artefact: Python Code and Comments
Context
Suppose the students take a Data Engineer role in a company. One of their daily duties would be processing huge amounts of data for different projects. This assignment guides the students through handling such situations with an example.
Instructions
The MNIST digit dataset is a set of 70,000 small images of digits handwritten by high school students and employees of the US Census Bureau. Each image is labeled with the digit it represents. This set has been studied so much that it is often called the "hello world" of Machine Learning: whenever people come up with a new classification algorithm, they are curious to see how it will perform on MNIST, and anyone who learns Machine Learning tackles this dataset sooner or later.
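As a starting point, the dataset described above can be loaded and summarized along the following lines. This is a minimal sketch, not a prescribed solution: it assumes scikit-learn's `fetch_openml` with the OpenML dataset name `mnist_784`, and the helper names `load_mnist` and `summarize` are made up for illustration.

```python
import numpy as np
from sklearn.datasets import fetch_openml


def load_mnist():
    """Download MNIST from OpenML (assumed source; needs network access on first call)."""
    mnist = fetch_openml("mnist_784", version=1, as_frame=False)
    X = mnist.data                      # one row of pixel intensities per image
    y = mnist.target.astype(np.uint8)   # OpenML returns the labels as strings
    return X, y


def summarize(X, y):
    """Collect basic facts: image count, feature count, value range, labels."""
    labels, counts = np.unique(y, return_counts=True)
    return {
        "n_images": X.shape[0],
        "n_features": X.shape[1],
        "value_range": (float(X.min()), float(X.max())),
        "labels": labels.tolist(),
        "counts_per_label": dict(zip(labels.tolist(), counts.tolist())),
    }
```

In a notebook cell, `X, y = load_mnist()` followed by `summarize(X, y)` should report 70,000 images with 784 features each (28 x 28 pixels) and ten discrete labels. A histogram of `X.ravel()` is one way to inspect the range of feature values.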
The instructions for exploring this dataset are:
1. Use Jupyter Notebook for interactive practice of Python and related Machine Learning packages.
a. To install Jupyter Notebook, you can install Anaconda first, as Anaconda is the most widely used Python distribution for data science and comes pre-loaded with the most popular libraries and tools.
b. Create a virtual environment for each Python project.
c. For installing libraries,
d. For creating a Jupyter notebook,
e. Familiarize yourself with cells in Jupyter Notebook and practice mixing text and Python code.
2. Always refer to the textbook ‘Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow’ for coding help.
3. Specific tasks include
a. download the dataset
b. explore the dataset and output information including
i. how many images
ii. how many features and the range of feature values (e.g., histogram of the data value)
iii. how many categories/labels (discrete or continuous type)
iv. visualize randomly selected samples within each category (to get a feel for the variance of the data)
v. visualize more data samples to see whether there are bad data samples that need to be removed.
c. do more data manipulation
i. Explore PCA to reduce feature dimensions down to two dimensions and plot the result using Matplotlib. You can use a scatterplot using 10 different colours to represent each image's target class.
ii. Use t-SNE to reduce the MNIST dataset down to two dimensions and plot the result as a scatterplot using Matplotlib.
iii. Summarize your discoveries and insights.
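The sample visualization and dimensionality reduction tasks above could be approached roughly as follows. This is a sketch under stated assumptions rather than the expected answer: `X` is taken to be an array with one flattened image per row and `y` an integer label array, and the helper names (`sample_per_class`, `plot_samples`, `reduce_2d`, `plot_embedding`) are hypothetical.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE


def sample_per_class(X, y, n_per_class=5, seed=42):
    """Pick random row indices for each label to inspect within-class variance."""
    rng = np.random.default_rng(seed)
    picks = {}
    for label in np.unique(y):
        idx = np.flatnonzero(y == label)
        size = min(n_per_class, idx.size)
        picks[int(label)] = rng.choice(idx, size=size, replace=False)
    return picks


def plot_samples(X, picks, side=28):
    """Draw one row of images per label; assumes each row of X is side x side pixels."""
    n_cols = max(len(idx) for idx in picks.values())
    fig, axes = plt.subplots(len(picks), n_cols, squeeze=False,
                             figsize=(n_cols, len(picks)))
    for ax in axes.ravel():
        ax.axis("off")
    for row, (label, idx) in enumerate(sorted(picks.items())):
        for col, i in enumerate(idx):
            axes[row, col].imshow(X[i].reshape(side, side), cmap="binary")
    return fig


def reduce_2d(X, method="pca", seed=42):
    """Project X down to two dimensions with PCA or t-SNE."""
    if method == "pca":
        return PCA(n_components=2, random_state=seed).fit_transform(X)
    if method == "tsne":
        return TSNE(n_components=2, random_state=seed).fit_transform(X)
    raise ValueError(f"unknown method: {method!r}")


def plot_embedding(X2d, y, title):
    """Scatterplot of the 2-D points, one of ten colours per digit class."""
    fig, ax = plt.subplots(figsize=(8, 6))
    points = ax.scatter(X2d[:, 0], X2d[:, 1], c=y, cmap="tab10",
                        vmin=-0.5, vmax=9.5, s=4)
    fig.colorbar(points, ax=ax, ticks=range(10), label="digit")
    ax.set_title(title)
    return fig
```

A typical notebook cell might call `plot_samples(X, sample_per_class(X, y))`, then `plot_embedding(reduce_2d(X, "pca"), y, "PCA")`. t-SNE is considerably slower than PCA, so running `reduce_2d(..., "tsne")` on a random subset of the rows is a common way to keep the runtime manageable.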
Structure
Prepare a Jupyter notebook for this assignment. The notebook should alternate text and Python code and cover the topics listed in the specific tasks above.
Attachment:- Data Science and Machine Learning.rar