COSC 2670 Practical Data Science with Python Assignment

Reference no: EM132532528

COSC 2670 Practical Data Science with Python - RMIT University

General Requirements

This section contains information about the general requirements that your assignment must meet. Please read all requirements carefully before you start.

• You must do all modelling in IPython or Jupyter Notebook (in Anaconda).
• You must include a plain text file called "readme.txt" with your submission. This file should include your name and student ID, and instructions for how to execute your submitted script files. This is important because automation is part of the sixth step of the data science process, and it will be assessed strictly.
• Part of this assignment is a written report, which must be in PDF format.
• Please ensure that your submission follows the file naming rules specified in the tasks below. File names are case sensitive, i.e. if it is specified that the file name is gryphon, then that is exactly the file name you should submit; Gryphon, GRYPHON, griffin, and anything else but gryphon will be rejected.

Part 1: Retrieving and Preparing the Data
This assignment will focus on data modelling, and you can choose to focus on one approach: Classification or Clustering.

For this assignment, you need to select one dataset from the following options, and then work on it:

1. Activity Recognition from Single Chest-Mounted Accelerometer Data Set. More details can be found on the UCI webpage for this dataset.

2. BLE RSSI Dataset for Indoor localization and Navigation Data Set. More details can be found on the UCI webpage for this dataset.

3. Mice Protein Expression Data Set. More details can be found on the UCI webpage for this dataset.

Being a careful data scientist, you know that it is vital to set the goal of the project and then thoroughly pre-process the available data (each attribute) before starting to analyse and model it. In your report (Part 4), you need to clearly state the goal of your project and the design and steps of your data pre-processing. Please ensure you understand the data you selected, including the meaning of each attribute.
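As an illustration of attribute-by-attribute pre-processing, here is a minimal pandas sketch. It assumes the data has already been loaded into a DataFrame with numeric attributes and one class-label column; the function name `preprocess`, the toy column names and the particular steps (dropping duplicate rows, median imputation, normalising label text) are illustrative assumptions, not required steps. The right choices depend on the dataset you select.

```python
import numpy as np
import pandas as pd


def preprocess(df: pd.DataFrame, label_col: str) -> pd.DataFrame:
    """Minimal per-attribute cleaning; each step is an illustrative assumption."""
    df = df.copy()
    # Strip stray whitespace from column names (common in raw CSV headers).
    df.columns = [str(c).strip() for c in df.columns]
    # Remove exact duplicate rows (pandas treats NaN values as equal here).
    df = df.drop_duplicates()
    # Drop rows whose label is missing: they cannot be used for evaluation.
    df = df.dropna(subset=[label_col])
    # Impute remaining numeric gaps with each column's median (robust to outliers).
    numeric_cols = df.select_dtypes(include="number").columns
    df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].median())
    # Normalise label text so "Control " and "control" become the same class.
    df[label_col] = df[label_col].astype(str).str.strip().str.lower()
    return df.reset_index(drop=True)


# Hypothetical toy data standing in for one of the UCI datasets.
raw = pd.DataFrame({
    "attr_1": [0.50, np.nan, 0.70, 0.70],
    "attr_2": [1.2, 1.4, np.nan, np.nan],
    "class": ["Control ", "control", "Trisomy", "Trisomy"],
})
print(raw.isna().sum())  # missing values per attribute, before cleaning
clean = preprocess(raw, label_col="class")
print(clean)
```

Reporting the per-column missing-value counts before and after cleaning is one simple way to document each pre-processing step in the report.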

Part 2: Data Exploration

Explore the selected data, carrying out the following tasks:

Explore each column (or at least 10 columns if there are more than 10), using appropriate descriptive statistics and graphs (where appropriate). For each explored column, think carefully and state in your report (Part 4): 1) how you explored the column (e.g. which graph you used); 2) what you observed from it.

(Please format each graph carefully, and use it in your final report. You need to include appropriate labels on the x-axis and y-axis, a title, and a legend. The fonts should be sized for good readability. Components of the graphs should be coloured appropriately, if applicable.)
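A minimal sketch of exploring one column with pandas and matplotlib, formatted as the guidelines above ask (axis labels, title, legend, readable font sizes). The helper name `explore_column`, the per-class histogram, the font sizes and the toy data are assumptions chosen for illustration; a box plot or bar chart may suit some columns better.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend for scripts; omit this line in Jupyter
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def explore_column(df, col, label_col, bins=20):
    """Print descriptive statistics for one column and draw a histogram per class."""
    print(df[col].describe())  # count, mean, std, min, quartiles, max
    fig, ax = plt.subplots(figsize=(8, 5))
    for label, group in df.groupby(label_col):
        # Semi-transparent bars so overlapping class distributions stay visible.
        ax.hist(group[col].dropna(), bins=bins, alpha=0.5, label=str(label))
    ax.set_xlabel(col, fontsize=12)
    ax.set_ylabel("Number of observations", fontsize=12)
    ax.set_title(f"Distribution of {col} by {label_col}", fontsize=14)
    ax.legend(title=label_col, fontsize=10)
    ax.tick_params(labelsize=10)
    fig.tight_layout()
    return fig, ax


# Hypothetical data: one numeric attribute and a two-class label.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "attr_1": np.concatenate([rng.normal(0, 1, 100), rng.normal(2, 1, 100)]),
    "label": ["class A"] * 100 + ["class B"] * 100,
})
fig, ax = explore_column(demo, "attr_1", "label")
```

To use a figure in the report, it can be exported with `fig.savefig(...)` at a suitable resolution.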

Explore the relationship between all pairs of attributes (or at least 10 pairs of attributes, if there are more in the data), and show each relationship in an appropriate graph. You may choose which pairs of columns to focus on, but you need to generate a visualisation for each pair. Each attribute pair should address a plausible hypothesis about the data. In your report, for each plot (pair of attributes), state the hypothesis that you are investigating. Then briefly discuss any interesting relationships (or lack of relationships) that you can observe from your visualisation.
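A sketch of one possible workflow for attribute pairs, under stated assumptions: `top_correlated_pairs` shortlists numeric pairs by absolute Pearson correlation, and `explore_pair` draws a class-coloured scatter plot for a pair together with the hypothesis being tested. Both helper names and the toy data are hypothetical. Ranking by correlation is only a heuristic for finding candidates; the hypothesis for each pair should still come from your understanding of the attributes.

```python
from itertools import combinations

import matplotlib

matplotlib.use("Agg")  # headless backend for scripts; omit this line in Jupyter
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def top_correlated_pairs(df, n=10):
    """Return the n numeric attribute pairs with the largest |Pearson r|."""
    corr = df.select_dtypes(include="number").corr().abs()
    pairs = [(a, b, corr.loc[a, b]) for a, b in combinations(corr.columns, 2)]
    return sorted(pairs, key=lambda t: t[2], reverse=True)[:n]


def explore_pair(df, x, y, label_col, hypothesis):
    """Scatter y against x, coloured by class, and report Pearson r."""
    r = df[x].corr(df[y])  # Pearson correlation by default
    fig, ax = plt.subplots(figsize=(7, 5))
    for label, group in df.groupby(label_col):
        ax.scatter(group[x], group[y], s=15, alpha=0.6, label=str(label))
    ax.set_xlabel(x, fontsize=12)
    ax.set_ylabel(y, fontsize=12)
    ax.set_title(f"{y} vs {x} (Pearson r = {r:.2f})", fontsize=14)
    ax.legend(title=label_col, fontsize=10)
    fig.tight_layout()
    print(f"Hypothesis: {hypothesis}\nPearson r = {r:.3f}")
    return r, fig


# Hypothetical data with one genuinely related pair (attr_1, attr_2).
rng = np.random.default_rng(0)
x = rng.normal(size=200)
demo = pd.DataFrame({
    "attr_1": x,
    "attr_2": 0.8 * x + rng.normal(scale=0.5, size=200),
    "attr_3": rng.normal(size=200),
    "label": rng.choice(["A", "B"], size=200),
})
for a, b, strength in top_correlated_pairs(demo, n=3):
    print(f"{a} / {b}: |r| = {strength:.2f}")
r, fig = explore_pair(demo, "attr_1", "attr_2", "label",
                      "attr_2 increases with attr_1 (hypothetical)")
```

Pearson r only captures linear association, so a weak r alongside a clearly curved scatter plot is itself worth discussing.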

Please note that you do not need to put every graph in your report; include only the representative ones and/or those showing significant information.

Part 3: Data Modelling

Model the data by treating it as either a Classification or Clustering Task, depending on your choice.
You must use two different models (i.e. two Classification models or two Clustering models), and building each model must include the following steps:

• Select the appropriate features
• Select the appropriate model (e.g. DecisionTree for classification) from sklearn.
• If you choose to do a Classification Task,
- Train and evaluate the model appropriately.
- Train the model by selecting appropriate values for each parameter in the model. You need to show how you chose these values, and justify your choices.
• If you choose to do a Clustering Task,
- Train the model by selecting appropriate values for each parameter in the model. Show how you chose these values, and justify your choices (for example, k in the k-means model).
- Determine the optimal number of clusters, and justify your choice.
- Evaluate the performance of the clustering model by:
∗ Checking the clustering results against the true observation labels, and constructing a "confusion matrix" to analyse the meaning of each cluster by looking at the majority label of the observations in that cluster. (You can do this with pen and paper, as we did in the Practical Exercise; if you prefer, you can also explore how to do this step directly in IPython.)
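For those who prefer to do the clustering steps in IPython, here is a minimal scikit-learn sketch. It assumes k-means as the clustering model and uses the silhouette score as one possible criterion for choosing k (the elbow method is a common alternative); the helper names `choose_k` and `cluster_vs_labels`, the purity summary and the synthetic `make_blobs` data are illustrative assumptions standing in for your chosen dataset.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def choose_k(X, k_values=range(2, 9), random_state=0):
    """Fit k-means for each candidate k; return the k with the best silhouette score."""
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10,
                        random_state=random_state).fit_predict(X)
        scores[k] = silhouette_score(X, labels)  # in [-1, 1]; higher is better
    best_k = max(scores, key=scores.get)
    return best_k, scores


def cluster_vs_labels(cluster_ids, true_labels):
    """Cross-tabulate clusters against true labels; name clusters by majority label."""
    table = pd.crosstab(pd.Series(cluster_ids, name="cluster"),
                        pd.Series(true_labels, name="true label"))
    majority = table.idxmax(axis=1)  # most frequent true label in each cluster
    # Purity: share of observations matching their cluster's majority label.
    purity = table.max(axis=1).sum() / table.to_numpy().sum()
    return table, majority, purity


# Synthetic stand-in for the chosen dataset: three well-separated groups.
X, y = make_blobs(n_samples=300, centers=[[0, 0], [10, 10], [-10, 10]],
                  cluster_std=0.5, random_state=42)
X = StandardScaler().fit_transform(X)  # k-means is distance-based, so scale first

best_k, scores = choose_k(X)
clusters = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit_predict(X)
table, majority, purity = cluster_vs_labels(clusters, y)
print({k: round(s, 3) for k, s in scores.items()})
print(f"best k = {best_k}")
print(table)
print(majority)
print(f"purity = {purity:.2f}")
```

Real data will rarely separate this cleanly; plotting the silhouette scores against k and inspecting the cross-tabulation row by row is usually more informative than the single best value.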

After you have built two Classification models, or two Clustering models, on your data, the next step is to compare the models. You need to include the results of this comparison, including a recommendation of which model should be used, in your report (see next section).
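For the classification route, one common way to choose parameter values, evaluate both models and compare them is cross-validated grid search on a training split, followed by a single evaluation on a held-out test split. The sketch below assumes k-nearest neighbours and a decision tree as the two models, uses accuracy as the metric, and uses scikit-learn's bundled Iris data purely as a stand-in for your chosen dataset; the parameter grids are illustrative, not recommended values.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Stand-in data; replace with the features and labels of your chosen dataset.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Each candidate: (estimator, parameter grid to search). Grids are illustrative.
candidates = {
    "k-NN": (make_pipeline(StandardScaler(), KNeighborsClassifier()),
             {"kneighborsclassifier__n_neighbors": [1, 3, 5, 7, 9, 11]}),
    "Decision tree": (DecisionTreeClassifier(random_state=0),
                      {"max_depth": [2, 3, 4, 5, None],
                       "min_samples_leaf": [1, 2, 5]}),
}

results = {}
for name, (estimator, grid) in candidates.items():
    # 5-fold cross-validation on the training split only (stratified for classifiers).
    search = GridSearchCV(estimator, grid, cv=5, scoring="accuracy")
    search.fit(X_train, y_train)
    test_acc = accuracy_score(y_test, search.predict(X_test))
    results[name] = (search.best_params_, search.best_score_, test_acc)
    print(f"{name}: best params {search.best_params_}, "
          f"CV accuracy {search.best_score_:.3f}, test accuracy {test_acc:.3f}")

# Recommend based on cross-validated score; the test split is reported, not tuned on.
best = max(results, key=lambda n: results[n][1])
print(f"Recommended model (by cross-validated accuracy): {best}")
```

The `cv_results_` attribute of each fitted search holds the score for every parameter combination, which can be plotted to show how the chosen values were reached.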

Part 4: Report

Write your report and save it in a file called report.pdf. It must be in PDF format and must be at most 12 pages in single-column format (including figures and references), with a font size between 10 and 12 points. Penalties will apply if the report does not satisfy these requirements. Remember to clearly cite any sources (including books, research papers, course notes, etc.) that you referred to while designing aspects of your programs.
Your report must have the following structure:
• A cover page, including
- A statement that the solution represents your own work, as required
- Title
- Author Information
- Affiliations
- Contact details
- Date of report
• Table of Contents
• An abstract/executive summary
• Introduction
• Methodology
• Results
• Discussion
• Conclusion
• References
Please revisit the relevant slides in the Week 1 lecture if needed.

Part 5: Presentation

• The presentation should
- explain the goal of the project.
- briefly describe your chosen data set.
- describe the data preparation steps.
- state the hypotheses/questions that you were investigating.
- explain what the modelling steps are, and what the results are.
- show the final conclusion and recommendation.
• The presentation should be no more than 5 minutes.

• Your presentation slides should be either:
- Microsoft PowerPoint slides, with audio inserted for each slide (Insert > Audio > Record Audio); or
- your own presentation slides (e.g. a PDF version), submitted together with your own recording of the presentation.

Attachment: Practical Data Science with Python.rar