Develop and implement appropriate steps in IPython

Reference no: EM132327790

Practical Data Science Assignment - Data Modelling and Presentation

Introduction - This assignment focuses on data modelling, a core step in the data science process. You will need to develop and implement appropriate steps, in IPython, to complete the corresponding tasks.

This assignment is intended to give you practical experience with the typical 5th and 6th steps of the data science process: data modelling, and presentation and automation.

Task 1: Data Retrieving

This assignment will focus on data modelling, and you can choose to focus on one approach: Classification or Clustering.

For this assignment, you need to select one suitable dataset, from the following options:

1. Find and then analyse your own data set, in a domain that is of interest to you. If you choose this option, you will need to:

  • include a detailed description of the data in your report in Task 4, and describe each attribute, including its type, its range of possible values, and whether it contains any missing values or errors;
  • submit a copy of the dataset, to allow the assessment of your modelling result.

2. Select one data set from the UCI Repository. Choose one dataset from either the Classification or Clustering task.

Being a careful data scientist, you know that it is vital to set the goal of the project, then thoroughly pre-process any available data (each attribute) before starting to analyse and model it. In your report in Task 4, you need to clearly state the goal of your project and the design/steps of your data pre-processing.

Please ensure you understand the data you selected, including the meaning of each attribute. For datasets from the UCI repository, you can obtain this information from the corresponding Web page under the sections Data Set Information and Attribute Information.
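As a minimal sketch of this first inspection step, the snippet below loads a dataset and tabulates the type, observed range and number of missing values of each attribute. It assumes the Wine data bundled with scikit-learn (originally from the UCI Repository) as a stand-in for whichever dataset you actually select; the variable names are illustrative only.

```python
import pandas as pd
from sklearn.datasets import load_wine

# Assumption: the Wine data shipped with scikit-learn stands in for
# the dataset you actually choose.
wine = load_wine(as_frame=True)
df = wine.frame  # 13 numeric attributes plus a "target" class column

# Per-attribute summary: type, observed range and number of missing values.
summary = pd.DataFrame({
    "dtype": df.dtypes.astype(str),
    "min": df.min(),
    "max": df.max(),
    "n_missing": df.isna().sum(),
})
print(summary)
```

A table like this can feed the attribute description required in the report, but the meaning of each attribute still has to come from the dataset's own documentation.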

Task 2: Data Exploration

Explore the selected data, carrying out the following tasks:

Explore each column (or at least 10 columns if there are more than 10), using appropriate descriptive statistics and graphs (if appropriate), e.g. the distribution of a numerical attribute or the proportion of each value of a categorical attribute. For each explored column, think carefully and state in your report (Task 4):

1) the method you used to explore the column (e.g. the graph); 2) what you can observe from it.

(Please format each graph carefully, and use it in your final report. You need to include appropriate labels on the x-axis and y-axis, a title, and a legend. The fonts should be sized for good readability. Components of the graphs should be coloured appropriately, if applicable.)
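One possible way to explore a single numerical column and format its graph is sketched below. It again assumes the scikit-learn Wine data as a placeholder dataset; the choice of the "alcohol" column, the bin count and the font sizes are illustrative assumptions, not requirements of the assignment.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend; drop this line inside IPython/Jupyter
import matplotlib.pyplot as plt
from sklearn.datasets import load_wine

df = load_wine(as_frame=True).frame
column = "alcohol"  # illustrative choice of attribute

# Descriptive statistics: count, mean, std, min, quartiles, max.
stats = df[column].describe()
print(stats)

# One histogram per class, drawn on the same axes so they can be compared.
fig, ax = plt.subplots(figsize=(6, 4))
for label, group in df.groupby("target"):
    ax.hist(group[column], bins=15, alpha=0.6, label=f"class {label}")

ax.set_xlabel("Alcohol", fontsize=12)
ax.set_ylabel("Number of samples", fontsize=12)
ax.set_title("Distribution of alcohol by class", fontsize=14)
ax.legend(title="Wine class", fontsize=10)
fig.tight_layout()
```

To use the figure in the report, `fig.savefig(...)` with a file name of your choice writes it to disk.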

Explore the relationship between all pairs of attributes (or at least 10 pairs of attributes, if there are more in the data), and show each relationship in an appropriate graph. You may choose which pairs of columns to focus on, but you need to generate a visualisation for each pair of attributes. Each attribute pair should address a plausible hypothesis for the data concerned. In your report, for each plot (pair of attributes), state the hypothesis that you are investigating. Then, briefly discuss any interesting relationships (or lack of relationships) that you can observe from your visualisation.
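A numerical check can accompany each pairwise plot. The sketch below computes the Pearson correlation for a few attribute pairs of the scikit-learn Wine data (a placeholder dataset); the pairs and the hypotheses in the comments are hypothetical examples, and a correlation coefficient complements the required graph rather than replacing it.

```python
from sklearn.datasets import load_wine

df = load_wine(as_frame=True).frame

# Hypothetical pairs; each would be tied to a hypothesis stated in the report.
pairs = [
    ("total_phenols", "flavanoids"),  # H: flavanoids increase with total phenols
    ("alcohol", "proline"),           # H: higher-alcohol wines contain more proline
    ("color_intensity", "hue"),       # H: more intense colour goes with lower hue
]

correlations = {}
for x, y in pairs:
    # Series.corr uses the Pearson coefficient by default.
    correlations[(x, y)] = df[x].corr(df[y])
    print(f"{x} vs {y}: r = {correlations[(x, y)]:.2f}")
```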

Task 3: Data Modelling

Model the data by treating it as either a Classification or Clustering Task, depending on which dataset you previously selected.

You must choose two models within the particular Task category (i.e. two Classification models, or two Clustering models), and carry out the following steps for each model:

Select the appropriate model (e.g. DecisionTreeClassifier for classification) from sklearn.

If you choose to do a Classification Task,

Split the data into a training set and a test set, using each of the following ratios:

  • 50% for training and 50% for testing;
  • 60% for training and 40% for testing;
  • 80% for training and 20% for testing;
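The three ratios above can be produced with `train_test_split` from sklearn, as in the following sketch. The Wine data, the stratified sampling and the fixed `random_state` are assumptions made here for reproducibility; the assignment itself only prescribes the ratios.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split

X, y = load_wine(return_X_y=True)

# Test fractions matching the 50/50, 60/40 and 80/20 splits.
test_fractions = [0.5, 0.4, 0.2]

splits = {}
for frac in test_fractions:
    # stratify=y keeps the class proportions similar in both parts (an assumption).
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=frac, stratify=y, random_state=0
    )
    splits[frac] = (X_train, X_test, y_train, y_test)
    print(f"test fraction {frac}: {len(X_train)} train, {len(X_test)} test")
```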

For each training/testing split, perform the following steps:

Train the model by selecting appropriate values for each parameter in the model.

  • You need to show how you chose each value, and justify why you chose it (for example, k in the k-nearest neighbours model).
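One defensible way to choose k is cross-validation on the training set only, sketched below. The candidate range for k, the 5-fold setting, the feature scaling step and the Wine data are all assumptions for illustration; sklearn's class for this model is `KNeighborsClassifier`.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Scale features first, because k-NN is distance based.
pipeline = make_pipeline(StandardScaler(), KNeighborsClassifier())

# Candidate odd values of k (an illustrative range, not a recommendation).
param_grid = {"kneighborsclassifier__n_neighbors": list(range(1, 16, 2))}

# 5-fold cross-validation on the training set; the test set is never touched.
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

best_k = search.best_params_["kneighborsclassifier__n_neighbors"]
print("best k:", best_k, "mean CV accuracy:", round(search.best_score_, 3))
```

The per-candidate scores in `search.cv_results_` can be tabulated or plotted in the report to justify the chosen value.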

Test the accuracy of the model on the test set, and report the performance of the model in the following terms:

  • Confusion Matrix
  • Classification Error Rate
  • Precision
  • Recall
  • F1-Score
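The measures listed above are all available in `sklearn.metrics`. The sketch below computes them for one split; the decision tree, its `max_depth=3` setting, the macro averaging over classes and the Wine data are illustrative assumptions.

```python
from sklearn.datasets import load_wine
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_recall_fscore_support)
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=0
)

model = DecisionTreeClassifier(max_depth=3, random_state=0)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred)

# Classification error rate = 1 - accuracy.
error_rate = 1.0 - accuracy_score(y_test, y_pred)

# Macro average: unweighted mean of the per-class scores.
precision, recall, f1, _ = precision_recall_fscore_support(
    y_test, y_pred, average="macro", zero_division=0
)

print(cm)
print(f"error rate={error_rate:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} F1={f1:.3f}")
```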

If you choose to do a Clustering Task,

Train the model by selecting appropriate values for each parameter in the model.

  • Show how you chose each value, and justify why you chose it (for example, k in the k-means model).

Determine the optimal number of clusters.
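The assignment does not prescribe a method for picking the number of clusters. One common option, sketched here as an assumption, is to compare the silhouette score of k-means over a range of k; the range 2 to 8, the feature scaling and the Wine data are illustrative.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_wine
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

X, _ = load_wine(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)

# Silhouette score for each candidate number of clusters (higher is better).
scores = {}
for k in range(2, 9):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X_scaled)
    scores[k] = silhouette_score(X_scaled, labels)

best_k = max(scores, key=scores.get)
for k, s in scores.items():
    print(f"k={k}: silhouette={s:.3f}")
print("k with the highest silhouette score:", best_k)
```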

Evaluate the performance of the clustering model by:

  • Checking the clustering results against the true observation labels
  • Constructing a "confusion matrix" to analyse the meaning of each cluster by looking at the majority of observations in the cluster. (You can do this by using a pen and a piece of paper, as we did in Practical Exercise 3 in Tute/Lab 06 (Week 7); if you prefer, you can also explore how to do this step directly in IPython.)
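If you do this step in IPython rather than on paper, a cross-tabulation of cluster assignments against the true labels gives the same table. The sketch below is one way to build it; the use of three clusters, the scaling step and the Wine data are assumptions, and the "purity" figure is an extra summary not required by the assignment.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)

clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_scaled)

# Rows are clusters, columns are true labels, cells are observation counts.
table = pd.crosstab(pd.Series(clusters, name="cluster"),
                    pd.Series(y, name="true label"))
print(table)

# The majority true label in each cluster suggests what the cluster "means".
majority = table.idxmax(axis=1)
print(majority)

# Share of observations that belong to their cluster's majority label.
purity = table.max(axis=1).sum() / len(y)
print(f"purity: {purity:.3f}")
```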

After you have built two Classification models, or two Clustering models, on your data, the next step is to compare the models. You need to include the results of this comparison, including a recommendation of which model should be used, in your report (see next section).
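A comparison of two classification models across the three splits could be organised as in the sketch below. The two models, their parameter values, the macro F1-score as the comparison measure and the Wine data are all illustrative assumptions; in your own work the parameters would come from your tuning step.

```python
from sklearn.datasets import load_wine
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)

# Two candidate models with placeholder parameter values.
models = {
    "decision tree": DecisionTreeClassifier(max_depth=3, random_state=0),
    "k-NN (k=5)": make_pipeline(StandardScaler(),
                                KNeighborsClassifier(n_neighbors=5)),
}

results = {}
for test_frac in (0.5, 0.4, 0.2):
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=test_frac, stratify=y, random_state=0
    )
    for name, model in models.items():
        model.fit(X_tr, y_tr)  # refitting discards the previous fit
        results[(name, test_frac)] = f1_score(
            y_te, model.predict(X_te), average="macro"
        )

for (name, test_frac), score in results.items():
    print(f"{name:14s} test fraction {test_frac}: macro F1 = {score:.3f}")
```

A single random split can be noisy, so a recommendation is easier to defend when the ranking of the models is stable across the splits.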

Task 4: Report

Write your report and save it in a file called report.pdf. It must be in PDF format and at most 12 pages long in single-column format (including figures and references), with a font size between 10 and 12 points. Penalties will apply if the report does not satisfy these requirements. Remember to clearly cite any sources (including books, research papers, course notes, etc.) that you referred to while designing aspects of your programs.

Your report must have the following structure:

A cover page, including

  • Title
  • Author (your name(s))
  • Affiliations
  • Contact details
  • Date of report

Table of Contents

An abstract/executive summary

Introduction

Methodology

Results

Discussion

Conclusion

References

Task 5: Presentation

You will be required to present your Assignment 2 work in Week 12's Tute/Prac.

The presentation should:

  • state the goal of the project.
  • briefly describe your chosen data set.
  • outline the data preparation steps.
  • state the hypotheses/questions that you were investigating.
  • explain what the analysis and results were.
  • give the final conclusion and recommendation.

The presentations are a maximum of 3 minutes per group. We suggest that each group prepare at most 3 slides and print them on A4 paper to put on the document camera (to save time connecting computers between presentations).

If your teammates are in different Tute/Prac sessions, you can choose to attend one of the sessions and do the presentation together. If you prefer to present separately in each of your sessions, that is also acceptable.

Note - This is a practical data science assignment. It requires a 3-slide presentation and a 1-page explanation of the project in a Word file. The assignment offers two options for modelling the data. Please choose CLASSIFICATION.

Presentation includes: The presentation should briefly describe your chosen data set, state the hypotheses/questions that you were investigating, then explain what the analysis and results were.



Source: COSC2670 Practical Data Science Assignment - Data Modelling and Presentation, RMIT University, Australia.


Presentation logistics - The presentations are a maximum of 3 minutes per group, and we suggest that each group prepare at most 3 slides and submit them in Canvas. The main reason for this is to save time connecting computers between presentations, and you can safely update your submission later, before the deadline. Alternatively, you can print the slides on A4 paper to put on the document camera. This assignment should be carried out in groups of two. It is up to you to form a team.


General Requirements - This section contains information about the general requirements that your assignment must meet. Please read all requirements carefully before you start. You must do all modelling in IPython. You must include a plain text file called "readme.txt" with your submission. This file should include your name(s) (if you are a group of two) and student ID(s), and instructions for how to execute your submitted script files. This is important because automation is part of the 6th step of the data science process, and it will be assessed strictly. Parts of this assignment include a written report, which must be in PDF format. Please ensure that your submission follows the file naming rules specified in the tasks. File names are case sensitive, i.e. if it is specified that the file name is gryphon, then that is exactly the file name you should submit; Gryphon, GRYPHON, griffin, and anything else but gryphon will be rejected.


Your submission must contain: your report.pdf file, at most 12 pages in single-column format (including figures and references) with a font size between 10 and 12 points; your presentation slides; and the "readme.txt" file, which includes your names and student IDs and instructions for how to execute your submitted script files. These must be submitted as ONE single zip file, named with your student numbers (for example, 1234567 7321283.zip if your student IDs are s1234567 and s7321283). The zip file must be submitted in Canvas: Assignments/Assignment 2. Please do NOT submit other unnecessary files.
