SIT742 Modern Data Science Assignment

Reference no: EM132487084

SIT742 Modern Data Science - Deakin University

ASSESSMENT TASK ONE

DATA EXPLORATION: DATA SCIENTISTS SURVEY

Background
In 2017, Kaggle (a data science community and competition platform) conducted a survey of a large number of users registered as data scientists on its platform. The survey data broadly cover the skill sets of the data scientists, their demographics, their feedback on the platform and much other information.

Task Description
We provide one Jupyter notebook, 2020SIT742Task1.ipynb, at GitHub-SIT742, together with three data files in the data subfolder:
MCQResponses.csv: Participants' answers to multiple-choice questions. Each column contains the respondents' answers to a specific question.
ConversionRates.csv: Currency conversion rates to USD.
JobPostings.csv: Data scientist job advertisements in the US with job descriptions, from JobPikr.
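
As a rough illustration of how the conversion rates might be combined with salary answers, the sketch below normalises amounts to USD. The column names `currency` and `rate_to_usd`, the sample rates, and the convention that an amount is multiplied by the rate are all assumptions made for this example; check the actual header and rate direction in ConversionRates.csv before relying on it.

```python
import csv
import io

# Hypothetical file contents; the real column names and rates may differ.
RATES_CSV = """currency,rate_to_usd
USD,1.0
AUD,0.75
EUR,1.25
"""

def load_rates(handle):
    """Read a currency -> USD-multiplier mapping from a CSV file handle."""
    return {row["currency"]: float(row["rate_to_usd"]) for row in csv.DictReader(handle)}

def to_usd(amount, currency, rates):
    """Convert an amount to USD, or return None if the currency is unknown."""
    rate = rates.get(currency)
    return None if rate is None else amount * rate

rates = load_rates(io.StringIO(RATES_CSV))
print(to_usd(100000, "AUD", rates))  # assumed rate: multiply by 0.75
print(to_usd(50000, "XYZ", rates))   # unknown currency yields None
```

Returning None for an unknown currency keeps such rows visible as missing values instead of silently treating them as USD.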

You are required to develop a data exploration report by completing the provided Jupyter notebook, applying exploratory data analysis and visualization skills to carry out the required analysis. Detailed requirements can be found in the provided notebook; you need to follow them to complete the coding and include the results in the report SIT742T1Report.pdf.

Data Exploration
For a data scientist, the first and most crucial task after obtaining a dataset is to gain a good understanding of the data he or she is dealing with. This includes examining the data attributes (or, equivalently, data fields), seeing what they look like, identifying the data type of each field and, from this information, determining suitable numerical or visual descriptions.
In this part of the assessment task, you need to complete the coding parts of the provided notebook and finish the required analysis of attributes such as 'education', 'salary' and related demographic information (70%).
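
The idea of inspecting each attribute, inferring its type and choosing a matching description can be sketched with the standard library alone. The inline sample data and its column names (`education`, `salary`, `currency`) are hypothetical stand-ins for MCQResponses.csv, and the type rule (a column is numeric if every non-empty value parses as a number) is a simplifying assumption, not the notebook's required method.

```python
import csv
import io
from collections import Counter
from statistics import mean, median

# Hypothetical stand-in for MCQResponses.csv; the real column names may differ.
SAMPLE_CSV = """education,salary,currency
Master's degree,120000,USD
Bachelor's degree,,USD
Doctoral degree,90000,AUD
Master's degree,75000,USD
"""

def is_number(text):
    """Return True if text parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False

def describe_columns(rows):
    """Summarise each column: inferred type, missing count and a simple description."""
    summary = {}
    for name in rows[0].keys():
        values = [row[name] for row in rows if row[name] != ""]
        missing = len(rows) - len(values)
        if values and all(is_number(v) for v in values):
            # Numeric field: describe it with central-tendency statistics.
            numbers = [float(v) for v in values]
            summary[name] = {"type": "numeric", "missing": missing,
                             "mean": mean(numbers), "median": median(numbers)}
        else:
            # Categorical field: describe it with its most frequent values.
            summary[name] = {"type": "categorical", "missing": missing,
                             "top": Counter(values).most_common(3)}
    return summary

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
summary = describe_columns(rows)
for name, info in summary.items():
    print(name, info)
```

Numeric fields such as salary then suggest histograms or box plots, while categorical fields such as education suggest bar charts of the value counts.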

Text analysis
For the job advertisement data JobPostings.csv, you are required to write Python code to remove the stop-words and to extract the high-frequency words used in job advertisements. After that, you can do one self-defined text analysis task to get insight into that advertisement information (30%).
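
A minimal sketch of stop-word removal followed by a word-frequency count is given below. The tiny stop-word set, the tokenising pattern and the sample advertisements are assumptions chosen for illustration; a real analysis would read the description column of JobPostings.csv and use a fuller stop-word list.

```python
import re
from collections import Counter

# A deliberately tiny stop-word list for illustration (an assumption);
# a real analysis would use a fuller list, such as one shipped with an NLP library.
STOP_WORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "the", "to", "we", "with"}

def top_words(texts, stop_words=STOP_WORDS, n=5):
    """Return the n most frequent non-stop-words across all texts."""
    counts = Counter()
    for text in texts:
        # Lowercase, then keep runs of letters (plus + and # for terms like c++ or c#).
        tokens = re.findall(r"[a-z][a-z+#]*", text.lower())
        counts.update(t for t in tokens if t not in stop_words)
    return counts.most_common(n)

# Hypothetical job descriptions standing in for a column of JobPostings.csv.
ads = [
    "We are looking for a data scientist with Python and SQL experience.",
    "The data engineer role requires Python, Spark and SQL.",
    "Experience in machine learning and Python is required for the role.",
]
frequent = top_words(ads, n=3)
print(frequent)
```

Lowercasing before filtering means "The" and "the" are treated as the same stop-word, which is usually what a frequency analysis wants.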

ASSESSMENT TASK TWO

DATA ANALYTICS: FIFA 2019

Background
Recently, Kaggle (a data science community and competition platform) released a data set, FIFA19, which consists of 18K+ FIFA 19 players with around 90 attributes extracted from the FIFA database. Here, we redistribute this data set for this assessment task:
2020T2Data.csv: Detailed information about each FIFA 19 player.

Task Description
We provide one Jupyter notebook, 2020SIT742Task2.ipynb, together with 2020T2Data.csv in the data subfolder.

You are required to analyse this dataset using a Jupyter notebook with Spark packages, including spark.sql and pyspark.ml.

FIFA19 Data Analytics
To systematically investigate this dataset, your Jupyter notebook should complete the following 3 kinds of analysis:
Part 1 - Exploratory Data Analysis: data visualization and understanding.
Part 2 - Clustering Analysis: identify the inherent clusters among players and, for each cluster, identify its profile.
Part 3 - Classification Analysis: build classifiers to predict the 'position_group' of the player. You are also required to evaluate the performance of at least 3 models using cross-validation.
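
The cross-validation requirement in Part 3 can be illustrated with a small, library-free sketch. The notebook itself is expected to use pyspark.ml (which offers CrossValidator in pyspark.ml.tuning for this purpose); the code below only shows the evaluation protocol, in which each fold is held out once while the model is fitted on the rest. The three toy classifiers, the two made-up features (finishing, tackling) and the labels are hypothetical and are not taken from 2020T2Data.csv.

```python
import random
from collections import Counter

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def majority_fit(X, y):
    """Baseline: always predict the most common training label."""
    label = Counter(y).most_common(1)[0][0]
    return lambda x: label

def centroid_fit(X, y):
    """Nearest-centroid classifier: predict the label whose mean vector is closest."""
    groups = {}
    for x, label in zip(X, y):
        groups.setdefault(label, []).append(x)
    centroids = {label: [sum(col) / len(pts) for col in zip(*pts)]
                 for label, pts in groups.items()}
    return lambda x: min(centroids, key=lambda label: sq_dist(x, centroids[label]))

def one_nn_fit(X, y):
    """1-nearest-neighbour classifier: copy the label of the closest training point."""
    train = list(zip(X, y))
    return lambda x: min(train, key=lambda pair: sq_dist(x, pair[0]))[1]

def cross_val_accuracy(fit, X, y, k=3, seed=0):
    """Mean accuracy over k folds; each fold is held out exactly once."""
    scores = []
    for held_out in k_fold_indices(len(X), k, seed):
        test = set(held_out)
        train_idx = [i for i in range(len(X)) if i not in test]
        predict = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict(X[i]) == y[i] for i in held_out)
        scores.append(correct / len(held_out))
    return sum(scores) / len(scores)

# Hypothetical players: (finishing, tackling) scores with a made-up position group.
X = [(85, 30), (80, 35), (90, 25), (78, 40), (30, 85),
     (35, 80), (25, 88), (40, 82), (28, 90)]
y = ["FWD", "FWD", "FWD", "FWD", "DEF", "DEF", "DEF", "DEF", "DEF"]

results = {name: cross_val_accuracy(fit, X, y)
           for name, fit in [("majority", majority_fit),
                             ("nearest_centroid", centroid_fit),
                             ("one_nn", one_nn_fit)]}
for name, acc in results.items():
    print(f"{name}: {acc:.2f}")
```

Using the same seed for every model means all three are scored on identical folds, so their mean accuracies are directly comparable.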

Project Report

Based on your implementation as required in the Jupyter notebook, you are required to write a report SIT742T2Report.pdf with 1000-1500 words, which should include the following information:

(1) The required report 'Section 1' to 'Section 3' (results and analysis) as specified in the notebook.

(2) In the report's 'Section 4', discuss any findings you can reveal from this data set, such as any rising star? any omni player? etc. (10%)
(3) In the report's 'Section 5', reflect on the project group activities, such as the task distribution and contributions from each group member, and what you have learnt during this project. (10%)

