COMP20008 Elements of Data Assignment

Reference no: EM132494303

COMP20008 Elements of Data Processing Project - University of Melbourne

Learning outcome 1: To gain practical experience in written communication skills for data science projects.
Learning outcome 2: To practice a selection of the processing and exploratory analysis techniques, including visualisation, discussed in lectures and workshops.
Learning outcome 3: To practice crawling and scraping data from the Internet.
Learning outcome 4: To practice using widely used Python libraries for data processing and gain experience using library functions which may be unfamiliar and which require consulting additional documentation from resources on the Web.

Your Parts

You are to perform a small data science project including some data processing and analysis using Python. Your responses to Parts 1-5 must be contained in a single .py file. Specifically, you have the following Parts:

Part 1

Produce a CSV file containing the URL and headline of each of the articles your crawler has found. The CSV file should have two columns with headings url and headline and be called Part1.csv.

Note: You might want to start by testing your crawling implementation on a smaller website.
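A minimal sketch of one way to do Part 1, using only the standard library. The spec as given here doesn't name the starting URL or say how to recognise an article page, so both are assumptions: the crawl stays on the seed URL's host, and any page with an `<h1>` is treated as an article whose headline is the `<h1>` text. Page fetching is passed in as a hypothetical `fetch(url)` callable so the crawler can be tested offline; in a real run you might wrap `requests.get(url).text` with error handling.

```python
import csv
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class ArticlePageParser(HTMLParser):
    """Collects every <a href> and the text of the first <h1> on a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.headline = None
        self._in_h1 = False
        self._h1_text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag == "h1" and self.headline is None:
            self._in_h1 = True

    def handle_endtag(self, tag):
        if tag == "h1" and self._in_h1:
            self._in_h1 = False
            self.headline = "".join(self._h1_text).strip()

    def handle_data(self, data):
        if self._in_h1:
            self._h1_text.append(data)


def crawl(seed_url, fetch):
    """Breadth-first crawl of seed_url's host; returns [(url, headline), ...].

    fetch(url) is a hypothetical callable returning page HTML, or None on failure.
    """
    host = urlparse(seed_url).netloc
    seen = {seed_url}
    queue = deque([seed_url])
    articles = []
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        parser = ArticlePageParser()
        parser.feed(html)
        parser.close()
        if parser.headline:  # assumption: a page with an <h1> is an article
            articles.append((url, parser.headline))
        for href in parser.links:
            # Resolve relative links and drop #fragments so each page is queued once.
            link, _fragment = urldefrag(urljoin(url, href))
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return articles


def write_part1(articles, path="Part1.csv"):
    """Write the (url, headline) pairs with the headings the spec asks for."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["url", "headline"])
        writer.writerows(articles)
```

For example, `write_part1(crawl(seed_url, fetch))` would write Part1.csv. Adding URLs to `seen` when they are queued means each page is fetched at most once, even when pages link back to each other.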

Part 2
For each article found in Part 1,

a) extract the name of the first player mentioned in the article. You can find a list of player names as part of the tennis.json file provided. We will assume the article is written about that player (and only that player).

b) extract the first complete match score identified in the article. You will need to use regular expressions to accomplish this. We will assume this score relates to the first named player in the article.
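For part (a), a sketch assuming the player names have already been read from tennis.json into a list of strings (the file's layout isn't described here, so the loading step is left out). Each full name is searched for with word boundaries, ignoring case, and the name whose match starts earliest is returned; if two names match at the same position, the longer one is preferred. `first_player` is a hypothetical helper name.

```python
import re


def first_player(text, player_names):
    """Return the name from player_names that appears earliest in text, or None."""
    best = None  # ((start position, -name length), name)
    for name in player_names:
        match = re.search(r"\b" + re.escape(name) + r"\b", text, flags=re.IGNORECASE)
        if match:
            key = (match.start(), -len(name))  # earliest first, then longest
            if best is None or key < best[0]:
                best = (key, name)
    return best[1] if best else None
```

One regex search per name is fine for a few hundred players; a single compiled alternation would be faster if the list were much larger.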

Produce a CSV file containing the URL, headline, first player mentioned and first complete match score of each of the articles your crawler has found. The CSV file should have four columns with headings url, headline, player and score and be called Part2.csv.

Note: Some articles may not contain a player name and/or a match score. These articles can be discarded.
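For part (b), one possible regular-expression approach, sketched under our own definition of "complete" (the spec doesn't define it): two or more consecutive set scores, each a valid finished set (6-0 to 6-4, 7-5, 7-6, or an advantage set won by two games), where one side reaches two or three sets exactly on the last set listed. Tiebreak points written as `7-6(5)` or `7-6 (7-5)` are allowed and then ignored. Returning `None` lets articles with no score be discarded, as the note above says.

```python
import re

# One set score such as "6-4", optionally followed by a tiebreak like "(5)" or " (7-5)".
SET = r"\d{1,2}-\d{1,2}(?!\d)(?:\s?\(\d{1,2}(?:-\d{1,2})?\))?"
# Two or more set scores in a row, separated by whitespace and an optional comma.
SCORE_RUN = re.compile(rf"(?<![\d-]){SET}(?:,?\s+{SET})+")


def parse_sets(score):
    """Turn '6-4, 7-6(5)' into [(6, 4), (7, 6)], dropping tiebreak points."""
    without_tiebreaks = re.sub(r"\([^)]*\)", "", score)
    return [(int(a), int(b)) for a, b in re.findall(r"(\d{1,2})-(\d{1,2})", without_tiebreaks)]


def is_valid_set(a, b):
    """Assumed finished-set rule: 6-0..6-4, 7-5, 7-6, or an advantage set won by two."""
    hi, lo = max(a, b), min(a, b)
    return (hi == 6 and lo <= 4) or (hi == 7 and lo in (5, 6)) or (hi > 7 and hi - lo == 2)


def _ends_on_last_set(sets, target):
    """True if one side first reaches `target` sets exactly on the final set."""
    wins = [0, 0]
    for i, (a, b) in enumerate(sets):
        wins[0 if a > b else 1] += 1
        if max(wins) == target:
            return i == len(sets) - 1
    return False


def is_complete(sets):
    """Best-of-three (target 2) or best-of-five (target 3) match that has finished."""
    return (
        len(sets) >= 2
        and all(is_valid_set(a, b) for a, b in sets)
        and (_ends_on_last_set(sets, 2) or _ends_on_last_set(sets, 3))
    )


def first_complete_score(text):
    """Return the first complete score string in text, or None."""
    for match in SCORE_RUN.finditer(text):
        if is_complete(parse_sets(match.group())):
            return match.group()
    return None
```

Because `finditer` returns non-overlapping runs, a valid score immediately preceded by another number pair (say `3-2 6-4 6-3`) is rejected as a whole; trimming leading invalid sets would be one refinement.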

Part 3

For each article used in Part 2, calculate the game difference: the absolute value of the difference between the total number of games won by each side across all sets. For example, a 6-2 6-2 score has a game difference of 8 (12 - 4), while a 6-4 4-6 6-4 score has a game difference of 2 (16 - 14).

Produce a CSV file containing the player name and average game difference for each player about whom at least one article has been written. The CSV file should have two columns with headings player and avg game difference and be called Part3.csv.
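A sketch of the Part 3 arithmetic. `game_difference` sums the games won by each side across all sets (ignoring bracketed tiebreak points) and takes the absolute difference, which reproduces both examples above; `average_game_difference` then averages per player. The input is assumed to be `(player, score)` pairs such as the rows of Part2.csv, and the function names are illustrative.

```python
import csv
import re
from collections import defaultdict


def game_difference(score):
    """|games won by one side - games won by the other| over all sets."""
    without_tiebreaks = re.sub(r"\([^)]*\)", "", score)
    sets = [(int(a), int(b)) for a, b in re.findall(r"(\d{1,2})-(\d{1,2})", without_tiebreaks)]
    return abs(sum(a for a, _ in sets) - sum(b for _, b in sets))


def average_game_difference(rows):
    """rows: iterable of (player, score). Returns {player: mean game difference}."""
    diffs = defaultdict(list)
    for player, score in rows:
        diffs[player].append(game_difference(score))
    return {player: sum(values) / len(values) for player, values in diffs.items()}


def write_part3(averages, path="Part3.csv"):
    """Write one row per player under the headings the spec asks for."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["player", "avg game difference"])
        for player, avg in sorted(averages.items()):
            writer.writerow([player, avg])
```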

Part 4

Generate a suitable plot showing the five players who are most frequently written about and the number of articles written about each of them.
Save this plot as a png file called Part4.png
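One suitable choice for Part 4 is a bar chart of article counts. This sketch assumes matplotlib is installed and that `players` holds one first-named player per article (for example, the player column of Part2.csv). `Counter.most_common` does the ranking; players with equal counts keep the order in which they were first encountered.

```python
from collections import Counter

import matplotlib

matplotlib.use("Agg")  # draw straight to a file; no display needed
import matplotlib.pyplot as plt


def top_players(players, n=5):
    """Return the n most frequent names as [(name, count), ...]."""
    return Counter(players).most_common(n)


def plot_top_players(players, path="Part4.png", n=5):
    """Bar chart of article counts for the n most written-about players."""
    names, counts = zip(*top_players(players, n))
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.bar(names, counts)
    ax.set_xlabel("Player")
    ax.set_ylabel("Number of articles")
    ax.set_title(f"Top {n} most written-about players")
    ax.tick_params(axis="x", labelrotation=20)  # long names overlap otherwise
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)
```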

Part 5

Generate a suitable plot showing, for each player about whom at least one article has been written, their average game difference against their win percentage. You can find a player's win percentage in the tennis.json file.
Save this plot as a png file called Part5.png
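For Part 5, a scatter plot gives one point per player, with average game difference on the x-axis and win percentage on the y-axis. This sketch assumes matplotlib is installed, that the averages come from Part 3 as a dict, and that win percentages have already been pulled out of tennis.json into a dict of numbers keyed by the same player names (that conversion isn't shown, since the file's format isn't described here). Only players present in both dicts are plotted.

```python
import matplotlib

matplotlib.use("Agg")  # draw straight to a file; no display needed
import matplotlib.pyplot as plt


def paired_points(avg_diff, win_pct):
    """Align players present in both dicts; returns (names, x values, y values)."""
    names = sorted(set(avg_diff) & set(win_pct))
    return names, [avg_diff[n] for n in names], [win_pct[n] for n in names]


def plot_diff_vs_win(avg_diff, win_pct, path="Part5.png"):
    """Scatter of average game difference against win percentage; returns plotted names."""
    names, x, y = paired_points(avg_diff, win_pct)
    fig, ax = plt.subplots(figsize=(7, 5))
    ax.scatter(x, y)
    ax.set_xlabel("Average game difference")
    ax.set_ylabel("Win percentage")
    ax.set_title("Average game difference vs win percentage")
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)
    return names
```

Labelling each point with `ax.annotate` is an option if the number of players is small.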

Part 6

Write a 3-4 page report to communicate the process and activities undertaken in the project, the analysis, and some limitations. Specifically, the report should contain the following information:
• A description of the crawling method and a brief summary of the output for Part 1.
• A description of how you scraped data from each page, including any regular expressions used for Part 2, and a brief summary of the output.
• An analysis of the information shown in the two plots produced for Parts 4 & 5, including a brief summary of the data used. The plots must be included alongside your analysis.
• A discussion of the appropriateness of associating the first named player in the article with the first match score.
• At least one suggested method for determining, from the contents of the article, whether the first named player won or lost the match being reported on.
• A discussion of what other information could be extracted from the articles to better understand player performance, and a brief suggestion for how this could be done.
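As one illustration of the kind of method the fifth bullet asks for (not a prescribed answer), a simple keyword heuristic: look at the text just after the player's first mention and report whichever win or loss phrase appears first. The phrase lists, the 60-character window and the name `guess_outcome` are arbitrary assumptions, and the approach will misfire on sentences like "Nadal, who won in 2019, lost today".

```python
import re

# Arbitrary illustrative phrase lists; extend or replace as needed.
WIN_PHRASES = ("beat", "beats", "defeated", "defeats", "overcame", "downed", "won")
LOSS_PHRASES = ("lost", "loses", "fell", "falls", "was beaten", "was defeated")


def guess_outcome(text, player, window=60):
    """Return 'won', 'lost' or None from the first outcome phrase after the player's name."""
    mention = re.search(re.escape(player), text, flags=re.IGNORECASE)
    if mention is None:
        return None
    after = text[mention.end():mention.end() + window].lower()
    best = None  # (position in `after`, label)
    for label, phrases in (("won", WIN_PHRASES), ("lost", LOSS_PHRASES)):
        for phrase in phrases:
            # Word boundaries stop "beat" from matching inside "beaten".
            hit = re.search(r"\b" + re.escape(phrase) + r"\b", after)
            if hit and (best is None or hit.start() < best[0]):
                best = (hit.start(), label)
    return best[1] if best else None
```

A more robust method might parse sentence structure to find the subject of the winning verb, but the heuristic above is enough to compare against a hand-labelled sample.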
