Draw a bar chart to compare the artists of the songs


Reference no: EM133144584

BUS5DWR Data Wrangling and R - La Trobe University

Overview

Assignment Requirements

Part 1
The given data files Movie.csv, Rating.csv and Continent.csv record information about IMDb movie ratings.

Write R code in an Rmd file to answer the following questions. Each question should be presented in one code chunk:

Load the data from the given files into three data frames called Movie, Rating, and Continent. Rename the columns to remove any spaces in their names. (Hint: use str_replace_all to do this automatically for all columns.) Remove the column Writer from the Movie dataframe. Display the summary of each dataframe.
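The renaming step hinted at above can be sketched as follows. This is a minimal illustration on a toy data frame, not the real Movie.csv; the column names "Movie ID" and "Release Year" are assumptions made for the example.

```r
library(dplyr)
library(stringr)

# Toy stand-in for Movie.csv; these column names are hypothetical.
Movie <- data.frame(
  `Movie ID` = c(1, 2),
  `Release Year` = c(1999, 2005),
  Writer = c("A", "B"),
  check.names = FALSE  # keep the spaces so there is something to clean
)

# Remove every space from every column name in one pass.
names(Movie) <- str_replace_all(names(Movie), " ", "")

# Drop the Writer column, then display the summary.
Movie <- select(Movie, -Writer)
summary(Movie)
```

With the real files, the same two lines would follow a call such as read.csv("Movie.csv", check.names = FALSE), repeated for Rating and Continent.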

How many movies produced by 'Universal Pictures' have the actor 'Arnold Schwarzenegger'?
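One possible approach, sketched on toy data: filter on the production company and look for the actor's name inside a text column. The column names Production and Actors, and the idea that actors are stored as a comma-separated string, are assumptions; the real Movie.csv may be laid out differently.

```r
library(dplyr)
library(stringr)

# Toy stand-in for the Movie data; column names are hypothetical.
Movie <- data.frame(
  Title = c("M1", "M2", "M3"),
  Production = c("Universal Pictures", "Universal Pictures", "Warner Bros."),
  Actors = c("Arnold Schwarzenegger, Someone Else",
             "Another Actor",
             "Arnold Schwarzenegger")
)

# Count rows that match both the studio and the actor.
n_movies <- Movie %>%
  filter(
    Production == "Universal Pictures",
    str_detect(Actors, fixed("Arnold Schwarzenegger"))
  ) %>%
  nrow()

n_movies
```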

Display the five most-reviewed movies that belong to both Action and Drama. Display only the Title and the number of reviews.
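A minimal sketch of this query, assuming genres are stored as one comma-separated string per movie and that the review count sits in a column called Reviews. Both are assumptions for illustration only.

```r
library(dplyr)
library(stringr)

# Toy stand-in for the Movie data; Genre and Reviews are hypothetical columns.
Movie <- data.frame(
  Title = paste("Movie", 1:8),
  Genre = c("Action, Drama", "Drama", "Action, Drama, Thriller", "Action",
            "Drama, Action", "Action, Drama", "Action, Drama, Crime",
            "Drama, Action"),
  Reviews = c(120, 900, 450, 700, 300, 80, 610, 50)
)

top_reviewed <- Movie %>%
  # keep movies whose genre string mentions both genres, in any order
  filter(str_detect(Genre, "Action"), str_detect(Genre, "Drama")) %>%
  select(Title, Reviews) %>%
  # five largest review counts, sorted from highest to lowest
  slice_max(Reviews, n = 5)

top_reviewed
```

Note that slice_max keeps ties by default, so it can return more than five rows when review counts are equal at the cut-off.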

Display movie rating information including Title, average rating and two new columns: (1) 'TotalVote', showing the total votes from both males and females, and (2) 'Popular', showing 'Male' for movies where MalesTotalVotes is greater than FemalesTotalVotes and 'Female' otherwise. (Hint: see the Workshop 9 exercise.) Show only the ten movies with the highest average rating.
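The two derived columns can be sketched with mutate and if_else. The data below is a toy stand-in; MalesTotalVotes and FemalesTotalVotes come from the question, while Title and AverageRating are assumed column names.

```r
library(dplyr)

# Toy stand-in for the Rating data; Title and AverageRating are assumed names.
Rating <- data.frame(
  Title = c("A", "B", "C"),
  AverageRating = c(8.1, 7.4, 9.0),
  MalesTotalVotes = c(1200, 300, 800),
  FemalesTotalVotes = c(900, 450, 800)
)

top_rated <- Rating %>%
  mutate(
    TotalVote = MalesTotalVotes + FemalesTotalVotes,
    # "Male" only when male votes strictly exceed female votes
    Popular = if_else(MalesTotalVotes > FemalesTotalVotes, "Male", "Female")
  ) %>%
  select(Title, AverageRating, TotalVote, Popular) %>%
  # ten highest average ratings, sorted from highest to lowest
  slice_max(AverageRating, n = 10)

top_rated
```

A movie with equal male and female votes falls into 'Female' here, which follows the "otherwise" wording of the question.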

Display the number of Comedy movies and their average rating from each continent.
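A per-continent summary like this usually needs the three tables joined first. The sketch below assumes the tables link through MovieID and Country; those keys, and every column name used, are hypothetical and should be checked against the real files.

```r
library(dplyr)
library(stringr)

# Toy stand-ins for the three tables; keys and column names are assumptions.
Movie <- data.frame(
  MovieID = 1:4,
  Genre = c("Comedy, Drama", "Action", "Comedy", "Comedy, Romance"),
  Country = c("USA", "USA", "France", "Japan")
)
Rating <- data.frame(MovieID = 1:4, AverageRating = c(6, 8, 7, 9))
Continent <- data.frame(
  Country = c("USA", "France", "Japan"),
  Continent = c("North America", "Europe", "Asia")
)

comedy_by_continent <- Movie %>%
  filter(str_detect(Genre, "Comedy")) %>%
  inner_join(Rating, by = "MovieID") %>%
  inner_join(Continent, by = "Country") %>%
  group_by(Continent) %>%
  summarise(
    Movies = n(),                       # number of comedy movies
    MeanRating = mean(AverageRating),   # their average rating
    .groups = "drop"
  )

comedy_by_continent
```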

Analyse the distribution of the average rating of all movies released after the year 2000. (Hint: draw a boxplot and a histogram, then write a short paragraph of fewer than 100 words describing your insight.)
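The two plots in the hint can be sketched with ggplot2. The data here is randomly generated purely so the example runs; the column names Year and AverageRating are assumptions, and the bin width of 0.5 is an arbitrary choice.

```r
library(dplyr)
library(ggplot2)

# Randomly generated toy data, NOT the real ratings; column names are assumed.
set.seed(1)
ratings <- data.frame(
  Year = sample(1990:2015, 200, replace = TRUE),
  AverageRating = round(runif(200, 3, 9), 1)
)

# Keep only movies after the year 2000.
recent <- filter(ratings, Year > 2000)

# Boxplot: shows median, quartiles and outliers at a glance.
box_plot <- ggplot(recent, aes(y = AverageRating)) +
  geom_boxplot() +
  labs(y = "Average rating")

# Histogram: shows the shape of the distribution.
hist_plot <- ggplot(recent, aes(x = AverageRating)) +
  geom_histogram(binwidth = 0.5) +
  labs(x = "Average rating", y = "Number of movies")
```

Calling print(box_plot) and print(hist_plot), or placing each object on its own line in an Rmd chunk, renders the figures.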

Part 2
The given Spotify.xlsx file records the summary of Australia's top 200 daily-streamed songs (or tracks) in the first three months of 2017 and 2018. The Data worksheet records the total streams and the highest position of each song in each month. You will see that the data is far from being ready for analysis and needs to be 'wrangled'. The given Artist.csv file records the artists who perform the songs. You are required to write R code to perform the following steps.

Load the data from the Spotify worksheet into a dataframe named Spotify. Replace the spaces in the column names with underscores ("_"). Show the structure of Spotify.

You can see that most column names contain the month information, which should instead be stored as row values. To fix this:

• Use pivot_longer to transform the dataframe into four columns, namely Artist_ID, Track_Name, Month, and Value.
• Drop all rows having NA in Value.
• Split the Month column into Month and Year.
• Display the number of columns and rows.
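The reshaping steps listed above can be sketched as follows. The toy column names ("Jan_2017" and so on) and the "streams/position" cell format are assumptions for illustration; the real worksheet may use a different separator or naming pattern.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for the Spotify dataframe; month columns and the
# "streams/position" cell format are hypothetical.
Spotify <- data.frame(
  Artist_ID = c(1, 2),
  Track_Name = c("Song A", "Song B"),
  Jan_2017 = c("150000/3", NA),
  Feb_2017 = c("120000/5", "90000/40")
)

Spotify_long <- Spotify %>%
  # every column except the two identifiers becomes a (Month, Value) pair
  pivot_longer(
    cols = -c(Artist_ID, Track_Name),
    names_to = "Month",
    values_to = "Value"
  ) %>%
  # drop rows where the song did not chart in that month
  drop_na(Value) %>%
  # "Jan_2017" becomes Month = "Jan", Year = "2017"
  separate(Month, into = c("Month", "Year"), sep = "_")

dim(Spotify_long)  # rows, then columns
```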

You can see that the Value column contains both the total streams and the highest position of the song in the corresponding month. Note that the smaller the position value, the higher the song ranked.

• Split the Value column into two columns with appropriate names.
• For each month-year, show the total streams and the number of songs appearing in the daily top 200.
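A sketch of the split and the monthly summary, again on toy data. The "/" separator and the new column names Total_Streams and Highest_Position are assumptions; use whatever separator the real Value column contains.

```r
library(dplyr)
library(tidyr)

# Toy long-format data; the "streams/position" format is hypothetical.
long <- data.frame(
  Track_Name = c("Song A", "Song A", "Song B"),
  Month = c("Jan", "Feb", "Feb"),
  Year = c("2017", "2017", "2017"),
  Value = c("150000/3", "120000/5", "90000/40")
)

streams <- long %>%
  # convert = TRUE turns the two text pieces into numbers
  separate(Value, into = c("Total_Streams", "Highest_Position"),
           sep = "/", convert = TRUE)

monthly <- streams %>%
  group_by(Year, Month) %>%
  summarise(
    Streams = sum(Total_Streams),    # total streams in that month-year
    Songs = n_distinct(Track_Name),  # songs that charted in that month-year
    .groups = "drop"
  )

monthly
```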

Find all tracks that appeared in all six months with more than 100,000 streams in each month. Display their name, total streams and highest position. Export the result to a CSV file.
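One way to express "all six months, each above 100,000" is a grouped filter. The sketch below uses toy data and assumed column names, and writes to a temporary folder under a made-up file name.

```r
library(dplyr)

# Toy data: Song A clears 100,000 in all six months, Song B misses once.
streams <- data.frame(
  Track_Name = rep(c("Song A", "Song B"), each = 6),
  Month = rep(c("Jan", "Feb", "Mar"), times = 4),
  Year = rep(rep(c(2017, 2018), each = 3), times = 2),
  Total_Streams = c(rep(150000, 6), 150000, 90000, rep(150000, 4)),
  Highest_Position = c(3, 2, 5, 4, 1, 6, 10, 40, 12, 9, 11, 8)
)

consistent <- streams %>%
  group_by(Track_Name) %>%
  filter(
    n_distinct(Month, Year) == 6,   # present in all six month-year pairs
    all(Total_Streams > 100000)     # every month above the threshold
  ) %>%
  summarise(
    Total_Streams = sum(Total_Streams),
    Highest_Position = min(Highest_Position),  # smaller number = better rank
    .groups = "drop"
  )

# Hypothetical output file name, written to a temporary folder.
out_file <- file.path(tempdir(), "consistent_tracks.csv")
write.csv(consistent, out_file, row.names = FALSE)
```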

Load the data from the Artist.csv file into a new dataframe. Rename the columns to remove spaces. How many artists do not have songs listed in the Spotify dataframe?
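Counting artists with no matching songs is a natural fit for an anti-join. The sketch below assumes both tables share an Artist_ID column (Artist_ID appears in the pivot step of this assignment; Artist_Name is a hypothetical column).

```r
library(dplyr)

# Toy stand-ins; Artist_Name is a hypothetical column.
Artist <- data.frame(
  Artist_ID = c(1, 2, 3, 4),
  Artist_Name = c("W", "X", "Y", "Z")
)
Spotify <- data.frame(
  Artist_ID = c(1, 1, 3),
  Track_Name = c("a", "b", "c")
)

# Keep only the artists that have no row at all in Spotify.
missing_artists <- anti_join(Artist, Spotify, by = "Artist_ID")

nrow(missing_artists)
```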

Draw a bar chart to compare the artists of the songs/tracks returned in Q2.4 (the tracks that appeared in all six months) based on their total streams. Order the bars from the highest to the lowest total streams. Write a short paragraph describing the insight you gained from this chart.
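An ordered bar chart can be sketched with ggplot2 and reorder. The totals below are made-up toy values and the column names are assumptions; in practice the data would come from joining the Q2.4 result with the Artist dataframe.

```r
library(ggplot2)

# Made-up toy totals per artist; column names are hypothetical.
totals <- data.frame(
  Artist_Name = c("W", "X", "Y"),
  Total_Streams = c(500000, 900000, 700000)
)

# reorder() with a negated value sorts the bars from highest to lowest.
bar_plot <- ggplot(
  totals,
  aes(x = reorder(Artist_Name, -Total_Streams), y = Total_Streams)
) +
  geom_col() +
  labs(x = "Artist", y = "Total streams")
```

Printing bar_plot renders the chart. With many artists, adding coord_flip() often makes the labels easier to read.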

Attachment: Data Wrangling and R Assignment.rar
