Reference no: EM132373819
Assignment -
Part 1 - BBALL STUDY
We previously used a dataset called PlayerBBall.csv, which contains information about NBA basketball players. To finish that assignment, you had to manipulate the height column. Review the code you used to do that and see whether you can make it more efficient using regular expressions and/or the string functions from this unit.
- Use regular expressions on the height column to create a TotalInches column that holds the total height in inches, stored as a numeric variable.
- Use this variable to make a chart that contains histograms of heights for every position (color-coded).
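The conversion in the first bullet can be sketched in base R as follows. This is only an illustration under stated assumptions: the sample rows, the column names position and height, and the "feet-inches" format (for example 6-10) are hypothetical, and the real PlayerBBall.csv may store heights differently.

```r
# Hypothetical sample; the real PlayerBBall.csv columns and height format may differ.
players <- data.frame(
  position = c("C", "G", "F", "G"),
  height   = c("7-1", "6-3", "6-10", "5-11"),
  stringsAsFactors = FALSE
)

# One regular expression with two capture groups: feet, a dash, inches.
pattern <- "^([0-9]+)-([0-9]+)$"

# sub() keeps only the requested capture group, which is then converted to numeric.
feet   <- as.numeric(sub(pattern, "\\1", players$height))
inches <- as.numeric(sub(pattern, "\\2", players$height))

# Total height in inches, stored as a numeric column.
players$TotalInches <- feet * 12 + inches
print(players)
```

With a numeric TotalInches column in place, a plotting package such as ggplot2 can draw one histogram per position with the fill color mapped to position.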
Part 2 - FIFA STUDY
We previously used a dataset called FIFA Playersl.csv, which contains information about soccer players.
a. Use the string functions and regular expressions to assess the relationship between height and weight among soccer players. To do this, you will need to convert the height and weight columns into columns that hold numeric values. Tell your story using 2 - 4 PPT slides.
b. Next, assess this relationship for just the LB and LM positions. (1 slide should do it.)
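One possible way to approach parts a and b in base R is sketched below. The sample rows, the column names Position, Height, and Weight, and the formats (heights such as 5'11" and weights such as 165lbs) are assumptions made for illustration; check the actual file before reusing the patterns.

```r
# Hypothetical sample; the real FIFA file may use other column names and formats.
fifa <- data.frame(
  Position = c("LB", "LM", "ST", "LB", "LM"),
  Height   = c("5'11\"", "5'8\"", "6'2\"", "6'0\"", "5'7\""),
  Weight   = c("165lbs", "150lbs", "190lbs", "172lbs", "148lbs"),
  stringsAsFactors = FALSE
)

# Extract every run of digits from each height: the first is feet, the second inches.
parts <- regmatches(fifa$Height, gregexpr("[0-9]+", fifa$Height))
fifa$HeightInches <- sapply(parts, function(p) as.numeric(p[1]) * 12 + as.numeric(p[2]))

# Drop every non-digit character from the weight, then convert to numeric.
fifa$WeightLbs <- as.numeric(gsub("[^0-9]", "", fifa$Weight))

# Part a: relationship across all players.
overall_cor <- cor(fifa$HeightInches, fifa$WeightLbs)

# Part b: the same relationship restricted to the LB and LM positions.
sub_fifa  <- fifa[fifa$Position %in% c("LB", "LM"), ]
lb_lm_cor <- cor(sub_fifa$HeightInches, sub_fifa$WeightLbs)

print(fifa)
```

A scatter plot of WeightLbs against HeightInches, colored by Position, would be a natural chart for the slides.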
BBALL STUDY - For the PlayerBBall.csv exercise described in Part 1, tell your story on 1 or 2 PPT slides.
Part 3 - BABY NAMES
Backstory: Your client is expecting a baby soon. However, he is not sure what to name the child. Being out of the loop, he hires you to help him figure out popular names. He provides for you raw data in order to help you make a decision.
The Most Popular Baby Names in The UK
| Girls | Boys |
| Olivia | Oliver |
| Amelia | Harry |
| Emily | George |
| Isla | Jack |
| Ava | Jacob |
| Isabella | Noah |
| Lily | Charlie |
| Jessica | Muhammad |
| Ella | Thomas |
| Mia | Oscar |
Baby Names: Question 1
1. Data Munging: Utilize yob2016.txt for this question. This file lists popular names given to children born in the United States in 2016. It consists of three columns: a first name, a gender, and the number of children given that name. However, the data is raw and will need cleaning to make it tidy and usable.
a. First, import the .txt file into R so you can process it. Keep in mind this is not a CSV file. You might have to open the file to see what you're dealing with. Assign the resulting data frame to an object, df, that consists of three columns with human-readable column names for each.
b. Display the summary and structure of df
c. Your client tells you that there is a problem with the raw file. One name was entered twice and misspelled. The client cannot remember which name it is; there are thousands he saw! But he did mention he accidentally put three y's at the end of the name. Write an R command to figure out which name it is and display it.
d. Upon finding the misspelled name, please remove this particular observation, as the client says it's redundant. Save the remaining dataset as an object: y2016.
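Steps a through d might look like the sketch below. It reads from an in-memory string so that it runs on its own; the semicolon delimiter, the sample names and counts, and the column names Name, Gender, and Count are all assumptions, so open the real yob2016.txt to confirm its layout first.

```r
# Hypothetical stand-in for yob2016.txt (assumed: semicolon-separated, no header row).
raw_2016 <- "Emma;F;100
Olivia;F;98
Noah;M;90
Avayyy;F;70
Ava;F;70"

# a. Import with human-readable column names. Explicit colClasses keeps a
#    gender value of "F" from being read as the logical FALSE.
df <- read.table(text = raw_2016, sep = ";", header = FALSE,
                 col.names  = c("Name", "Gender", "Count"),
                 colClasses = c("character", "character", "integer"))

# b. Summary and structure.
summary(df)
str(df)

# c. Find the name that ends in three y's.
misspelled <- grep("y{3}$", df$Name, value = TRUE)
print(misspelled)

# d. Drop that observation and keep the rest.
y2016 <- df[!grepl("y{3}$", df$Name), ]
```

With the real file, replace the text argument with the file path, for example read.table("yob2016.txt", ...), keeping the other arguments if the layout matches.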
Baby Names: Question 2
2. Data Merging: Utilize yob2015.txt for this question. This file is similar to yob2016, but contains names, gender, and total children given that name for the year 2015.
a. Like 1a, please import the .txt file into R. Look at the file before you do. You might have to change some options to import it properly. Again, please give the dataframe human-readable column names. Assign the dataframe to y2015.
b. Display the last ten rows in the dataframe. Describe something you find interesting about these 10 rows.
c. Merge y2016 and y2015 by your Name column; assign the result to final. The client only cares about names that have data for both 2016 and 2015; there should be no NA values in either of your number-of-children columns after merging.
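A minimal sketch of the import, inspection, and merge follows. The comma delimiter, the sample rows, and the column names are assumptions for illustration. The sketch merges on Name together with Gender so that a name used for both sexes is not cross-matched; merging on Name alone, as the task is worded, works the same way but duplicates the Gender column.

```r
# Hypothetical stand-in for yob2015.txt (assumed: comma-separated, no header row).
raw_2015 <- "Emma,F,110
Olivia,F,99
Noah,M,95
Zyus,M,5"

y2015 <- read.table(text = raw_2015, sep = ",", header = FALSE,
                    col.names  = c("Name", "Gender", "Count"),
                    colClasses = c("character", "character", "integer"))

# Small hypothetical y2016 so the example runs on its own.
y2016 <- data.frame(
  Name   = c("Emma", "Olivia", "Noah", "Adaline"),
  Gender = c("F", "F", "M", "F"),
  Count  = c(100L, 98L, 90L, 12L),
  stringsAsFactors = FALSE
)

# b. Last ten rows (fewer here because the sample is tiny).
tail(y2015, 10)

# c. merge() defaults to an inner join (all = FALSE), so names missing from
#    either year are dropped and no NA counts appear.
final <- merge(y2016, y2015, by = c("Name", "Gender"),
               suffixes = c(".2016", ".2015"))
print(final)
```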
Baby Names: Question 3
3. Data Summary: Utilize your data frame object final for this part.
a. Create a new column called "Total" in final that adds the number of children in 2015 and 2016 together. In those two years combined, how many people were given popular names?
b. Sort the data by Total. What are the top 10 most popular names?
c. The client is expecting a girl! Omit boys and give the top 10 most popular girls' names.
d. Write these top 10 girl names and their Totals to a CSV file. Leave out the other columns entirely.
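The four summary steps can be sketched as below. The data frame is a small made-up stand-in for final, and the column names Count.2016 and Count.2015 as well as the output file name are assumptions; adjust them to whatever your merge produced.

```r
# Hypothetical stand-in for the merged data frame.
final <- data.frame(
  Name       = c("Emma", "Noah", "Olivia", "Liam", "Ava"),
  Gender     = c("F", "M", "F", "M", "F"),
  Count.2016 = c(100, 90, 98, 80, 70),
  Count.2015 = c(110, 95, 99, 85, 75),
  stringsAsFactors = FALSE
)

# a. Combined count per name, and the grand total across all names.
final$Total <- final$Count.2016 + final$Count.2015
grand_total <- sum(final$Total)

# b. Sort by Total, largest first, and take the top 10.
final <- final[order(final$Total, decreasing = TRUE), ]
top10 <- head(final, 10)

# c. Girls only, keeping just the two columns needed for the export.
girls       <- final[final$Gender == "F", ]
top10_girls <- head(girls[, c("Name", "Total")], 10)

# d. Write Name and Total to a CSV file (written to a temp directory here).
out_file <- file.path(tempdir(), "top10_girls.csv")  # hypothetical file name
write.csv(top10_girls, out_file, row.names = FALSE)
print(top10_girls)
```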
Baby Names: Question 4
4. Data Visualization: Create a well-labeled, visually appealing, and informative visualization summarizing some of the results of this study.
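As one possible starting point, a labeled bar chart of the top girls' names can be drawn with base R graphics. The data below is made up, and the output file name is hypothetical; a ggplot2 chart would serve equally well.

```r
# Hypothetical top names and totals; substitute the real top-10 table.
top10_girls <- data.frame(
  Name  = c("Emma", "Olivia", "Ava"),
  Total = c(210, 197, 145),
  stringsAsFactors = FALSE
)

# Write the chart to a PDF so the example also runs in a non-interactive session.
plot_file <- file.path(tempdir(), "top_girl_names.pdf")  # hypothetical file name
pdf(plot_file)

# barplot() returns the bar midpoints, which can be used to place value labels.
mids <- barplot(top10_girls$Total,
                names.arg = top10_girls$Name,
                col  = "steelblue",
                main = "Most popular girls' names, 2015 and 2016 combined",
                xlab = "Name",
                ylab = "Children given the name",
                ylim = c(0, max(top10_girls$Total) * 1.15))
text(mids, top10_girls$Total, labels = top10_girls$Total, pos = 3)

dev.off()
```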
Attachment:- Data Files.rar