Reference no: EM133714097. Length: 1200 words.
Data manipulation
This section analyses US baby names from 1880-2010 to look for patterns in naming conventions. The data are available in a file called 'names.zip'. Download the file to a folder on your machine. When you unzip it, you will see one comma-separated file per year, each with three columns: name, sex and number of births.
Load the file for 1960 using read.table() or read.csv().
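The sketch below shows one way to do the loading step. It assumes the unzipped folder contains one headerless file per year, named like `yob1960.txt`. That is the naming used in the public US Social Security Administration release, but check your own copy and the readme. `read_year()` is a hypothetical helper name. Setting `colClasses` fixes the column types up front, so R does not have to guess them.

```r
# Hypothetical helper: read one year's file into a data frame.
# Assumes a headerless CSV with columns name, sex, births.
read_year <- function(path) {
  read.csv(path, header = FALSE,
           col.names = c("name", "sex", "births"),
           colClasses = c("character", "character", "integer"))
}

# Usage (assumed file layout; adjust to your unzipped folder):
# unzip("names.zip", exdir = "names")
# names1960 <- read_year(file.path("names", "yob1960.txt"))
# str(names1960)
```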
Produce appropriate commands to answer the following questions:
According to these data, how many children were born in 1960?
Which were the 10 most popular names for each sex?
Are there any names in the data set that were given only once? (Note: the readme file suggests that names given fewer than 5 times should have been removed, but you should check this.)
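The three questions can be answered in base R along the lines below. This is a sketch, and the helper names are hypothetical. It assumes the 1960 file has been read into a data frame (called `names1960` here) with columns `name`, `sex` and `births`. Each row is a name-sex pair, so a name given to both sexes appears twice.

```r
# Total births recorded in one year's data.
total_births <- function(d) sum(d$births)

# The n most frequent names for each sex, most popular first.
top_names <- function(d, n = 10) {
  lapply(split(d, d$sex), function(s) head(s$name[order(-s$births)], n))
}

# Rows (name-sex pairs) recorded fewer than `threshold` times.
rare_names <- function(d, threshold = 5) {
  d[d$births < threshold, , drop = FALSE]
}

# Usage with a data frame `names1960` read from the 1960 file:
# total_births(names1960)
# top_names(names1960)
# min(names1960$births)            # smallest count in the file
# nrow(rare_names(names1960, 2))   # pairs given only once
```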
Load the file for 2000 into a separate data frame. Which names had the biggest rise and the biggest fall compared with 1960?
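One possible approach is to merge the two years on name and sex and sort by the change in births. `compare_years()` is a hypothetical helper. It treats a pair missing from one year as 0 births in that year. If rare names were removed as the readme suggests, that 0 really means "below the cut-off". The comparison uses absolute counts. Because total births differ between 1960 and 2000, you may prefer to compare each name's share of its year's births, which you can get by dividing `births` by `sum(births)` in each data frame first.

```r
# Compare two years' data frames (columns name, sex, births).
# A name-sex pair missing from one year is counted as 0 births there.
compare_years <- function(d1, d2) {
  m <- merge(d1, d2, by = c("name", "sex"), all = TRUE,
             suffixes = c(".y1", ".y2"))
  m$births.y1[is.na(m$births.y1)] <- 0L
  m$births.y2[is.na(m$births.y2)] <- 0L
  m$change <- m$births.y2 - m$births.y1
  m <- m[order(m$change), ]   # biggest fall first, biggest rise last
  rownames(m) <- NULL
  m
}

# Usage:
# ch <- compare_years(names1960, names2000)
# head(ch, 10)   # biggest falls
# tail(ch, 10)   # biggest rises
```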
Write R code that loads the files for all years and combines them into a single data frame with four columns: year, name, sex and number of births.
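The sketch below reads each year with `lapply()` and then stacks the pieces with a single `do.call(rbind, ...)`. That is usually faster than calling `rbind()` repeatedly inside a loop. It assumes the files are named `yob<year>.txt` inside one folder. The function name `load_all_years()` and the folder name `"names"` are hypothetical.

```r
# Hypothetical loader: read every yearly file and stack them into one
# data frame with columns year, name, sex, births.
# Assumes files are named "yob<year>.txt" inside `dir`.
load_all_years <- function(dir = "names", years = 1880:2010) {
  pieces <- lapply(years, function(y) {
    d <- read.csv(file.path(dir, paste0("yob", y, ".txt")), header = FALSE,
                  col.names = c("name", "sex", "births"),
                  colClasses = c("character", "character", "integer"))
    d$year <- as.integer(y)
    d[, c("year", "name", "sex", "births")]
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

# Usage:
# babynames <- load_all_years("names")
# str(babynames)
```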
Data analysis
This section again uses the US baby names data from 1880-2010, introduced in Section 2.
Produce a table of the popularity of your name in each year. (If your name is not in the data set, choose a similar name that is.) In which year was your name most popular?
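A minimal sketch, assuming all years have been combined into a data frame with columns `year`, `name`, `sex` and `births`, as the previous section asks. It combines both sexes and reports each year's raw count alongside the name's share of all recorded births. The peak year can differ between the two measures. Years in which the name does not appear are simply absent from the result. `name_trend()`, `peak_year()` and the name "Emma" are placeholders.

```r
# Popularity of one name per year: births (both sexes combined) and its
# share of all births recorded that year.
name_trend <- function(babynames, who) {
  counts <- aggregate(births ~ year, data = babynames[babynames$name == who, ],
                      FUN = sum)
  totals <- aggregate(births ~ year, data = babynames, FUN = sum)
  names(totals)[2] <- "total"
  out <- merge(counts, totals, by = "year")
  out$prop <- out$births / out$total
  out
}

# Year with the largest value in column `col` ("births" or "prop").
peak_year <- function(trend, col = "births") {
  trend$year[which.max(trend[[col]])]
}

# Usage ("Emma" is only a placeholder name):
# tr <- name_trend(babynames, "Emma")
# peak_year(tr); peak_year(tr, "prop")
```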
Create a table showing the total births by sex and year. Do males or females tend to have higher birth rates?
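`xtabs()` builds the year-by-sex table directly. The files hold counts of recorded births rather than rates, so this sketch reads "higher birth rates" as "more recorded births". It then reports the fraction of years in which male births exceed female births. The helper names are hypothetical, and the input is assumed to be the combined data frame with columns `year`, `name`, `sex` and `births`.

```r
# Total births by year (rows) and sex (columns).
births_by_sex <- function(babynames) xtabs(births ~ year + sex, data = babynames)

# Fraction of years in which recorded male births exceed female births.
share_male_years <- function(tab) mean(tab[, "M"] > tab[, "F"])

# Usage:
# tab <- births_by_sex(babynames)
# head(tab)
# share_male_years(tab)
```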
Create a table of the frequency of each last letter of names in 1900, 1950 and 2000, separately for males and females. Which last letter(s) show the biggest increase or decrease?
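One possible approach takes the last letter with `substr()`, cross-tabulates letter by sex by year, and converts each sex-year slice to proportions so that years with different totals can be compared. The sketch weights letters by number of births. To count distinct names instead, use the one-sided formula `~ last + sex + year`. `last_letter_table()` is a hypothetical helper, and the input is assumed to be the combined data frame.

```r
# Share of births by last letter of the name, within each sex and year.
last_letter_table <- function(babynames, years = c(1900, 1950, 2000)) {
  d <- babynames[babynames$year %in% years, ]
  d$last <- tolower(substr(d$name, nchar(d$name), nchar(d$name)))
  tab <- xtabs(births ~ last + sex + year, data = d)
  prop.table(tab, margin = c(2, 3))   # each sex-year slice sums to 1
}

# Usage:
# lt <- last_letter_table(babynames)
# chg_f <- lt[, "F", "2000"] - lt[, "F", "1900"]
# head(sort(chg_f), 3); tail(sort(chg_f), 3)   # biggest decreases / increases
```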
What are the most popular palindromic names (names that read the same forwards and backwards, such as Anna)? Calculate the proportion of palindromic names in each year. Are such names becoming more common?
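A name is palindromic if it equals its own reverse, ignoring case. The sketch below reverses each name by splitting it into characters. "Proportion of palindromic names" can be read in two ways, so it reports both: the share of births and the share of distinct name-sex rows. The helper names are hypothetical, and the input is assumed to be the combined data frame with columns `year`, `name`, `sex` and `births`.

```r
# TRUE where a name reads the same backwards (ignoring case), e.g. "Anna".
is_palindrome <- function(x) {
  x <- tolower(x)
  rev_x <- vapply(strsplit(x, ""), function(ch) paste(rev(ch), collapse = ""),
                  character(1))
  x == rev_x
}

# Most popular palindromic names, births summed over all years and both sexes.
top_palindromes <- function(babynames, n = 10) {
  p <- babynames[is_palindrome(babynames$name), ]
  tot <- tapply(p$births, p$name, sum)
  o <- order(tot, decreasing = TRUE)
  head(data.frame(name = names(tot)[o], births = as.vector(tot)[o],
                  stringsAsFactors = FALSE), n)
}

# Per year: share of births with a palindromic name (birth_prop) and
# share of distinct name-sex rows that are palindromic (name_prop).
palindrome_share <- function(babynames) {
  pal <- is_palindrome(babynames$name)
  total <- tapply(babynames$births, babynames$year, sum)
  data.frame(year = as.integer(names(total)),
             birth_prop = as.vector(tapply(babynames$births * pal,
                                           babynames$year, sum) / total),
             name_prop = as.vector(tapply(pal, babynames$year, mean)))
}

# Usage:
# top_palindromes(babynames)
# ps <- palindrome_share(babynames); tail(ps)
```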
Graphics
This section uses the US baby names data from 1880-2010, introduced in Section 2.
Create a line plot showing male and female birth rates across all years on the same plot. Label your axes, add a title and include a legend. Include the plot in your PDF report with a one-paragraph description.
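A base-graphics sketch is shown below. It draws the female series with `plot()`, adds the male series with `lines()`, and sets `ylim` so that both lines fit. It plots recorded birth counts. Dividing each year's values by that year's total would show shares instead. `plot_births_by_sex()` is a hypothetical helper, and the colours are arbitrary choices.

```r
# Line plot of total recorded births per year, one line per sex.
plot_births_by_sex <- function(babynames) {
  tab <- xtabs(births ~ year + sex, data = babynames)
  yrs <- as.integer(rownames(tab))
  f <- as.vector(tab[, "F"])
  m <- as.vector(tab[, "M"])
  plot(yrs, f, type = "l", col = "red", ylim = range(0, f, m),
       xlab = "Year", ylab = "Total recorded births",
       main = "Recorded births by sex, per year")
  lines(yrs, m, col = "blue")
  legend("topleft", legend = c("Female", "Male"),
         col = c("red", "blue"), lty = 1)
  invisible(tab)
}

# Usage: write the figure to a file for the PDF report.
# pdf("births_by_sex.pdf"); plot_births_by_sex(babynames); dev.off()
```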
Create a barplot showing the frequency of each last letter of names in 1900, 1950 and 2000 for males and females, in a single figure with a separate panel for each sex. Label the axes, give each panel a title and include legends. Include the plot in your PDF report with a one-paragraph description.
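One way to get one panel per sex is `par(mfrow = c(2, 1))`, followed by one grouped `barplot()` per panel, with letters on the x axis and one bar per year. The sketch below weights letters by births and shows shares within each sex-year. `plot_last_letters()` is a hypothetical helper, and the grey palette is an arbitrary choice.

```r
# Two-panel grouped barplot of last-letter shares (by births) per year.
plot_last_letters <- function(babynames, years = c(1900, 1950, 2000)) {
  d <- babynames[babynames$year %in% years, ]
  d$last <- tolower(substr(d$name, nchar(d$name), nchar(d$name)))
  tab <- prop.table(xtabs(births ~ last + sex + year, data = d),
                    margin = c(2, 3))
  old <- par(mfrow = c(2, 1))   # one panel per sex
  on.exit(par(old))             # restore layout afterwards
  for (s in c("F", "M")) {
    slice <- t(unclass(tab)[, s, ])   # rows = years, columns = letters
    barplot(slice, beside = TRUE, col = gray.colors(nrow(slice)),
            legend.text = rownames(slice),
            args.legend = list(x = "topright", title = "Year"),
            xlab = "Last letter", ylab = "Share of births",
            main = if (s == "F") "Female names" else "Male names")
  }
  invisible(tab)
}

# Usage:
# pdf("last_letters.pdf"); plot_last_letters(babynames); dev.off()
```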
Create a plot showing the proportion of palindromic names per year. Label your axes and add a title. Include the plot in your PDF report with a one-paragraph description.
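A minimal sketch is shown below. It plots the share of births given a palindromic name, which is one reading of "proportion of palindromic names". Replace the births weighting with a mean over rows to plot the share of distinct names instead. `plot_palindrome_share()` is a hypothetical helper, and the input is assumed to be the combined data frame.

```r
# Plot, per year, the share of births given a palindromic name.
plot_palindrome_share <- function(babynames) {
  nm <- tolower(babynames$name)
  rev_nm <- vapply(strsplit(nm, ""), function(ch) paste(rev(ch), collapse = ""),
                   character(1))
  pal <- nm == rev_nm   # TRUE for names that read the same backwards
  prop <- tapply(babynames$births * pal, babynames$year, sum) /
    tapply(babynames$births, babynames$year, sum)
  yrs <- as.integer(names(prop))
  plot(yrs, as.vector(prop), type = "l",
       xlab = "Year", ylab = "Share of births",
       main = "Share of births given a palindromic name")
  invisible(data.frame(year = yrs, prop = as.vector(prop)))
}

# Usage:
# pdf("palindromes.pdf"); plot_palindrome_share(babynames); dev.off()
```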
Creativity
Do something interesting with the names data! Create a table or a plot that shows something not already covered above. Include all R code in your script file and summarise your findings in your PDF report.
Hints
If your computer is too slow for some of the calculations in tasks 2 and 3, try using every 10th year instead of every year.
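Every 10th year from 1880 to 2010 gives 14 years. You can either pass these years to whatever code reads the files, so the other files are never read, or subset a data frame that has already been merged. `every_tenth_year()` is a hypothetical helper, and it assumes a `year` column.

```r
# Every 10th year from 1880 to 2010 (14 years).
decades <- seq(1880, 2010, by = 10)

# Keep only those years in an already-merged data frame with a `year` column.
every_tenth_year <- function(babynames) babynames[babynames$year %in% decades, ]

# Usage:
# babynames10 <- every_tenth_year(babynames)
```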