STAT 362 Monte Carlo Methods Assignment, Stanford University, USA
Instructions: You should submit one file (either a .docx file or a .pdf file). The first part of the file should include your answers to the questions and any relevant output. After your solutions, please include all the code you wrote to answer the questions. Please clearly label each answer.
1) The Washington D.C. government maintains data on taxi trips taken within the city. The files TaxiPrice.txt, TaxiDetail.txt, and MoreTaxi.txt all contain portions of the data for trips taken during the month of February in 2017.
The variables in these data sets include:
- ID: A unique identifier for the trip.
- Provider: Taxi company that provided the trip.
- Meter: Meter fare paid by the customer.
- Tip: Amount the customer tipped.
- Surcharge: Surcharge fee paid by the customer.
- Extras: Any extra fees paid by the customer.
- Tolls: Toll amount paid by the customer.
- PaymentType: Customer payment type (Cash or Credit Card).
- PaymentProvider: Credit Card provider. If PaymentType is Cash, this will also be Cash.
- PickUpZip: Zip Code in which the customer was picked up.
- DropOffZip: Zip Code in which the customer was dropped off.
- Mileage: Miles traveled.
- TripTime: Length of the trip, in minutes.
- Airport: Y if the pick up or drop off location is an airport, N otherwise.
You only need to print your data set and show your output for parts (d) and (e).
a) TaxiPrice.txt and TaxiDetail.txt contain information on the same 90 observations. However, they share only two variables: ID and Mileage. Begin by importing both data sets into R. Then, create one R data frame that merges the data from these two data sets. Use the View command to check the result (it may be difficult to review the data set in the console). Describe your method in one sentence.
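One possible approach to the merge in part (a) is base R's merge() on both shared variables. The sketch below uses small made-up data frames in place of the real files, so the values and the names price, detail, and taxi are illustrative assumptions; the real files would first be read in with something like read.table(), whose arguments depend on the file layout.

```r
# Toy stand-ins for TaxiPrice.txt and TaxiDetail.txt (all values are made up).
price <- data.frame(ID = c(1, 2, 3),
                    Mileage = c(2.5, 4.0, 1.2),
                    Meter = c(9.50, 14.25, 6.00))
detail <- data.frame(ID = c(3, 1, 2),
                     Mileage = c(1.2, 2.5, 4.0),
                     TripTime = c(7, 12, 20))

# Merge on both shared variables so Mileage is not duplicated
# as Mileage.x and Mileage.y in the result.
taxi <- merge(price, detail, by = c("ID", "Mileage"))
print(taxi)
```

Merging on ID alone would also line up the rows, but it would keep two copies of Mileage with suffixes.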
b) MoreTaxi.txt contains 89 observations different from those in TaxiPrice.txt and TaxiDetail.txt. Load this data set into R and create one new data frame by stacking this data set onto the one created in part (a). Describe your method in one sentence.
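Stacking in part (b) could be done with rbind(), assuming both data frames have the same set of column names. The data below are made up, and the names taxi, more, and all_taxi are hypothetical.

```r
# Toy stand-ins: 'taxi' plays the merged data from part (a),
# 'more' plays MoreTaxi.txt (values are made up).
taxi <- data.frame(ID = c(1, 2),
                   Mileage = c(2.5, 4.0),
                   Meter = c(9.50, 14.25))
more <- data.frame(Meter = 6.00,
                   ID = 3,
                   Mileage = 1.2)

# rbind() on data frames matches columns by name, so a different
# column order in 'more' is not a problem.
all_taxi <- rbind(taxi, more)
print(all_taxi)
```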
c) Add a variable to the data frame in part (b) called TotalCost. TotalCost should be the sum of the Meter, Tip, Surcharge, Extras, and Tolls variables. Describe your method in one sentence and check that it worked by showing the sum of these variables' values for the first observation.
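A row-wise total such as TotalCost can be sketched with rowSums() over the five fare columns. The data frame below is made up for illustration, and cost_vars is a hypothetical helper name.

```r
# Toy data with the five fare components (values are made up).
taxi <- data.frame(Meter = c(10.00, 7.50),
                   Tip = c(2.00, 0.00),
                   Surcharge = c(0.25, 0.25),
                   Extras = c(1.00, 0.00),
                   Tolls = c(0.00, 0.00))

cost_vars <- c("Meter", "Tip", "Surcharge", "Extras", "Tolls")

# Add the five components within each row.
taxi$TotalCost <- rowSums(taxi[, cost_vars])

# Check the first row by summing its components directly.
first_row_sum <- sum(taxi[1, cost_vars])
print(first_row_sum)
print(taxi$TotalCost[1])
```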
d) Sort your data by the TripTime variable, then by the TotalCost variable, in ascending order. Print the resulting R data frame and present the first 10 rows of your output.
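A two-key ascending sort like the one in part (d) can be sketched with order(), which breaks ties in its first argument using the second. The data are made up and the name sorted is hypothetical.

```r
# Toy data (values are made up).
taxi <- data.frame(ID = 1:4,
                   TripTime = c(12, 7, 12, 7),
                   TotalCost = c(15.0, 9.5, 11.0, 8.0))

# Sort by TripTime first, then by TotalCost within equal TripTime values.
sorted <- taxi[order(taxi$TripTime, taxi$TotalCost), ]

# head() limits the printed output to the first 10 rows.
print(head(sorted, 10))
```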
e) Use subsetting to create a data frame such that:
- It only consists of trips provided by Yellow Cab of DC or United Ventures.
- Only the Provider, PaymentProvider, TripTime, Mileage, and TotalCost variables are retained.
Print the resulting data frame and present the first 25 observations.
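The row and column conditions in part (e) can be combined in a single bracket subset. In this sketch the data are made up, "Other Co" is a hypothetical provider, and the exact spelling of the provider names in the real file is an assumption.

```r
# Toy data (values are made up).
taxi <- data.frame(
  Provider = c("Yellow Cab of DC", "Other Co", "United Ventures"),
  PaymentProvider = c("Cash", "Visa", "Visa"),
  TripTime = c(10, 5, 8),
  Mileage = c(3.1, 1.0, 2.4),
  TotalCost = c(14.00, 6.50, 11.25),
  Airport = c("N", "N", "Y"),
  stringsAsFactors = FALSE
)

keep_rows <- taxi$Provider %in% c("Yellow Cab of DC", "United Ventures")
keep_cols <- c("Provider", "PaymentProvider", "TripTime", "Mileage", "TotalCost")

# Rows are selected before the comma, columns after it.
taxi_sub <- taxi[keep_rows, keep_cols]
print(head(taxi_sub, 25))
```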
2) The data set Crayons.txt contains information on standard Crayola crayon colors. The variables in this file are crayon number, color name, hexadecimal code, RGB triplet, pack added, year issued, and year retired.
a) First, import the data set into SAS. Then, using PROC CONTENTS, review the names, labels, and attributes of the variables in the data set. Record the label and length for the variable Color as your answer to part (a).
b) Sort the crayon data by RGB Triplet and present the first 22 observations of the sorted data set. Only display the variables color name and RGB Triplet in this table. Hint: consider a SORTSEQ option.
c) Now sort the data by which pack they were added to (least to greatest). Present only the color and the pack added variables and only present the first 24 observations.
d) Subset the sorted data set constructed in part (c) so that only the crayons packed in the 24 pack are presented. Print the entire subset and present it in your solutions.
e) Create a subset of crayons that are either retired or created on or after 1990. Print the entire subset and present it in your solutions.
3) A local company has recently updated human resources information on their employees from a paper-based system into an actual database where the data can be manipulated and reviewed more efficiently. The raw data file Employees.dat contains information on employees including a de-identified Social Security number, name, date of birth, pay grade, monthly salary, and job title.
a) Import the data set into SAS using code. Print your data set and report the first 10 observations. Ensure the date of birth variable is a readable date in your output and not a SAS date value.
b) Create a new variable for each employee's age and report it as a whole number. Present a subset of the data set that includes the names and ages of the first 15 observations.
c) For each employee, calculate the expected annual salary for the next year assuming that they each receive a 2.5% cost of living increase in January. The resulting variable should be rounded to two decimal places.
d) Create a bonus variable of $1,000.00 for employees who are leads, managers, or directors, and $0 otherwise.
4) The table below presents the names and the dates of birth of the four members of the Beatles.
Beatle Name | Birth date
Ringo Starr | 07/07/1940
John Lennon | 10/09/1940
Paul McCartney | 06/18/1942
George Harrison | 02/25/1943
a) Create an R data frame to present this information exactly as shown. Present the R table.
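Because the column headings in the table contain spaces, one way to reproduce them exactly is to pass check.names = FALSE to data.frame(). This is a sketch of one option, not the only one; the name beatles is hypothetical.

```r
# check.names = FALSE stops R from rewriting "Beatle Name" as "Beatle.Name".
beatles <- data.frame(
  `Beatle Name` = c("Ringo Starr", "John Lennon",
                    "Paul McCartney", "George Harrison"),
  `Birth date` = c("07/07/1940", "10/09/1940",
                   "06/18/1942", "02/25/1943"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)
print(beatles)
```

Here the birth dates are stored as text so that they print in the same MM/DD/YYYY form as the table.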
b) Create a new variable that presents the day of the week each Beatle was born. Present the new R table.
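Day-of-week lookups of this kind can be sketched by converting the text dates with as.Date() and then calling weekdays(). The column names Name, BirthDate, and BirthDay are illustrative assumptions, and weekdays() returns day names in the current locale.

```r
beatles <- data.frame(
  Name = c("Ringo Starr", "John Lennon",
           "Paul McCartney", "George Harrison"),
  BirthDate = c("07/07/1940", "10/09/1940",
                "06/18/1942", "02/25/1943"),
  stringsAsFactors = FALSE
)

# The table uses month/day/year, so the format string must say so.
birth <- as.Date(beatles$BirthDate, format = "%m/%d/%Y")

# weekdays() gives the day name for each date.
beatles$BirthDay <- weekdays(birth)
print(beatles)
```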
c) Create a new variable that presents each member's age on the date that their album Abbey Road was released (09/26/1969). Present the full R table.
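Age in whole years on a given date can be sketched by subtracting the years and then taking off one if the birthday has not yet occurred in the reference year. The helper age_on() and the column names below are hypothetical; the release date is the one given in the question.

```r
# Completed years between 'birth' and 'on' (both Date objects).
age_on <- function(birth, on) {
  b <- as.POSIXlt(birth)
  o <- as.POSIXlt(on)
  had_birthday <- (o$mon > b$mon) | (o$mon == b$mon & o$mday >= b$mday)
  (o$year - b$year) - ifelse(had_birthday, 0, 1)
}

beatles <- data.frame(
  Name = c("Ringo Starr", "John Lennon",
           "Paul McCartney", "George Harrison"),
  BirthDate = c("07/07/1940", "10/09/1940",
                "06/18/1942", "02/25/1943"),
  stringsAsFactors = FALSE
)

birth <- as.Date(beatles$BirthDate, format = "%m/%d/%Y")
release <- as.Date("09/26/1969", format = "%m/%d/%Y")

beatles$AgeAtAbbeyRoad <- age_on(birth, release)
print(beatles)
```

Dividing a difference in days by 365.25 is a common shortcut, but it can be off by one near a birthday, which is why this sketch compares month and day directly.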
d) Use conditional logic to add a final column to your data frame that contains either each member's current age or his age when he passed away (note that Ringo and Paul are still alive). Present the full R table.
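The conditional column could be sketched with ifelse(), choosing between age today and age at death. The death dates below (John Lennon, 12/08/1980; George Harrison, 11/29/2001) are not given in the assignment and are supplied here as an outside fact; age_on() and the column names are hypothetical.

```r
# Completed years between 'birth' and 'on' (both Date objects).
age_on <- function(birth, on) {
  b <- as.POSIXlt(birth)
  o <- as.POSIXlt(on)
  had_birthday <- (o$mon > b$mon) | (o$mon == b$mon & o$mday >= b$mday)
  (o$year - b$year) - ifelse(had_birthday, 0, 1)
}

beatles <- data.frame(
  Name = c("Ringo Starr", "John Lennon",
           "Paul McCartney", "George Harrison"),
  BirthDate = c("07/07/1940", "10/09/1940",
                "06/18/1942", "02/25/1943"),
  # NA marks members who are still alive (dates added from outside the source).
  DeathDate = c(NA, "12/08/1980", NA, "11/29/2001"),
  stringsAsFactors = FALSE
)

birth <- as.Date(beatles$BirthDate, format = "%m/%d/%Y")
death <- as.Date(beatles$DeathDate, format = "%m/%d/%Y")

# Use age at death where a death date exists, otherwise age as of today.
beatles$Age <- ifelse(is.na(death),
                      age_on(birth, Sys.Date()),
                      age_on(birth, death))
print(beatles)
```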
Remember to copy all the code you used to answer the questions. Separate and label the code per question.