Reference no: EM132374126

MATH 4044 - Statistics for Data Science Assignment - Capital BikeShare

Bike sharing systems are a new generation of bike rental schemes in which the whole process, from membership to rental and return, has become automatic. Through these systems, a user can easily rent a bike at one location and return it at another. Currently, there are over 500 bike-sharing programs around the world, some of the best and largest being in Hangzhou (China), Paris (France), London (England), New York City (US) and Montreal (Canada). These systems attract great interest because of their role in addressing traffic congestion, environmental impact and population health issues in big cities.

The data for this assignment come from one such program, Capital Bikeshare, which operates in the Washington, D.C. area. It has over 3000 bicycles that can be rented from over 350 stations across Washington, D.C., Arlington and Alexandria, VA, and Montgomery County, MD. Its website encourages users to check out bikes for a trip to work, to run errands, to go shopping or to visit friends and family. Users can join Capital Bikeshare for one to three days (casual membership) or for a month or a year (registered membership). Access to the Capital Bikeshare fleet is available 24 hours a day, 365 days a year, and the first 30 minutes of each trip are free.

You will use data derived from Capital Bikeshare trip records to build a statistical model for the purposes of predicting the number of rentals per day.

References and Data Sources:

1. Bache, K. & Lichman, M. (2013). UCI Machine Learning Repository. Irvine, CA: University of California, School of Information and Computer Science.

2. Fanaee-T, H. & Gama, J. (2013). Event labeling combining ensemble detectors and background knowledge. Progress in Artificial Intelligence, pp. 1-15. Springer Berlin Heidelberg.

Data files for this assignment:

The main data file for this assignment is called daily.sas7bdat and contains daily counts of bike rentals for 2011 and 2012, derived from Capital Bikeshare trip history data, with additional weather and seasonal information. The data was downloaded from the UCI Machine Learning Repository. Variables in that file are as follows:

Variable      Description
instant       Record index
dteday        Date
season        Winter, spring, summer or fall (northern hemisphere)
yr            0 = 2011, 1 = 2012
month         Month (January to December)
weekday       Day of the week (Monday to Sunday)
workingday    Working day = 1, weekend or public holiday = 0
temp          Normalised temperature in degrees Celsius; observed temperature divided by 41 (max)
atemp         Normalised 'feels like' temperature in degrees Celsius; values divided by 50 (max)
hum           Normalised humidity; observed values divided by 100 (max)
windspeed     Normalised wind speed; values divided by 67 (max)
casual        Count of casual users
registered    Count of registered users
count         Total count of bike rentals (casual plus registered)
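Because temp, atemp, hum and windspeed are stored as observed value divided by a fixed maximum, multiplying by that maximum recovers the original units, which can make regression coefficients easier to interpret. The following is a minimal Python sketch (for illustration only; the assignment itself uses SAS), assuming a record is held as a plain dict keyed by the variable names in the table; the helper name `denormalise` is hypothetical.

```python
# Divisors taken from the variable descriptions (normalised = observed / max).
SCALE = {"temp": 41.0, "atemp": 50.0, "hum": 100.0, "windspeed": 67.0}


def denormalise(record):
    """Return a copy of `record` with the weather columns in original units.

    Hypothetical helper: assumes each normalised value equals observed / max,
    so observed = normalised * max. Columns not in SCALE are left unchanged.
    """
    out = dict(record)
    for name, divisor in SCALE.items():
        if name in out:
            out[name] = out[name] * divisor
    return out


row = {"temp": 0.5, "atemp": 0.4, "hum": 0.8, "windspeed": 0.1, "count": 985}
print(denormalise(row))
```

A coefficient fitted on normalised temp is the change in count per 41 degrees, so dividing it by 41 gives the change per degree Celsius.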

The second file for this assignment is called random_sample.xlsx and it can be downloaded from the Data Files folder on the course website. The file contains a stratified sample of bike rentals taken from the Capital Bikeshare trip history data for the second quarter of 2012. Variables in that file are as follows:

Variable        Description
Duration        Trip duration, in seconds
Start_date      Date and time stamp for the beginning of the trip
Start_station   Address of the location from which the bike was rented
End_date        Date and time stamp for the end of the trip
End_station     Address of the location to which the bike was returned
Bike_number     Bike identification number
User_type       Type of user (casual or registered)
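Since Duration should equal the gap between End_date and Start_date, a quick consistency check can catch import or formatting problems before any testing. A minimal Python sketch, assuming the timestamps arrive as strings in the hypothetical format `YYYY-MM-DD HH:MM:SS` (the real file's format may differ, so adjust `fmt`) and that a trip is a dict keyed by the variable names above:

```python
from datetime import datetime


def duration_seconds(start, end, fmt="%Y-%m-%d %H:%M:%S"):
    """Elapsed seconds between two timestamp strings (format is an assumption)."""
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds()


def duration_matches(row, tolerance=1.0):
    """True if Duration agrees with End_date - Start_date within `tolerance` seconds."""
    implied = duration_seconds(row["Start_date"], row["End_date"])
    return abs(implied - row["Duration"]) <= tolerance


trip = {"Duration": 451,
        "Start_date": "2012-04-01 08:15:00",
        "End_date": "2012-04-01 08:22:31"}
print(duration_matches(trip))  # True: 7 min 31 s is 451 s
```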

Assignment tasks:

Question 1 -

(a) Use SAS to study the distribution of the total daily number of rentals. Obtain measures of location, dispersion, skewness and kurtosis. Obtain a boxplot, histogram and a quantile-quantile plot. Also carry out Normal goodness-of-fit tests. What are the key features of this distribution?
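The summary measures requested in part (a) can be sketched with Python's standard library for illustration. This assumes the daily counts are a plain list of numbers. Skewness and kurtosis here are the simple moment versions (g1 and excess kurtosis g2), and the normality check is the Kolmogorov-Smirnov D statistic against a Normal with the sample mean and standard deviation, without a p-value. SAS PROC UNIVARIATE reports bias-adjusted skewness and kurtosis by default and uses its own quartile definition, so its numbers will differ slightly.

```python
import statistics
from statistics import NormalDist


def describe(xs):
    """Location, dispersion and shape summaries for a list of numbers."""
    n = len(xs)
    mean = statistics.fmean(xs)
    sd = statistics.stdev(xs)                 # sample SD (divisor n - 1)
    q1, median, q3 = statistics.quantiles(xs, n=4)
    m2 = sum((x - mean) ** 2 for x in xs) / n  # central moments (divisor n)
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return {
        "n": n, "mean": mean, "median": median, "sd": sd,
        "iqr": q3 - q1,
        "skewness": m3 / m2 ** 1.5,            # g1: 0 for a symmetric sample
        "excess_kurtosis": m4 / m2 ** 2 - 3.0,  # g2: about 0 for Normal data
    }


def ks_statistic(xs):
    """Kolmogorov-Smirnov D against a Normal fitted by sample mean and SD."""
    dist = NormalDist(statistics.fmean(xs), statistics.stdev(xs))
    xs = sorted(xs)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = dist.cdf(x)
        # Largest gap between the empirical CDF (just after and just before
        # the step at x) and the fitted Normal CDF.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d
```

Calling `describe(counts)` and `ks_statistic(counts)` on the 731 daily totals gives numbers to set beside the histogram and Q-Q plot; a large D or a clearly non-zero skewness points away from Normality.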

(b) Now use SAS to obtain boxplots of the total daily number of rentals by season and by type of day (working day versus weekend or public holiday). What do the boxplots suggest about the pattern, if any, of bike rentals?
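A boxplot is a picture of each group's five-number summary, so printing those numbers alongside the plots makes group comparisons concrete. A minimal Python sketch, assuming the daily records are a list of dicts such as `{"season": "summer", "workingday": 1, "count": 4500}` (a hypothetical layout) with at least two days per group:

```python
import statistics
from collections import defaultdict


def five_number_by_group(rows, group_key, value_key="count"):
    """(min, Q1, median, Q3, max) of `value_key` for each level of `group_key`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[value_key])
    summary = {}
    for level, values in groups.items():
        # "inclusive" treats the data as the whole population of days,
        # so min and max stay the observed extremes.
        q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
        summary[level] = (min(values), q1, q2, q3, max(values))
    return summary
```

`five_number_by_group(days, "season")` and `five_number_by_group(days, "workingday")` then mirror the two sets of boxplots; quartile definitions vary between packages, so small differences from SAS are expected.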

(c) In 2012, the east coast of the United States was struck by Hurricane Sandy. Is this severe weather event evident in your results? Provide a relevant graph to support your answer.
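Hurricane Sandy reached the US east coast in late October 2012, so one way to look for it is to flag days falling outside the usual boxplot fences, Q1 - 1.5 IQR and Q3 + 1.5 IQR. A minimal Python sketch, assuming `days` is a list of `(date_string, count)` pairs (a hypothetical layout); because low counts are common in winter, applying the rule within a month or season, rather than to all two years at once, makes a one-off drop easier to spot.

```python
import statistics


def tukey_outliers(days, k=1.5):
    """(date, count) pairs outside Q1 - k*IQR and Q3 + k*IQR.

    k = 1.5 is the conventional boxplot whisker rule.
    """
    counts = [c for _, c in days]
    q1, _, q3 = statistics.quantiles(counts, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [(d, c) for d, c in days if c < low or c > high]
```

A time-series plot of daily counts for October and November 2012 is the natural companion graph, since it shows whether a flagged day stands apart from its neighbours.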

Question 2 -

(a) Obtain a Pearson correlation matrix relating variables count, atemp, temp, hum and windspeed. Also obtain a scatterplot matrix of the same variables. Discuss the relationships.
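The Pearson coefficient for two variables is their covariance divided by the product of their standard deviations; the matrix simply repeats this for every pair. A minimal Python sketch, assuming each variable is a list of equal length keyed by name (on Python 3.10+ `statistics.correlation` gives the same pairwise value). Since temp and atemp are both rescaled temperatures, they are likely to be very strongly correlated with each other, which matters for the collinearity check later.

```python
import math


def pearson(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """Dict-of-dicts matrix of pairwise Pearson r for named columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}
```

Usage: `correlation_matrix({"count": count, "atemp": atemp, "temp": temp, "hum": hum, "windspeed": windspeed})`.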

(b) Fit a simple regression model relating count to atemp, with count as the dependent variable, and determine the residuals from this regression. Discuss the fitted relationship and the goodness of fit. Examine residual plots and influence diagnostics and comment on the residual behaviour.
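For a single predictor, least squares gives slope b1 = Sxy / Sxx and intercept b0 = mean(y) - b1 mean(x); residuals are observed minus fitted counts, and R-square is 1 - SSE/SST. Leverage, h_i = 1/n + (x_i - mean(x))^2 / Sxx, is one of the influence diagnostics SAS PROC REG can report. A minimal Python sketch, assuming `x` holds atemp and `y` holds count as lists:

```python
def simple_ols(x, y):
    """Least-squares fit of y = b0 + b1*x with residuals, R-square and leverage."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    residuals = [b - (b0 + b1 * a) for a, b in zip(x, y)]
    sse = sum(e ** 2 for e in residuals)
    sst = sum((b - my) ** 2 for b in y)
    # Leverage measures how far x_i is from the centre of the x values;
    # the leverages sum to 2 (the number of coefficients).
    leverage = [1.0 / n + (a - mx) ** 2 / sxx for a in x]
    return {"b0": b0, "b1": b1, "r2": 1.0 - sse / sst,
            "residuals": residuals, "leverage": leverage}
```

A common rule of thumb treats leverage above 2p/n (here 4/n) as worth a closer look.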

(c) Obtain a correlation matrix relating the residuals from part (b) to variables temp, hum and windspeed. Comment on these correlations. What do they tell you about the importance of these variables for predicting the daily count of bike rentals?
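Least-squares residuals are uncorrelated with the predictor by construction, so correlating them with another variable z measures what z can still explain once atemp is in the model. The same number can be obtained from pairwise correlations alone as (r_yz - r_xy r_xz) / sqrt(1 - r_xy^2), which shows why a variable almost collinear with atemp (such as temp) should add little. A minimal Python sketch checking both routes, assuming `y` is count, `x` is atemp and `z` is the candidate variable:

```python
import math


def _r(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    return sab / math.sqrt(sum((p - ma) ** 2 for p in a) *
                           sum((q - mb) ** 2 for q in b))


def residual_correlation(y, x, z):
    """Correlation of z with the residuals of the simple regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b1 = (sum((a - mx) * (b - my) for a, b in zip(x, y)) /
          sum((a - mx) ** 2 for a in x))
    resid = [b - my - b1 * (a - mx) for a, b in zip(x, y)]
    return _r(resid, z)


def residual_correlation_formula(y, x, z):
    """The same quantity computed from the three pairwise correlations."""
    rxy, ryz, rxz = _r(x, y), _r(y, z), _r(x, z)
    return (ryz - rxy * rxz) / math.sqrt(1 - rxy ** 2)
```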

(d) Using the correlations in part (c), identify a set of potential explanatory variables. Regress count on your selected variables. Discuss the fitted relationship and the goodness of fit. Also examine and discuss the residual patterns.
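With several predictors, the least-squares coefficients solve the normal equations (X'X)b = X'y, where X includes a column of ones for the intercept. Adjusted R-square, 1 - (1 - R^2)(n - 1)/(n - p), penalises extra terms and is a fairer yardstick than R-square when comparing models of different sizes. A minimal Python sketch using Gauss-Jordan elimination, assuming `X` is a list of rows of predictor values without the intercept column; a QR-based solver, as statistical packages use, is numerically safer for real data.

```python
def ols(X, y):
    """Multiple regression via the normal equations (X'X) b = X'y.

    Returns (coefficients with intercept first, r_squared, adj_r_squared).
    """
    rows = [[1.0] + list(r) for r in X]      # prepend intercept column
    n, p = len(rows), len(rows[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)] +
         [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    for col in range(p):
        # Partial pivoting: swap in the row with the largest entry in this column.
        pivot = max(range(col, p), key=lambda k: abs(A[k][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for k in range(p):
            if k != col:
                factor = A[k][col] / A[col][col]
                A[k] = [a - factor * b for a, b in zip(A[k], A[col])]
    beta = [A[i][p] / A[i][i] for i in range(p)]   # A is now diagonal
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    ybar = sum(y) / n
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - ybar) ** 2 for yi in y)
    r2 = 1.0 - sse / sst
    adj = 1.0 - (1.0 - r2) * (n - 1) / (n - p)
    return beta, r2, adj
```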

(e) Extend your multiple regression model from part (d) to include categorical predictors. You can use stepwise selection to help you find the most parsimonious (simplest) model that still achieves a high R-square. Report and interpret in detail only your final model, but do indicate how it was obtained and why it was considered the 'best'.

In building your model, consider as many potential explanatory variables as possible (you may need to define additional dummy variables). Be sure to check, and if necessary correct, for collinearity.
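A categorical predictor with k levels enters a regression as k - 1 indicator (dummy) variables; the omitted reference level is absorbed by the intercept, and including all k would be perfectly collinear with it. Collinearity is usually measured by the variance inflation factor, VIF = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the others (PROC REG prints it with the VIF model option). A minimal Python sketch, assuming values arrive as plain lists; `vif_two` covers only the special case of exactly two predictors, where R_j^2 is their squared correlation.

```python
def dummy_code(values, reference):
    """0/1 indicator columns for every level except `reference`."""
    levels = sorted(set(values) - {reference})
    rows = [[1 if v == level else 0 for level in levels] for v in values]
    return levels, rows


def vif_two(x1, x2):
    """VIF for a model with exactly two predictors: 1 / (1 - r^2)."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    r2 = s12 ** 2 / (s11 * s22)
    return 1.0 / (1.0 - r2)
```

For example, `dummy_code(season, "winter")` yields three indicators for the remaining seasons. Month and season carry overlapping information, as do temp and atemp, so those pairs are natural places to look for inflated VIFs.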

Question 3 -

(a) Upload the data file random_sample.xlsx into a folder of your choice in your home directory on the SAS server and then use the import procedure to convert the data file into a SAS table. The code snippet shown below assumes that the Excel data file was uploaded directly to the home directory in SAS Studio, and proc print is used to check that the data was converted correctly into SAS format:

(b) Is there a statistically significant difference in duration of bike trips by casual versus registered users? If so, which trips are typically longer? Check the necessary conditions and perform an appropriate hypothesis test. Should it be a two-sample or a paired t-test? You may need to use a transformation (e.g. log) in order to justify performing a t-test on this data. Justify your choices and discuss your results.
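Trip durations are typically right-skewed, so a log transform often makes each group closer to Normal. If casual and registered trips are treated as independent samples (a paired test would need each casual trip matched to a specific registered trip), a two-sample test applies; the Welch version below does not assume equal variances and matches the Satterthwaite row of SAS PROC TTEST output. A minimal Python sketch, assuming durations are positive numbers of seconds in two lists. The p-value uses a Normal approximation, reasonable only for large degrees of freedom. On the log scale, exp(mean difference) is the ratio of geometric mean durations.

```python
import math
import statistics
from statistics import NormalDist


def welch_t(a, b):
    """Welch two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    se2a = statistics.variance(a) / len(a)   # squared standard error, group a
    se2b = statistics.variance(b) / len(b)
    t = (ma - mb) / math.sqrt(se2a + se2b)
    df = (se2a + se2b) ** 2 / (se2a ** 2 / (len(a) - 1) + se2b ** 2 / (len(b) - 1))
    return t, df


def approx_two_sided_p(t):
    """Normal approximation to the two-sided p-value (large df only)."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(t)))


def log_welch(casual, registered):
    """Welch test on log durations plus the ratio of geometric means."""
    la = [math.log(d) for d in casual]
    lb = [math.log(d) for d in registered]
    t, df = welch_t(la, lb)
    ratio = math.exp(statistics.fmean(la) - statistics.fmean(lb))
    return t, df, ratio
```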

Question 4 -

Write a summary of your findings from Questions 1 to 3. Keep the technical details of the analyses that led you to these conclusions to the absolute minimum. Rather, focus on practical significance and present your findings in non-specialist terms. A few paragraphs (up to a page) will be sufficient.

Note: Please include screenshots of SAS graphs where needed, each followed by text explaining it as the question requires. There is no need to explain graphs if the question does not ask for it.


Submission instructions:

This assignment is worth 25% of your final grade. It is due no later than 11pm on Sunday 22 September, at the end of Week 8. You will need to submit your assignment via Learnonline. There is no need to include a cover sheet, as one is generated automatically by the Learnonline system.

The submitted assignment needs to be a single file in either Microsoft Word (doc or docx) or PDF format. The assignment is out of 120 marks. To achieve maximum marks for each question, you should aim to:

Complete the requested statistical analysis in SAS using appropriate tasks or procedures. (40%)
Provide and interpret only the output most relevant to the question. Do not include every piece of output produced by SAS! (40%)
Discuss the results in the context of the question. (20%)

Assignments submitted late, without an extension being granted, will attract a penalty of 10 marks for each day, or any part thereof, beyond the due date and time.
