What is the meaning of conjugate prior

Reference no: EM132092020

Some of the questions in this assignment require you to use the "BikeShare" dataset. This dataset is given as a text file, named "BikeShareTabSep.txt". You can download this from the Assignment folder in CloudDeakin. Below is the description of this dataset.

Bike sharing dataset (BikeShare)
This dataset gives the count of bikes rented between 11am - 12pm on different days and locations through the Capital Bikeshare System (operating in US cities) between 2011 and 2012. The variables include the following (9 variables):

Season: Categorical: 1 = Spring, 2 = Summer, 3 = Autumn (fall), 4 = Winter

Working day: 0 = Weekend, 1 = Workday.

Weather: Categorical variable
1: Clear, Few clouds, Partly cloudy
2: Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist
3: Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds
4: Heavy Rain + Ice Pellets + Thunderstorm + Mist, Snow + Fog

Temperature: Temperature in Celsius.

'Feeling' Temperature: 'Feels like' temperature, reported in Celsius.

Humidity: Humidity (given as a percentage).

Windspeed: Windspeed (measured in km/h).

Casual users: Count of casual users that used a bike at that time.

Registered users: Count of registered users that used a bike at that time.

Assignment tasks

Q1):

• Download the txt file "BikeShareTabSep.txt" and save it to your R working directory.
• Assign the data to a matrix, e.g. using

the.data<-as.matrix(read.table("BikeShareTabSep.txt"))

• Generate a sample of 400 data using the following:

my.data <- the.data [sample(1:727,400),c(1:9)]

Save "my.data" to a text file titled "name-StudentID-BikeShareMyData.txt" using the following R code (NOTE: you must upload this text file with your submission).

write.table(my.data,"name-StudentID-BikeShareMyData.txt")

Use the sampled data ("my.data") to answer the following questions.

1.1) Draw histograms for the 'Registered users' and 'Temperature' values, and comment on them.

1.2) Give the five-number summary and the mean value for 'Casual users' and 'Registered users' separately.

1.3) Draw a parallel box plot of the two variables 'Casual users' and 'Registered users'. Use the answers to Q1.2 and the box plots to compare the two variables and comment on them.

1.4) Draw a scatterplot of 'Temperature' and 'Casual users' for the first 200 data vectors in "my.data" (name the axes) and comment on it.

1.5) Fit a linear regression model with 'Temperature' as x and 'Casual users' as y, using the first 200 data vectors in "my.data". Write down the linear regression equation. Plot the line on the same scatterplot. Compute the correlation coefficient and the coefficient of determination. Explain what these results reveal.
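The regression and summary steps in Q1 can be sketched as follows. This is only an illustration on synthetic data: the vectors `temperature` and `casual` are made-up stand-ins for two columns of "my.data", and the slope and noise level used to generate them are assumptions, not properties of the BikeShare data.

```r
# Synthetic stand-in for the first 200 rows of my.data (all values are made up).
set.seed(42)
temperature <- runif(200, min = 0, max = 35)                 # hypothetical Celsius values
casual <- round(5 + 2 * temperature + rnorm(200, sd = 10))   # hypothetical rider counts

# Five-number summary (min, lower hinge, median, upper hinge, max) and mean.
five_num <- fivenum(casual)
casual_mean <- mean(casual)

# Least-squares line: casual = a + b * temperature.
fit <- lm(casual ~ temperature)
a <- coef(fit)[["(Intercept)"]]
b <- coef(fit)[["temperature"]]

# Pearson correlation and coefficient of determination.
r <- cor(temperature, casual)
r_squared <- summary(fit)$r.squared

# Scatterplot with named axes and the fitted line overlaid.
plot(temperature, casual, xlab = "Temperature (Celsius)", ylab = "Casual users")
abline(fit, col = "red")
```

For a simple linear regression with one predictor, the coefficient of determination equals the square of the correlation coefficient, which gives a quick consistency check on the two reported values.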

Q2)

The table shows the results of a survey of the type of vehicle people own (in thousands) in different states over the five-year period between 2011 and 2016.

 

Vehicle type            New South Wales (N)   Victoria (V)   Queensland (Q)   Total
Passenger (P)           1360                  1140           810              3310
Light commercial (C)    260                   190            240              690
Total                   1620                  1330           1050             4000

Suppose we select a person at random.

What is the probability that the person is from Victoria (V)?

What is the probability that the person owns a light commercial vehicle (C)?

What is the probability that the person owns a passenger vehicle (P) and is from New South Wales (N)?

What is the probability that the person owns a light commercial vehicle (C), given that he/she is from Queensland (Q)?

What is the probability that a person who owns a passenger vehicle (P) is from Queensland (Q)?

What is the probability that the person is from Victoria (V) or owns a passenger vehicle (P)?

Find the marginal distribution of the vehicle type.

Find the marginal distribution of the state.

Find the conditional distribution of vehicle type within each state.
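One way to organise these calculations is to enter the counts from the table as a matrix and let R compute the joint, marginal and conditional proportions. The variable names below are illustrative choices, not part of the assignment.

```r
# Counts from the survey table (in thousands); rows = vehicle type, columns = state.
counts <- matrix(c(1360, 1140, 810,
                    260,  190, 240),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(vehicle = c("P", "C"),
                                 state = c("N", "V", "Q")))

joint <- counts / sum(counts)          # joint distribution P(vehicle, state)
vehicle_marginal <- rowSums(joint)     # marginal distribution of vehicle type
state_marginal <- colSums(joint)       # marginal distribution of state

# Conditional distribution of vehicle type within each state (columns sum to 1).
vehicle_given_state <- prop.table(counts, margin = 2)

# Union via inclusion-exclusion: P(V or P) = P(V) + P(P) - P(P and V).
p_V_or_P <- state_marginal[["V"]] + vehicle_marginal[["P"]] - joint["P", "V"]
```

Individual answers can then be read off by name, for example `joint["P", "N"]` for a passenger vehicle owner from New South Wales, or `vehicle_given_state["C", "Q"]` for a light commercial vehicle given Queensland.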

Q3)

Suppose that 20% of the adults smoke cigarettes. It is known that 60% of smokers and 15% of non-smokers develop a certain lung condition. What is the probability that someone with the lung condition was a smoker?
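This is a direct application of Bayes' theorem together with the law of total probability. A short sketch of the arithmetic, using the three percentages stated in the question (the variable names are illustrative):

```r
p_smoker <- 0.20                   # P(smoker)
p_cond_given_smoker <- 0.60        # P(condition | smoker)
p_cond_given_nonsmoker <- 0.15     # P(condition | non-smoker)

# Law of total probability: P(condition).
p_cond <- p_smoker * p_cond_given_smoker +
  (1 - p_smoker) * p_cond_given_nonsmoker

# Bayes' theorem: P(smoker | condition).
p_smoker_given_cond <- p_smoker * p_cond_given_smoker / p_cond
```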

Q4) Maximum Likelihood Estimation (MLE)

The number of cars $x_i$ arriving at a shopping centre on a given day $i$ is modelled by a Poisson distribution with unknown parameter $\theta$, as given by the following equations.

$$x_i \sim \mathrm{Pois}(\theta)$$

$$\mathrm{Pois}(\theta) = p(x_i \mid \theta) = \frac{\theta^{x_i} e^{-\theta}}{x_i!}$$

Assume that we consider $N$ consecutive days, and that the daily car arrivals at the shopping centre are independent and identically distributed (iid).

a) Show that the expression for the likelihood (joint distribution) $p(X \mid \theta)$ of the arrival of cars for $N$ days ($X = \{x_1, x_2, \ldots, x_N\}$) is given by

$$p(X \mid \theta) = \frac{\theta^{N\bar{x}} e^{-N\theta}}{x_1!\, x_2!\, x_3! \cdots x_N!},$$

where $\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$.

b) Find an expression for the log-likelihood function $L(\theta) = \ln p(X \mid \theta)$.

c) In order to find the maximum likelihood estimate (MLE) of the parameter $\theta$, we need to maximise $L(\theta)$.

Find the value of $\theta$ that maximises $L(\theta)$ by differentiating the log-likelihood function $L(\theta)$ with respect to $\theta$ and equating the derivative to zero. Show that the maximum likelihood estimate $\hat{\theta}$ of the parameter $\theta$ is given by:

$$\hat{\theta} = \bar{x}, \quad \text{where } \bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$$

d) Suppose that the numbers of cars observed arriving on three days are $x_1 = 100$, $x_2 = 60$ and $x_3 = 70$.

What is the MLE given this data?
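The closed-form result from part (c) can be checked numerically. The sketch below maximises the Poisson log-likelihood for the three observations with `optimize` and compares the result with the sample mean; the search interval `c(1, 200)` is an arbitrary assumption chosen to bracket the data.

```r
x <- c(100, 60, 70)   # observed daily car counts from part (d)

# Log-likelihood of iid Poisson data as a function of theta.
log_lik <- function(theta) sum(dpois(x, lambda = theta, log = TRUE))

# Numerical maximisation over an assumed bracketing interval.
opt <- optimize(log_lik, interval = c(1, 200), maximum = TRUE)
theta_hat_numeric <- opt$maximum

# Closed-form MLE: the sample mean.
theta_hat_closed <- mean(x)
```

Up to the tolerance of the optimiser, the two values should agree, which is a useful sanity check on the derivation rather than a substitute for it.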

Q5) Bayesian inference for Gaussians (unknown mean and known variance)

5.1) What is the meaning of conjugate prior?

5.2) Why are conjugate priors useful in Bayesian statistics?

5.3) Give three examples of conjugate pairs (i.e., three pairs of distributions that can be used for the prior and the likelihood).

5.4) The annual rainfall received in the Murray basin is measured for n years. The average rainfall observed over the n years is 1100 mm. Assume that the annual rainfall is normally distributed with unknown mean θ and known standard deviation 200 mm. Suppose your prior distribution for θ is normal with mean 800 mm and standard deviation 100 mm.

a) State the posterior distribution for θ (this will be in terms of n; do not derive the formulae).
b) For n = 3, find the mean and the standard deviation of the posterior distribution. Comment on the posterior variance.
c) For n = 15, find the mean and the standard deviation of the posterior distribution. Compare with the results obtained for n = 3 in Q5.4(b) above, and comment.
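For a normal likelihood with known variance and a normal prior on the mean, the standard conjugate update adds precisions (inverse variances) and takes a precision-weighted average of the prior mean and the sample mean. A minimal sketch of that update follows; `posterior_normal` is a hypothetical helper name introduced here for illustration.

```r
# Normal-normal conjugate update with known data standard deviation.
# n: sample size, xbar: sample mean, sigma: known data sd,
# mu0, tau0: prior mean and prior sd for theta.
posterior_normal <- function(n, xbar, sigma, mu0, tau0) {
  prior_precision <- 1 / tau0^2
  data_precision <- n / sigma^2
  post_var <- 1 / (prior_precision + data_precision)
  post_mean <- post_var * (prior_precision * mu0 + data_precision * xbar)
  c(mean = post_mean, sd = sqrt(post_var))
}

post3 <- posterior_normal(n = 3, xbar = 1100, sigma = 200, mu0 = 800, tau0 = 100)
post15 <- posterior_normal(n = 15, xbar = 1100, sigma = 200, mu0 = 800, tau0 = 100)
```

As n grows, the data precision term dominates, so the posterior mean moves from the prior mean towards the sample mean and the posterior standard deviation shrinks.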

Q6) Dimensionality Reduction:

Use the "BikeShare" data for this question. Use the following code to load 200 (or 100) randomly selected data points. Note that only features 4 to 9 are used here.
the.data <- as.matrix(read.table("BikeShareTabSep.txt"))
selData <- the.data [sample(1:727,200),c(4:9)]

Save "selData" to a text file titled "name-StudentID-PCASelData.txt" using the following R code (NOTE you must upload this text file with your submission).

write.table(selData,"name-StudentID-PCASelData.txt")

Conduct a principal component analysis (PCA) on this data (selData). Use the "biplot" code below (in R) to produce a scatterplot using the first two principal components. Comment on the plot.
pZ <- prcomp(selData, tol = 0.01, scale = TRUE)
pZ
summary(pZ)
biplot(pZ)

Draw a graph of variance versus the principal components, and explain how this can be used to determine the correct number of principal components.
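A graph of this kind is often called a scree plot. The sketch below builds one from a `prcomp` fit on synthetic data; the matrix `demo` is a made-up stand-in for selData, with one pair of columns deliberately correlated so that the first component stands out.

```r
# Synthetic stand-in for selData: 200 rows, 6 numeric features (values are made up).
set.seed(1)
demo <- matrix(rnorm(200 * 6), ncol = 6)
demo[, 2] <- demo[, 1] + rnorm(200, sd = 0.2)   # make two columns strongly correlated

pca <- prcomp(demo, scale. = TRUE)

# Variance captured by each component, and the proportion of the total.
variances <- pca$sdev^2
prop_var <- variances / sum(variances)
cum_prop_var <- cumsum(prop_var)

# Scree plot: variance against component index.
plot(variances, type = "b",
     xlab = "Principal component", ylab = "Variance")
```

A common reading of such a plot is to look for the "elbow" where the curve flattens, or to keep enough components for `cum_prop_var` to pass a chosen threshold; either rule is a heuristic rather than a definitive answer.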

For the same data above (selData), compute the Euclidean distance matrix. Use the distance matrix to perform a classical multidimensional scaling (classical MDS or metric MDS). You can use the following command:

mds <- cmdscale(selData.dist) # here 'selData.dist' is the distance matrix

Plot the results and comment on them.

For the same data above (selData), perform a non-metric MDS, called 'isoMDS' in R, with the number of dimensions k set to 2. Use the following commands to do this:

library(MASS)
fit<-isoMDS(selData.dist, k=2)

Plot the results of this isoMDS.

Draw the Shepard plot for these isoMDS results and comment on it.

For the same data above (selData), perform a non-metric MDS, called 'isoMDS' in R, with the number of dimensions k set to 4.
library(MASS)
fit<-isoMDS(selData.dist, k=4)

Draw the Shepard plot for these isoMDS results and compare it with the plot obtained for k=2 in Q6.6 above. Comment on them.
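A Shepard plot compares the original dissimilarities with the distances in the fitted configuration. The sketch below uses `Shepard` from the MASS package on synthetic data; `demo` and `demo.dist` are made-up stand-ins for selData and selData.dist.

```r
library(MASS)

# Synthetic stand-in for selData: 50 rows, 6 numeric features (values are made up).
set.seed(1)
demo <- matrix(rnorm(50 * 6), ncol = 6)
demo.dist <- dist(demo)                          # Euclidean distance matrix

# Non-metric MDS in 2 and 4 dimensions (trace = FALSE silences iteration output).
fit2 <- isoMDS(demo.dist, k = 2, trace = FALSE)
fit4 <- isoMDS(demo.dist, k = 4, trace = FALSE)

# Shepard diagram for k = 2: points are (dissimilarity, fitted distance) pairs,
# and the step line is the fitted monotone regression.
shep2 <- Shepard(demo.dist, fit2$points)
plot(shep2, pch = ".",
     xlab = "Original dissimilarity", ylab = "Configuration distance")
lines(shep2$x, shep2$yf, type = "S", col = "red")

# Stress values (in percent) for the two fits.
stress <- c(k2 = fit2$stress, k4 = fit4$stress)
```

Points that hug the step line indicate that the configuration preserves the rank order of the dissimilarities well. Repeating the `Shepard` call with `fit4$points` gives the k = 4 diagram for comparison.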

Q7) Clustering:

K-Means clustering: Use the data file "SITdata2018.txt" provided in CloudDeakin for this question. Load the file "SITdata2018.txt" using the following:
zz<-read.table("SITdata2018.txt")
zz<-as.matrix(zz)
a) Draw a scatter plot of the data.

b) State the number of classes/clusters that can be found in the "SITdata2018" (zz).

c) Use the above number of classes as the k value and perform the k-means clustering on that data. Show the results using a scatterplot. Comment on the clusters obtained.

d) Vary the number of clusters (k value) from 2 to 20 in increments of 1 and perform the k-means clustering for the above data. Record the total within sum of squares (TOTWSS) value for each k, and plot a graph of TOTWSS versus k. Explain how you can use this graph to find the correct number of classes/clusters in the data.
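The loop described in part (d) can be sketched as follows. The data here are three synthetic Gaussian blobs standing in for zz, so the shape of the resulting curve is only illustrative; `nstart = 20` is an assumed setting to reduce sensitivity to the random initial centres.

```r
# Synthetic stand-in for zz: three 2-D blobs of 50 points each (values are made up).
set.seed(1)
demo <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
              matrix(rnorm(100, mean = 5), ncol = 2),
              matrix(rnorm(100, mean = 10), ncol = 2))

# Run k-means for k = 2..20 and record the total within-cluster sum of squares.
ks <- 2:20
totwss <- sapply(ks, function(k) {
  kmeans(demo, centers = k, nstart = 20)$tot.withinss
})

# TOTWSS versus k; the bend in the curve suggests a suitable k.
plot(ks, totwss, type = "b",
     xlab = "Number of clusters k", ylab = "TOTWSS")
```

TOTWSS always tends to fall as k increases, so the quantity of interest is where the drop levels off, not the minimum.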

Spectral Clustering: Use the same dataset (zz) and run a spectral clustering (use the number of clusters/centers as 4) on it. Show the results on a scatter plot (with colour coding). Compare these clusters with the clusters obtained using the k-means above and comment on the results.
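To illustrate how spectral clustering can differ from k-means, the sketch below hand-rolls a basic normalised spectral clustering in base R on two synthetic concentric rings. This is an assumption-laden illustration, not the assignment's intended method: the data, the Gaussian kernel width `sigma`, and the choice of k = 2 are all made up here, and in practice a packaged routine such as `specc` from the kernlab package is a common alternative.

```r
# Synthetic stand-in for zz: two concentric rings (not the assignment data).
n <- 60
angle <- seq(0, 2 * pi, length.out = n + 1)[-1]
pts <- rbind(cbind(cos(angle), sin(angle)),           # inner ring, radius 1
             cbind(4 * cos(angle), 4 * sin(angle)))   # outer ring, radius 4
truth <- rep(1:2, each = n)
k <- 2
sigma <- 0.5                                          # assumed kernel width

# Gaussian affinity matrix with zero diagonal.
A <- exp(-as.matrix(dist(pts))^2 / (2 * sigma^2))
diag(A) <- 0

# Normalised affinity D^{-1/2} A D^{-1/2}.
d_inv_sqrt <- 1 / sqrt(rowSums(A))
M <- A * outer(d_inv_sqrt, d_inv_sqrt)

# Embed each point using the top-k eigenvectors, then normalise rows to unit length.
U <- eigen(M, symmetric = TRUE)$vectors[, 1:k]
U <- U / sqrt(rowSums(U^2))

# Cluster the embedded points with k-means; also run plain k-means for comparison.
set.seed(1)
spec_cl <- kmeans(U, centers = k, nstart = 10)$cluster
km_cl <- kmeans(pts, centers = k, nstart = 10)$cluster

# Side-by-side colour-coded scatterplots.
par(mfrow = c(1, 2))
plot(pts, col = spec_cl, pch = 19, asp = 1, xlab = "x", ylab = "y",
     main = "Spectral clustering")
plot(pts, col = km_cl, pch = 19, asp = 1, xlab = "x", ylab = "y",
     main = "k-means")
```

On ring-shaped data like this, the spectral embedding groups points by connectivity, whereas k-means groups them by distance to a centroid, so the two colourings can differ markedly.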

Attachment:- SITdata.rar

SIT743 Multivariate and Categorical Data Analysis
8/21/2018 10:32:10 PM

For this assignment, you need to submit the following FOUR files.
1. A written document (pdf only) covering all of the items described in the questions. All answers to the questions must be written in this document, i.e., not in the other files (code and data files) that you will be submitting.
2. A separate ".R" or ".txt" file containing the R code (script) that you implemented to produce the results. Name the file "name-StudentID-Ass1-Code.R" (where 'name' is replaced with your name, either your surname or first name, and 'StudentID' with your student ID).
3. Two data files named "name-StudentID-BikeShareMyData.txt" and "name-StudentID-PCASelData.txt" (with 'name' and 'StudentID' replaced in the same way).


All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd