Calculate the fitted values for all three models

Assignment Help Other Subject
Reference no: EM132363421

R Programming Assignment -

Task 1 -

Let's see the effect of altering the values of our weights $\alpha$.

We'll use this following toy data, where x is the values 1 through 4, and t is the values, 1, 3, 2, 4.

'''{r}

x <- c(1,2,3,4)

t <- c(1,3,2,4)

'''

We begin by setting our vector of weights all equal. I have already fit a model using 'lm()' with the argument 'weights'. Use matrix operation to find and print the parameter estimates. They should equal the parameter estimates found via 'lm()'

'''{r}

a1 <- c(1,1,1,1)

model1 <- lm(t ~ x, weights = a1)

plot(x,t,xlim = c(0,5), ylim = c(0,5), asp=1)

abline(model1)

print(model1$coefficients)

Your code to make the matrices and find the parameter estimates.

Now we alter the weights. This vector puts large weight on the two inner points (x=2, x= 3), and small weight on the outer points (x=1, x=4). Again, use the matrix operations to find and print the parameter estimates using the provided weights. Compare them against the estimates found via 'lm()'. I have plotted the fitted line, comment on the effect of the weights.

''{r}

a2 <- c(0.1, 5, 5, 0.1) #

model2 <- lm(t ~ x, weights = a2)

plot(x,t,xlim = c(0,5), ylim = c(0,5), asp=1)

abline(model2)

print(model2$coefficients)

We alter the weights again. This time large weight are on the two outer points (x=1, x=4), and small weight on the inner points (x=2, x= 3). Again, use the matrix operations to find and print the parameter estimates using the provided weights. Compare them against the estimates found via 'lm()'. Look at the fitted line and comment on the effect of the weights.

'''{r}

a3 <- c(5, 0.1, 0.1, 5) #

model3 <- lm(t ~ x, weights = a3)

plot(x,t,xlim = c(0,5), ylim = c(0,5), asp=1)

abline(model3)

print(model3$coefficients)

'''

Try to explain Weighted Least Squares regression. What effect would putting a very large weight, say 1000, on a point have on the regression line? What effect would putting a weight of 0 on a point have on the regression line?

Task 2 - OLS Matrix Notation

Look at the Chirot dataset which covers the 1907 Romanian Peasant Revolt.

The data covers 32 counties in Romania, and the response variable is the intensity of the rebellion.

The orginal paper by Daniel Chirot, which provides details of the analysis and variables.

A basic data dictionary can be found with 'help(Chirot)'

'''{r}

library(carData)

data(Chirot)

chirot_mat <- as.matrix(Chirot)

'''

We'll do an analysis with matrix operations. We start by extracting the commerce column and creating our $\mathbf{X}$ matrix.

'''{r}

t <- chirot_mat[, 1, drop = FALSE] # response, keep as a matrix

x <- chirot_mat[, 2, drop = FALSE] # commerce column, keep as matrix

X <- cbind(1, x)

colnames(X) <- c('1','commerce')

head(X)

'''

Use 'lm()' to fit the rebellion intensity to matrix X (which has columns for a constant and commercialization). Make sure you only calculate the coefficient for the intercept once.

Using only matrix operations, calculate and show the coefficient estimates. Verify that they match the estimates from 'lm()'.

Create another matrix (call it X_ct) with three columns: a constant, variable commerce, variable tradition. Print the head of this matrix.

Using matrix operations, calculate and show the coefficient estimates of the model with the variables commerce and tradition.

Create another matrix (call it X_all) with all of the x variables (plus a constant). Print the head of this matrix. Using matrix operations, calculate and show the coefficient estimates of the model with all the variables.

Using matrix operations, calculate the fitted values for all three models. (No need to print out the fitted values.) Create plots of fitted value vs actual value for each model (there will be three plots). Be sure to provide a title for each plot.

Now that you have calculated the columns of fitted values, find the Residual Sum of Squares and the R-squared values of the three models. Which model has the smallest RSS? (RSS is the sum of (actual - fitted)^2. R-sq is the correlation between actual and fitted, squared.)

Task 3 - Cross-Validation

We can use cross-validation to evaluate the predictive performance of several competing models.

I will have you manually implement leave-one-out cross-validation from scratch first, and then use the built-in function in R.

We will use the dataset 'ironslag' from the package 'DAAG' (a companion library for the textbook Data Analysis and Graphics in R).

The description of the data is as follows: The iron content of crushed blast-furnace slag can be determined by a chemical test at a laboratory or estimated by a cheaper, quicker magnetic test. These data were collected to investigate the extent to which the results of a chemical test of iron content can be predicted from a magnetic test of iron content, and the nature of the relationship between these quantities. [Hand, D.J., Daly, F., et al. (1993) __A Handbook of Small Data Sets__]

The 'ironslag' data has 53 observations, each with two values - the measurement using the chemical test and the measurement from the magnetic test.

We can start by fitting a linear regression model $y_n = w_0 + w_1 x_n + \epsilon_n$. A quick look at the scatterplot seems to indicate that the data may not be linear.

'''{r linear_model}

# install.packages("DAAG") # if necessary

library(DAAG)

x <- seq(10,40, .1) # a sequence used to plot lines

L1 <- lm(magnetic ~ chemical, data = ironslag)

plot(ironslag$chemical, ironslag$magnetic, main = "Linear fit", pch = 16)

yhat1 <- L1$coef[1] + L1$coef[2] * x

lines(x, yhat1, lwd = 2, col = "blue")

'''

In addition to the linear model, fit the following models that predict the magnetic measurement (Y) from the chemical measurement (X).

Task 3A - Your job is to create the plots with fitted lines.

When plotting the fitted line for this one, create estimates of log(y-hat) linearly.

Task 3B: Leave-one-out Cross validation

You will now code leave-one-out cross validation. In LOOCV, we remove one data point from our data set. We fit the model to the remaining 52 data points. With this model, we make a prediction for the left-out point. We then compare that prediction to the actual value to calculate the squared error. Once we find the squared error for all 53 points, we can take the mean to get a cross-validation error estimate of that model.

To test out our four models, we will build a loop that will remove one point of data at a time. Thus, we will make a 'for(i in 1:53)' loop. For each iteration of the loop, we will fit the four models on the remaining 52 data points, and make a prediction for the remaining point.

Write a line to select the ith line in the data

Compare the sizes of the cross validation error to help you decide which model does the best job of predicting the test cases.

Task 3C: Cross-validation with R

Now that you have written your cross-validation script from scratch, we can use the built-in functions in R. Library(boot) has the function 'cv.glm()' which can be used to estimate cross-validation error.

To make use of 'cv.glm()' on the linear models, we must first use 'glm()' to fit a generalized linear model to our data. If you do not change the attribute "family" in the function 'glm()', it will fit a linear model

'''{r cvwithR, error = TRUE}

library(boot)

gL1 <- glm(magnetic ~ chemical, data = ironslag) # equivalent to lm(magnetic ~ chemical)

Find the LOOCV CV values for all of the models.

Your LOOCV estimates from 'cv.glm()' should match your estimates when you coded your algorithm from scratch.

Based on your Cross Validation scores, which model seems to be the best?

Task 3D: Revisit Chirot

Use 'glm()' to fit the three models to the Chirot data.

model1: intensity ~ commerce

model2: intensity ~ commerce + tradition

model3: intensity ~ all other variables

Use 'cv.glm()' to calculate the LOOCV scores to compare models. Which model would you select?

Task 4: Using R's 'optim()' function

The 'optim()' function in R is quite versatile and can be used to minimize a wide range of functions using a variety of methods. The default method, the Nelder-Mead simplex algorithm, is a numeric method that works quite well for multidimensional functions. Other methods, like BFGS which uses a modified version of gradient descent, are included and are also quite powerful.

We will do an exercises to use 'optim()' in to minimize the loss function in linear regression.

We'll use a synthetic data set.

Use 'optim()' to minimize the sum of squared residuals in linear regression

 

For the first task, we define the loss function as the sum of squared residuals.

Write your loss function as a function of the vector 'par' which will have a lenght of 2. The vector par will contain the current estimates of the intercept and slope. Treat the data ('x' and 't') as fixed quantities that are available in the global environment.

The function will return a single value: the sum of squared residuals.

The following chunk runs optim and returns the results. The arguments it takes in is the initial values of 'par', and the function that it is trying to minimize. I also specify the method 'BFGS' which uses a version of gradient descent as the numeric algorithm to optimize the function.

'''{r, error = TRUE}

results <- optim(par = c(0,1), fn = loss, method = 'BFGS')

results$par

'''

When we look at the results and the values of the parameters that minimize the loss function, they should match the coefficients estimated by 'lm()'.

Attachment:- R Programming Assignment File.rar

Reference no: EM132363421

Questions Cloud

Describe role spiritually or faith plays in your thinking : Describe the role spiritually or faith/religion plays in your thinking and --provide examples-- to support your thesis and describe the ethical implications of.
Develop a human resources organizational plan : Assume that you work in human resources management and have been tasked with developing a strategic plan to ensure employees are efficiently providing quality.
Why some disasters receive fund while others do not : Instructions: Research topic is the following: Why some disasters receive fund while others don't? Need the analysis covering 9 different donors
What factors influence the turnover rate of employees : Assessment item - Research Proposal Presentation. What are the factors that influence the turnover rate of employees in the hospitality industry
Calculate the fitted values for all three models : R Programming Assignment - Using matrix operations, calculate the fitted values for all three models. (No need to print out the fitted values.)
Calculate the volume of the parallelepiped spanned : A block is dragged across a floor with a force of 200N that acts at an angle of 90 degrees to the direction of displacement), over a distance of 3m.
Calculate an ICT carbon footprint for a department : ICTSUS601 Integrate Sustainability in ICT Planning and Design Projects Assignment, Abbey College Australia. Calculate an ICT carbon footprint for a department
Identify the factor that generates the periodic nature : Find the maximum and minimum voltages and the times at which they occur. Identify the factor that generates the periodic nature of the graph.
How would you use normal vectors to identify configuration : Solve one questions about intersection of the plane and then you will make a model of intersection of planes. How would you use normal vectors to identify.

Reviews

Write a Review

Other Subject Questions & Answers

  Cross-cultural opportunities and conflicts in canada

Short Paper on Cross-cultural Opportunities and Conflicts in Canada.

  Sociology theory questions

Sociology are very fundamental in nature. Role strain and role constraint speak about the duties and responsibilities of the roles of people in society or in a group. A short theory about Darwin and Moths is also answered.

  A book review on unfaithful angels

This review will help the reader understand the social work profession through different concepts giving the glimpse of why the social work profession might have drifted away from its original purpose of serving the poor.

  Disorder paper: schizophrenia

Schizophrenia does not really have just one single cause. It is a possibility that this disorder could be inherited but not all doctors are sure.

  Individual assignment: two models handout and rubric

Individual Assignment : Two Models Handout and Rubric,    This paper will allow you to understand and evaluate two vastly different organizational models and to effectively communicate their differences.

  Developing strategic intent for toyota

The following report includes the description about the organization, its strategies, industry analysis in which it operates and its position in the industry.

  Gasoline powered passenger vehicles

In this study, we examine how gasoline price volatility and income of the consumers impacts consumer's demand for gasoline.

  An aspect of poverty in canada

Economics thesis undergrad 4th year paper to write. it should be about 22 pages in length, literature review, economic analysis and then data or cost benefit analysis.

  Ngn customer satisfaction qos indicator for 3g services

The paper aims to highlight the global trends in countries and regions where 3G has already been introduced and propose an implementation plan to the telecom operators of developing countries.

  Prepare a power point presentation

Prepare the power point presentation for the case: Santa Fe Independent School District

  Information literacy is important in this environment

Information literacy is critically important in this contemporary environment

  Associative property of multiplication

Write a definition for associative property of multiplication.

Free Assignment Quote

Assured A++ Grade

Get guaranteed satisfaction & time on delivery in every assignment order you paid with us! We ensure premium quality solution document along with free turntin report!

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd