Reference no: EM133749147
Introduction to Data
Preparing the report: do one of the following.
Either write your report in R Markdown and knit it into a PDF (if LaTeX is not available on your computer, knit it into a Word document and convert that to PDF),
or
prepare your report as a word-processed document and save it as a PDF.
Dataset Description:
StudyHours: Number of hours a student spends studying per week.
AttendanceRate: Percentage of classes attended by the student.
HomeworkScore: Average score on homework assignments.
PreviousGPA: GPA from previous semesters or courses.
ExtracurricularActivity: Indicates whether the student participates in extracurricular activities, coded 0 (No) or 1 (Yes).
FinalGrade: The final grade or score received by the student in the course.
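The variable list above mixes five numeric variables with one binary categorical indicator, which matters for Question 1 (identifying variable types, and the hint to exclude categorical variables from the correlation matrix). A minimal Python sketch of one record follows; the snake_case field names and the range checks are hypothetical assumptions, not part of the assignment. In R, `str()` and `head()` give the corresponding overview, and `factor()` would mark ExtracurricularActivity as categorical.

```python
from dataclasses import dataclass


# Hypothetical record type mirroring the dataset description.
# Field names are assumed snake_case versions of the dataset's variable names.
@dataclass
class StudentRecord:
    study_hours: float      # StudyHours: hours studied per week (numeric)
    attendance_rate: float  # AttendanceRate: % of classes attended (numeric)
    homework_score: float   # HomeworkScore: average homework score (numeric)
    previous_gpa: float     # PreviousGPA: GPA from earlier semesters/courses (numeric)
    extracurricular: int    # ExtracurricularActivity: 0 (No) / 1 (Yes), categorical
    final_grade: float      # FinalGrade: the response variable (numeric)

    def __post_init__(self):
        # Assumed sanity checks; the assignment does not specify valid ranges.
        if not 0 <= self.attendance_rate <= 100:
            raise ValueError("AttendanceRate must be a percentage in [0, 100]")
        if self.extracurricular not in (0, 1):
            raise ValueError("ExtracurricularActivity must be coded 0 or 1")
        if self.study_hours < 0:
            raise ValueError("StudyHours cannot be negative")


# Split used later: only the numeric variables enter the correlation matrix.
NUMERIC = ["study_hours", "attendance_rate", "homework_score",
           "previous_gpa", "final_grade"]
CATEGORICAL = ["extracurricular"]
```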
Question 1
a) Import the dataset and explore it (identify the variable types and the number of observations, and view the first few rows).
b) Create a box plot of FinalGrade for each level of ExtracurricularActivity to visualize their relationship, and interpret it.
c) Create a scatter plot to visualize the relationship between FinalGrade and StudyHours, and interpret it.
d) Construct a scatterplot matrix (matrix plot) and a correlation matrix. Comment on the relationships among the variables. (Hint: exclude any categorical variables, if present.)
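The numbers behind the Question 1 plots can be sketched directly: a box plot is drawn from the five-number summary of each group, and the correlation matrix is the pairwise Pearson correlation of the numeric variables. The Python sketch below runs on synthetic data with invented effect sizes (not the course dataset). Quartiles use the `"inclusive"` method of `statistics.quantiles`, which matches R's default `quantile()` (type 7); R's `boxplot()` draws Tukey's hinges, so its box edges can differ slightly for some sample sizes. In R the plots themselves come from `boxplot(FinalGrade ~ ExtracurricularActivity, data = ...)`, `plot()`, `pairs()` and `cor()`.

```python
import random
import statistics

# SYNTHETIC stand-in for the course dataset: values and effect sizes are invented.
rng = random.Random(1)
rows = []
for _ in range(100):
    hours = rng.uniform(0, 40)
    row = {"StudyHours": hours,
           "AttendanceRate": rng.uniform(50, 100),
           "HomeworkScore": rng.uniform(40, 100),
           "PreviousGPA": rng.uniform(2.0, 4.0),
           "ExtracurricularActivity": rng.randint(0, 1)}
    row["FinalGrade"] = 30 + 0.8 * hours + 5 * row["PreviousGPA"] + rng.gauss(0, 5)
    rows.append(row)


def five_number(values):
    """Min, Q1, median, Q3 and max: the statistics a box plot is drawn from."""
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, med, q3, max(values)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Box-plot statistics of FinalGrade within each ExtracurricularActivity group.
for group in (0, 1):
    grades = [r["FinalGrade"] for r in rows if r["ExtracurricularActivity"] == group]
    print("ExtracurricularActivity =", group,
          [round(v, 1) for v in five_number(grades)])

# Correlation matrix over the numeric variables only (the 0/1 indicator is excluded).
NUMERIC = ["StudyHours", "AttendanceRate", "HomeworkScore", "PreviousGPA", "FinalGrade"]
cols = {name: [r[name] for r in rows] for name in NUMERIC}
corr = {(a, b): pearson(cols[a], cols[b]) for a in NUMERIC for b in NUMERIC}
for a in NUMERIC:
    print(f"{a:>15}", " ".join(f"{corr[(a, b)]:6.2f}" for b in NUMERIC))
```

Comparing the medians and box spreads of the two groups answers part b); the off-diagonal entries of the matrix are what part d) asks you to comment on.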
Question 2
a) Fit a simple linear regression model to predict FinalGrade from StudyHours and give the model summary.
b) Test the significance of the slope parameter (write down the relevant hypotheses).
c) Interpret the slope parameter. (Note: there is no need to interpret the intercept.)
d) Discuss the accuracy of the parameter estimates.
e) Discuss the overall accuracy of the model.
f) Check the model assumptions.
g) Write down the model equation.
h) Predict the FinalGrade of a student who studied for 40 hours, using the model in part g).
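The quantities that R's `lm()` and `summary()` report for a simple linear regression can be computed by hand, which shows where the answers to Question 2 come from: the slope and intercept (part g), their standard errors (part d), the t statistic for H0: beta1 = 0 against H1: beta1 != 0 (part b), R-squared and the residual standard error (part e), and a point prediction (part h). The Python sketch below runs on synthetic data with an invented relationship, not the course dataset. Its p-value uses a normal approximation to the t distribution with n - 2 degrees of freedom, which is only adequate for large n; `summary()` in R uses the exact t distribution.

```python
import random
import statistics


def fit_slr(x, y):
    """Least-squares fit of y = b0 + b1 * x with the usual summary statistics."""
    n = len(x)
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    resid = [b - (b0 + b1 * a) for a, b in zip(x, y)]  # keep for residual plots
    sse = sum(e * e for e in resid)
    sst = sum((b - my) ** 2 for b in y)
    s = (sse / (n - 2)) ** 0.5                          # residual standard error
    se_b1 = s / sxx ** 0.5
    se_b0 = s * (1 / n + mx ** 2 / sxx) ** 0.5
    t_b1 = b1 / se_b1                                   # tests H0: beta1 = 0
    # Two-sided p-value via a NORMAL approximation to t(n - 2): large n only.
    p_b1 = 2 * (1 - statistics.NormalDist().cdf(abs(t_b1)))
    return {"b0": b0, "b1": b1, "se_b0": se_b0, "se_b1": se_b1,
            "t_b1": t_b1, "p_b1": p_b1, "s": s, "r2": 1 - sse / sst,
            "resid": resid}


# SYNTHETIC data: the true intercept 40 and slope 1.0 are invented.
rng = random.Random(3)
hours = [rng.uniform(0, 40) for _ in range(100)]
grades = [40 + 1.0 * h + rng.gauss(0, 6) for h in hours]

fit = fit_slr(hours, grades)
print(f"FinalGrade = {fit['b0']:.2f} + {fit['b1']:.3f} * StudyHours")
print(f"slope SE = {fit['se_b1']:.3f}, t = {fit['t_b1']:.2f}, "
      f"approx p = {fit['p_b1']:.3g}")
print(f"R^2 = {fit['r2']:.3f}, residual SE = {fit['s']:.2f}")
print(f"predicted FinalGrade at 40 hours: {fit['b0'] + fit['b1'] * 40:.1f}")
```

In R the prediction is `predict(model, newdata = data.frame(StudyHours = 40))`, and `plot(model)` gives the residual diagnostics for part f). If 40 hours lies outside the observed range of StudyHours, the prediction is an extrapolation and should be treated with caution.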
Question 3
a) Fit a multiple linear regression model to predict FinalGrade from all the other variables in the dataset and give the model summary.
b) Remove any insignificant variables and fit a model that includes the remaining variables.
c) Add the interaction term between StudyHours and PreviousGPA to the model in part b) and give the model summary.
d) Comment on the significance of the parameters of the model in part c).
e) Compare and comment on the accuracy of the models in part b) and part c).
f) Fit a second-order (quadratic) polynomial regression model for FinalGrade in terms of StudyHours and test the significance of the model. Give the resulting model.
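All three kinds of Question 3 model (main effects, an interaction, a quadratic term) are ordinary least squares on different design matrices, which the sketch below makes explicit. It is written in Python on synthetic data with invented coefficients, not the course dataset, and the variable set used with the interaction is hypothetical because part b) depends on which variables turn out insignificant in the real data. It solves the normal equations by Gauss-Jordan elimination for clarity; R's `lm()` uses a QR decomposition, which is numerically more stable. It reports t statistics, R-squared, adjusted R-squared and the overall F statistic, but not p-values (R's `summary()` gives those). In an R formula the interaction is `StudyHours:PreviousGPA` (or `StudyHours * PreviousGPA` for main effects plus interaction), and the quadratic term is `I(StudyHours^2)`; `poly(StudyHours, 2)` uses orthogonal polynomials by default, which changes the coefficients but not the fitted values or the overall F test.

```python
import math
import random


def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def design(rows, terms):
    """Intercept column plus one column per term; a term is a tuple of column names
    whose product is used, so ("A",) is a main effect, ("A", "B") an interaction
    and ("A", "A") a quadratic term."""
    return [[1.0] + [math.prod(r[name] for name in term) for term in terms] for r in rows]


def ols(X, y):
    """Least squares via the normal equations, with t statistics, R^2 and overall F."""
    n, k = len(X), len(X[0])
    xtx = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(k)] for a in range(k)]
    xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[a][b] * xty[b] for b in range(k)) for a in range(k)]
    fitted = [sum(X[i][a] * beta[a] for a in range(k)) for i in range(n)]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ybar = sum(y) / n
    sst = sum((yi - ybar) ** 2 for yi in y)
    s2 = sse / (n - k)                                  # residual variance estimate
    se = [(s2 * inv[a][a]) ** 0.5 for a in range(k)]
    r2 = 1 - sse / sst
    return {"beta": beta, "se": se, "t": [b / e for b, e in zip(beta, se)],
            "r2": r2, "adj_r2": 1 - (1 - r2) * (n - 1) / (n - k),
            "F": ((sst - sse) / (k - 1)) / s2}          # H0: all slopes are 0


# SYNTHETIC data: every coefficient below is invented for illustration.
rng = random.Random(2)
rows = []
for _ in range(200):
    r = {"StudyHours": rng.uniform(0, 40), "AttendanceRate": rng.uniform(50, 100),
         "HomeworkScore": rng.uniform(40, 100), "PreviousGPA": rng.uniform(2.0, 4.0),
         "ExtracurricularActivity": rng.randint(0, 1)}
    r["FinalGrade"] = (10 + 0.5 * r["StudyHours"] + 0.2 * r["AttendanceRate"]
                       + 0.1 * r["HomeworkScore"] + 4 * r["PreviousGPA"]
                       + 0.1 * r["StudyHours"] * r["PreviousGPA"] + rng.gauss(0, 5))
    rows.append(r)

y = [r["FinalGrade"] for r in rows]
models = {
    "all variables (part a)": [("StudyHours",), ("AttendanceRate",), ("HomeworkScore",),
                               ("PreviousGPA",), ("ExtracurricularActivity",)],
    "hypothetical reduced set + interaction (part c)": [
        ("StudyHours",), ("PreviousGPA",), ("StudyHours", "PreviousGPA")],
    "quadratic in StudyHours (part f)": [("StudyHours",), ("StudyHours", "StudyHours")],
}
results = {}
for name, terms in models.items():
    res = results[name] = ols(design(rows, terms), y)
    print(f"{name}: R2={res['r2']:.3f} adjR2={res['adj_r2']:.3f} F={res['F']:.1f}")
    for term, b, t in zip([("Intercept",)] + terms, res["beta"], res["t"]):
        print("   ", "*".join(term), round(b, 3), "t =", round(t, 2))
```

Adjusted R-squared is the fairer basis for comparing models with different numbers of terms, as in part e), because plain R-squared never decreases when a term is added.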