Reference no: EM132291853
Assignment -
Part 1 -
Overview - The Institute for Statistics Education at Statistics.com asks students to rate a variety of aspects of a course as soon as they complete it. The Institute is contemplating a recommendation system that would suggest additional courses to students as soon as they submit their rating for a completed course. Consider the excerpt from student ratings of online statistics courses shown in Table 1 below, and the problem of what to recommend to student E.N.
Table 1 - Ratings of online statistics courses: 4 = best, 1 = worst, blank = not taken
| Student | SQL | Spatial | PA 1 | DM in R | Python | Forecast | R Prog | Hadoop | Regression |
|---------|-----|---------|------|---------|--------|----------|--------|--------|------------|
| L N     | 4   |         |      |         | 3      | 2        | 4      |        | 2          |
| M H     | 3   |         |      |         | 4      |          |        |        |            |
| J H     | 2   |         |      |         |        |          |        |        |            |
| E N     | 4   |         |      | 4       |        |          | 4      |        | 3          |
| D U     | 4   | 4       |      |         |        |          |        |        |            |
| F L     |     | 4       |      |         |        |          |        |        |            |
| G L     |     | 4       |      |         |        |          |        |        |            |
| A H     |     | 3       |      |         |        |          |        |        |            |
| S A     |     |         | 4    |         |        |          |        |        |            |
| R W     |     |         | 2    |         |        |          |        | 4      |            |
| B A     |     |         | 4    |         |        |          |        |        |            |
| M G     |     |         | 4    |         |        | 4        |        |        |            |
| A F     |     |         | 4    |         |        |          |        |        |            |
| K G     |     |         | 3    |         |        |          |        |        |            |
| D S     | 4   |         |      | 2       |        |          | 4      |        |            |
In R, your job is to:
- Consider a user-based collaborative filter. This requires computing correlations between all student pairs. For which students is it possible to compute correlations with E.N.? Compute them.
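To make the "for which students is it possible" part concrete, here is a minimal Python sketch (the assignment itself calls for R). It assumes the Pearson correlation is computed only over courses both students rated, with means taken over those co-rated courses; other conventions, such as using each student's overall mean, give different values. The dictionary `RATINGS` and the helper `pearson_corated` are illustrative names, not part of the assignment.

```python
from math import sqrt

# Ratings transcribed from Table 1; blank cells are simply left out.
RATINGS = {
    "LN": {"SQL": 4, "Python": 3, "Forecast": 2, "R Prog": 4, "Regression": 2},
    "MH": {"SQL": 3, "Python": 4},
    "JH": {"SQL": 2},
    "EN": {"SQL": 4, "DM in R": 4, "R Prog": 4, "Regression": 3},
    "DU": {"SQL": 4, "Spatial": 4},
    "FL": {"Spatial": 4},
    "GL": {"Spatial": 4},
    "AH": {"Spatial": 3},
    "SA": {"PA 1": 4},
    "RW": {"PA 1": 2, "Hadoop": 4},
    "BA": {"PA 1": 4},
    "MG": {"PA 1": 4, "Forecast": 4},
    "AF": {"PA 1": 4},
    "KG": {"PA 1": 3},
    "DS": {"SQL": 4, "DM in R": 2, "R Prog": 4},
}


def pearson_corated(a, b):
    """Pearson correlation over co-rated courses, or None when undefined."""
    shared = sorted(set(a) & set(b))
    if len(shared) < 2:  # need at least two co-rated courses
        return None
    xs = [a[c] for c in shared]
    ys = [b[c] for c in shared]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:  # constant ratings: correlation is undefined
        return None
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)


target = RATINGS["EN"]
correlations = {
    name: pearson_corated(target, ratings)
    for name, ratings in RATINGS.items()
    if name != "EN"
}
for name, corr in correlations.items():
    if corr is not None:
        print(name, round(corr, 3))
```

Under this convention a student needs at least two co-rated courses with E.N., and neither student's ratings on those courses may be constant, for the correlation to be defined.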
Then, tell me:
- Which single course should we recommend to E.N. based on the single nearest student to E.N.? Explain why.
- Based on the cosine similarities of the nearest students to E.N., which course should be recommended to E.N.?
- What is the conceptual difference between using the correlation as opposed to cosine similarities? [Hint: how are the missing values in the matrix handled in each case?]
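The hint about missing values can be illustrated with a short Python sketch (the assignment itself calls for R). It assumes cosine similarity is computed on the raw rating vectors with every blank cell treated as 0, so all nine courses enter the calculation; a correlation would instead use only co-rated courses. `RATINGS`, `COURSES`, and `cosine` are illustrative names, and an R package may center or normalize the ratings differently.

```python
from math import sqrt

# Ratings transcribed from Table 1; blank cells are simply left out.
RATINGS = {
    "LN": {"SQL": 4, "Python": 3, "Forecast": 2, "R Prog": 4, "Regression": 2},
    "MH": {"SQL": 3, "Python": 4},
    "JH": {"SQL": 2},
    "EN": {"SQL": 4, "DM in R": 4, "R Prog": 4, "Regression": 3},
    "DU": {"SQL": 4, "Spatial": 4},
    "FL": {"Spatial": 4},
    "GL": {"Spatial": 4},
    "AH": {"Spatial": 3},
    "SA": {"PA 1": 4},
    "RW": {"PA 1": 2, "Hadoop": 4},
    "BA": {"PA 1": 4},
    "MG": {"PA 1": 4, "Forecast": 4},
    "AF": {"PA 1": 4},
    "KG": {"PA 1": 3},
    "DS": {"SQL": 4, "DM in R": 2, "R Prog": 4},
}
COURSES = sorted({course for r in RATINGS.values() for course in r})


def cosine(a, b):
    """Cosine similarity of two students, treating unrated courses as 0."""
    va = [a.get(c, 0) for c in COURSES]
    vb = [b.get(c, 0) for c in COURSES]
    dot = sum(x * y for x, y in zip(va, vb))
    norm_a = sqrt(sum(x * x for x in va))
    norm_b = sqrt(sum(y * y for y in vb))
    return dot / (norm_a * norm_b)


target = RATINGS["EN"]
similarities = {
    name: cosine(target, ratings)
    for name, ratings in RATINGS.items()
    if name != "EN"
}
ranked = sorted(similarities.items(), key=lambda kv: kv[1], reverse=True)
for name, sim in ranked[:3]:
    # Courses the neighbour rated that E.N. has not taken yet.
    unseen = {c: v for c, v in RATINGS[name].items() if c not in target}
    print(name, round(sim, 3), unseen)
```

Because blanks count as zeros here, a student who shares no course with E.N. gets a similarity of exactly 0 rather than an undefined value.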
Then:
With large datasets, it is computationally difficult to compute user-based recommendations in real time, and an item-based approach is used instead. Returning to the rating data (not the binary matrix), let's now take that approach.
- If the goal is still to find a recommendation for E.N., for which course pairs is it possible and useful to calculate correlations?
- Just looking at the data, and without yet calculating course pair correlations, which course would you recommend to E.N., relying on item-based filtering? Calculate two course pair correlations involving your guess and report the results.
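One way to check which course pairs are both possible and useful is sketched below in Python (the assignment itself calls for R). It assumes a pair is "useful" when it links a course E.N. has not taken to one E.N. has rated, and "possible" when at least two students rated both courses and neither set of ratings is constant. `RATINGS` and `course_pair_corr` are illustrative names.

```python
from math import sqrt

# Ratings transcribed from Table 1; blank cells are simply left out.
RATINGS = {
    "LN": {"SQL": 4, "Python": 3, "Forecast": 2, "R Prog": 4, "Regression": 2},
    "MH": {"SQL": 3, "Python": 4},
    "JH": {"SQL": 2},
    "EN": {"SQL": 4, "DM in R": 4, "R Prog": 4, "Regression": 3},
    "DU": {"SQL": 4, "Spatial": 4},
    "FL": {"Spatial": 4},
    "GL": {"Spatial": 4},
    "AH": {"Spatial": 3},
    "SA": {"PA 1": 4},
    "RW": {"PA 1": 2, "Hadoop": 4},
    "BA": {"PA 1": 4},
    "MG": {"PA 1": 4, "Forecast": 4},
    "AF": {"PA 1": 4},
    "KG": {"PA 1": 3},
    "DS": {"SQL": 4, "DM in R": 2, "R Prog": 4},
}


def course_pair_corr(c1, c2):
    """Pearson correlation between two courses over students who rated both."""
    pairs = [(r[c1], r[c2]) for r in RATINGS.values() if c1 in r and c2 in r]
    if len(pairs) < 2:  # fewer than two co-raters
        return None
    xs, ys = zip(*pairs)
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:  # constant ratings: correlation is undefined
        return None
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)


taken = set(RATINGS["EN"])
all_courses = {course for r in RATINGS.values() for course in r}
candidates = sorted(all_courses - taken)

pair_corrs = {}
for candidate in candidates:
    for seen in sorted(taken):
        corr = course_pair_corr(candidate, seen)
        if corr is not None:
            pair_corrs[(candidate, seen)] = corr

for (candidate, seen), corr in pair_corrs.items():
    print(candidate, "vs", seen, round(corr, 3))
```

With a table this sparse, very few pairs survive both filters, which is worth keeping in mind when interpreting any item-based recommendation drawn from it.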
Finally:
- Apply item-based collaborative filtering to this dataset (using R) and based on the results, recommend a course to E.N.
Part 2 -
Overview - The dataset ToyotaCorolla.csv contains 1436 records with details on 38 attributes, including Price, Age, KM, HP, and other specifications. The goal is to predict the price of a used Toyota Corolla based on its specifications.
In R, your job is to:
- Fit a neural network model to the data. Use a single hidden layer with 2 nodes.
- Use predictors Age_08_04, KM, Fuel_Type, HP, Automatic, Doors, Quarterly_Tax, Mfr_Guarantee, Guarantee_Period, Airco, Automatic_airco, CD_Player, Powered_Windows, Sport_Model, and Tow_Bar.
- Remember to first scale the numerical predictors and the outcome variable to a 0-1 scale (use the caret function preProcess() with method = "range"; see Chapter 7) and convert categorical predictors to dummies.
- Record the RMS error for the training data and the validation data. Repeat the process, changing the number of hidden layers and nodes to {single layer with 5 nodes}, {two layers, 5 nodes in each layer}.
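The preprocessing and error measure named in the steps above can be sketched in a few lines of Python (the assignment itself calls for R, where preProcess() and a neural network package do this work). The helpers `min_max_scale`, `to_dummies`, and `rmse` are illustrative names, and the sample values are hypothetical toy numbers, not rows from ToyotaCorolla.csv.

```python
from math import sqrt


def min_max_scale(values):
    """Rescale numbers linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: no spread to rescale
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def to_dummies(labels):
    """Turn category labels into one 0/1 indicator per category (full one-hot)."""
    categories = sorted(set(labels))
    return [
        {f"is_{cat}": int(label == cat) for cat in categories}
        for label in labels
    ]


def rmse(actual, predicted):
    """Root-mean-squared error between two equally long sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sqrt(sum(squared) / len(squared))


# Hypothetical toy values, for illustration only.
km = [10000, 55000, 100000]
fuel = ["Petrol", "Diesel", "Petrol"]

print(min_max_scale(km))
print(to_dummies(fuel))
print(rmse([0.2, 0.5, 0.9], [0.25, 0.4, 0.9]))
```

In practice the minimum and maximum used for scaling would normally be taken from the training partition and then applied to the validation partition, so that both RMS errors are on the same 0-1 scale.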
Finally, answer the following prompts:
- What happens to the RMS error for the training data as the number of layers and nodes increases?
- What happens to the RMS error for the validation data?
- Comment on the appropriate number of layers and nodes for this application.
Note - The immediate-turnaround file and the CSV file are attached.
Attachment: Assignment Files.rar