Devise a book recommendation system

Reference no: EM131305434

There are two files uploaded to Blackboard - BX-Books.csv and BX_Book-Ratings.csv. The former contains information about a variety of books, and the latter file contains several hundred thousand book ratings from the Book Crossing Website.

Use R to devise a book recommendation system for the data uploaded to Blackboard. In particular, develop a system that can recommend up to three books for an arbitrary user, who can be specified in R after sourcing your code. Develop such a system using both of the following approaches:

(a) User-based collaborative filtering approach. Use Euclidean, Manhattan, correlational, and cosine similarity distance measures. What problems (if any) do you run into?

(b) Item-based collaborative filtering approach. Use an adjusted cosine similarity approach as discussed in class. How does this approach compare to the user-based approach?
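As an illustration of part (b), here is a minimal sketch of one common formulation of adjusted cosine similarity between two books: each rating is centered on the rating user's own mean, and only users who rated both books contribute. The toy matrix, the function name `adjusted_cosine`, and the rule of returning `NA` when fewer than two users co-rated the pair are assumptions made for illustration, and the version discussed in class may differ in its details.

```r
# Toy user x book rating matrix; 0 means "not rated" (assumed convention).
ratings <- matrix(
  c(5, 3, 0,
    4, 0, 4,
    1, 5, 2,
    0, 4, 5),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("u1", "u2", "u3", "u4"), c("bookA", "bookB", "bookC"))
)

# Hypothetical helper: adjusted cosine similarity between books i and j.
adjusted_cosine <- function(ratings, i, j) {
  r <- ratings
  r[r == 0] <- NA                            # treat zeros as missing
  centered <- r - rowMeans(r, na.rm = TRUE)  # subtract each user's mean rating
  both <- !is.na(centered[, i]) & !is.na(centered[, j])
  if (sum(both) < 2) return(NA_real_)        # too few co-raters to compare
  x <- centered[both, i]
  y <- centered[both, j]
  denom <- sqrt(sum(x^2)) * sqrt(sum(y^2))
  if (denom == 0) return(NA_real_)           # no variation after centering
  sum(x * y) / denom
}

adjusted_cosine(ratings, "bookA", "bookB")
```

Centering on the user mean is what distinguishes this from plain cosine similarity: it compensates for users who rate everything high or everything low.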

To load the data into R, use the read.csv function (for example, read.csv(filename, header=TRUE)). Type ?read.csv into the R console for further information on the function's syntax.
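The loading step might look like the sketch below. So that it runs without the Blackboard files, it first writes a small stand-in CSV to a temporary file; the rows are made up, and the column names (`User.ID`, `ISBN`, `Book.Rating`) are assumptions that should be checked against the real file with `names()`. If the real file uses a delimiter other than a comma, the `sep` argument of read.csv would need to change.

```r
# Stand-in for BX_Book-Ratings.csv; made-up rows, assumed column names.
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "User.ID,ISBN,Book.Rating",
  "1,isbn-a,0",
  "1,isbn-b,7",
  "2,isbn-a,9",
  "3,isbn-c,0"
), tmp)

# With the real data, replace tmp with the path to the ratings file.
ratings <- read.csv(tmp, header = TRUE, stringsAsFactors = FALSE)
str(ratings)

# One way to keep unrated books out of later distance calculations:
# drop the rows whose rating is zero.
rated <- ratings[ratings$Book.Rating > 0, ]
```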

Write your programs as functions, so that the names of users can be entered at the R prompt.

(c) What are some general problems with both approaches? Conceptually speaking, how can these issues be ameliorated?

Hints:

- There is some flexibility with respect to how you construct the details of your recommendation system beyond your nearest neighbor algorithm. For example, you may use more than one nearest neighbor to make your algorithm better and you can weight the distances appropriately as discussed in class. Please feel free to discuss what your code is doing in a Word document or PDF and submit that along with your assignment. This will make it easier for the grader to understand the logic behind your algorithm.

- Make sure your program ignores zero values when computing distances. Otherwise, your recommendation system will be influenced by unrated books.

- Use an estimated rating above 5 as the threshold for the recommendation system.

- If your model cannot provide any recommendations for a particular individual, then please have it say so. You can discuss this in (c).
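The hints above can be combined into a small user-based sketch. It computes distances over co-rated books only, takes the `k` nearest neighbours, estimates ratings as a weighted average, keeps estimates above 5, returns at most three books, and says so when nothing can be recommended. The toy matrix, the function names `user_distance` and `recommend_books`, the weight `1 / (1 + distance)`, and the requirement of at least two co-rated books are all assumptions for illustration, not the scheme discussed in class.

```r
# Toy user x book matrix; 0 means "not rated" (assumed convention).
ratings <- matrix(
  c(8, 7, 0, 0,
    9, 6, 8, 0,
    7, 8, 9, 3,
    2, 9, 0, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("ann", "bob", "cat", "dan"),
                  c("bookA", "bookB", "bookC", "bookD"))
)

# Distance between two users over co-rated books only (zeros ignored).
user_distance <- function(a, b, method = "euclidean") {
  both <- a > 0 & b > 0
  if (sum(both) < 2) return(NA_real_)   # assumed minimum overlap
  x <- a[both]
  y <- b[both]
  switch(method,
    euclidean   = sqrt(sum((x - y)^2)),
    manhattan   = sum(abs(x - y)),
    cosine      = 1 - sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2))),
    correlation = if (sd(x) == 0 || sd(y) == 0) NA_real_ else 1 - cor(x, y),
    stop("unknown method: ", method)
  )
}

recommend_books <- function(user, ratings, k = 2, method = "euclidean",
                            threshold = 5, n = 3) {
  none <- "No recommendations available for this user."
  if (!user %in% rownames(ratings)) return("User not found.")
  others <- setdiff(rownames(ratings), user)
  d <- vapply(others, function(o)
    user_distance(ratings[user, ], ratings[o, ], method), numeric(1))
  d <- sort(d[!is.na(d)])
  if (length(d) == 0) return(none)
  nn <- names(d)[seq_len(min(k, length(d)))]   # k nearest neighbours
  w <- 1 / (1 + d[nn])                         # assumed weighting scheme
  unread <- colnames(ratings)[ratings[user, ] == 0]
  est <- vapply(unread, function(b) {
    r <- ratings[nn, b]
    ok <- r > 0                                # only neighbours who rated b
    if (!any(ok)) NA_real_ else sum(w[ok] * r[ok]) / sum(w[ok])
  }, numeric(1))
  est <- sort(est[!is.na(est) & est > threshold], decreasing = TRUE)
  if (length(est) == 0) return(none)
  head(est, n)                                 # up to n books with estimates
}

recommend_books("ann", ratings)
recommend_books("cat", ratings)   # cat has rated every book in the toy data
```

With only two co-rated books, the correlation method can only return 0 or 2, which is one of the sparsity problems worth discussing in (c).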

Course: DATS 6103: Introduction to Data Mining

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd