Perform association rule mining using the Apriori algorithm

Reference no: EM132300026

Question 1

An online shopping site has the following primary pages or sections: Home, Products, Search, Prod_A, Prod_B, Prod_C, Cart, Purchase. A user may browse from "Home" to "Products" and then to one of the individual products. The user may also search for a specific product by using the "Search" function. A visit to "Cart" implies that the user has placed an item in the shopping cart, and "Purchase" indicates that the user has completed the purchase of items in the shopping cart. The site has collected some hypothetical session data for 100 sessions. This data is available in the Q_sessions file on Moodle.

Use WEKA's K-means clustering algorithm to cluster these user sessions into segments. Try different clustering runs with various numbers of clusters (e.g., between 4 and 8), and select the result set(s) that seem to best answer the following questions.

• If a new user is observed to access the following pages: Home => Search => Prod_B, according to your clusters, what other product should be recommended to this user? Explain your answer based on your clustering results. What if the new user has accessed the following sequence instead: Products => Prod_C?

• Can clustering help us identify casual browsers ("window shoppers"), focused browsers (those who seem to know what products they are looking for), and searchers (those using the search function to find items they want)? If so, do any of these groups show a higher or lower propensity to make a purchase?

• Do any of the segments show particular interest in one or more products, and if so, can we identify any special characteristics about their navigational behavior or their purchase propensity?

• If we know that, during the time of data collection, independent banner ads had been placed on some popular sites pointing to products A and B, can we identify segments corresponding to visitors who responded to the ads? (Note that such users are likely to enter the site by going directly to product pages rather than navigating from the Home page.) If so, can we determine whether either of these promotional campaigns is having any success?

For this problem, you should submit your clustering result summary (including the cluster centroids), the final data set which shows the final assignment of these sessions to clusters, and your answers to the above questions along with your justification based on the clustering results.
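As a rough illustration of what the clustering step does outside WEKA, the sketch below runs a plain K-means on binary page-visit vectors and then picks a product recommendation from the nearest centroid. It is not WEKA's SimpleKMeans, the six sessions are made up for illustration and are not taken from the Q_sessions file, and the helper names (to_vector, kmeans, recommend) are hypothetical.

```python
import random

PAGES = ["Home", "Products", "Search", "Prod_A", "Prod_B", "Prod_C", "Cart", "Purchase"]
PRODUCTS = ["Prod_A", "Prod_B", "Prod_C"]


def to_vector(visited):
    """Encode a session as a 0/1 vector over PAGES."""
    return [1.0 if page in visited else 0.0 for page in PAGES]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(vector, centroids):
    """Index of the centroid closest to the vector."""
    return min(range(len(centroids)), key=lambda c: sq_dist(vector, centroids[c]))


def kmeans(vectors, k, max_iter=100, seed=0):
    """Plain K-means: assign to nearest centroid, recompute means, repeat."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    assignment = None
    for _ in range(max_iter):
        new_assignment = [nearest(v, centroids) for v in vectors]
        if new_assignment == assignment:
            break  # assignments are stable, so the centroids are too
        assignment = new_assignment
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignment


def recommend(visited, centroids):
    """Suggest the unvisited product with the highest weight in the nearest centroid."""
    centroid = centroids[nearest(to_vector(visited), centroids)]
    candidates = [p for p in PRODUCTS if p not in visited]
    if not candidates:
        return None
    return max(candidates, key=lambda p: centroid[PAGES.index(p)])


# Made-up sessions for illustration only (not the assignment data).
sessions = [
    {"Home", "Search", "Prod_A", "Prod_B", "Cart", "Purchase"},
    {"Home", "Search", "Prod_A", "Prod_B"},
    {"Search", "Prod_A", "Prod_B", "Cart"},
    {"Home", "Products", "Prod_C"},
    {"Products", "Prod_C", "Cart", "Purchase"},
    {"Home", "Products", "Prod_C"},
]
vectors = [to_vector(s) for s in sessions]
centroids, assignment = kmeans(vectors, k=2)

for index, centroid in enumerate(centroids):
    print(index, dict(zip(PAGES, (round(x, 2) for x in centroid))))
print("Recommendation:", recommend({"Home", "Search", "Prod_B"}, centroids))
```

Each centroid component is the fraction of sessions in that cluster that visited the page, so a high Purchase value in a centroid suggests a segment with a higher purchase propensity. The same reading applies to the centroids reported in WEKA's output.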

Question 2

For this problem you will use some preprocessed and aggregated clickstream data from a real e-commerce site, and use association rule mining to perform market basket analysis on the visitor session data.

[Note: Please watch the class video Association Rule Mining with WEKA (23 min) demonstrating the use of the Apriori algorithm in WEKA for market basket analysis.]

There are two primary types of products sold through the above site: leg care products and legwear products. Each category includes various subcategories and individual products from multiple vendors. There is also a separate categorization of products by specialized "Collections" and by "Assortments." The data collection mechanism, in addition to capturing clickstream page-level data, also captures the information on categories, subcategories, assortments, and collections of products accessed in a given session.

For simplicity, the provided data combines and aggregates visited pages from the log files, category and subcategory names, and product-related content pages/categories. The aggregate data contains a total of 182 attributes corresponding to pages or categories. These attributes are listed in the file Leg-Pages.txt. The session data is provided in ARFF format in the file legs.arff. All datasets are provided on Moodle. This data contains a total of 7296 sessions (each row in the data is one session). For the purpose of market basket analysis in WEKA, the session data is represented in relational format with unary categorical attributes (a value of "Y" indicates that the corresponding page/category was visited in the session, while a value of "?" indicates that the page/category is missing from the session). Thus, a typical association rule might look similar to the following:

/Products/Legwear=Y /Products/Legwear/Berkshire=Y ==> Collection: Better Than Bare - Queen=Y
or
Category: Health Supplements=Y ==> Subcategory: Bones & Joints=Y
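To make the rule notation concrete, here is a minimal sketch of how support, confidence, and lift can be computed for a single rule. The eight rows are made up for illustration (they are not from legs.arff), and the helper names row_to_itemset, support, and rule_metrics are hypothetical. The sketch assumes the usual definitions: support is the fraction of sessions containing all items, confidence is support(A and C) / support(A), and lift is confidence / support(C).

```python
def row_to_itemset(row):
    """Keep only attributes marked "Y"; "?" means the page/category was not visited."""
    return {attribute for attribute, value in row.items() if value == "Y"}


def support(sessions, items):
    """Fraction of sessions that contain every item in `items`."""
    items = set(items)
    return sum(1 for session in sessions if items <= session) / len(sessions)


def rule_metrics(sessions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent ==> consequent."""
    sup_a = support(sessions, antecedent)
    sup_c = support(sessions, consequent)
    if sup_a == 0 or sup_c == 0:
        raise ValueError("antecedent and consequent must each occur at least once")
    sup_rule = support(sessions, set(antecedent) | set(consequent))
    confidence = sup_rule / sup_a
    lift = confidence / sup_c
    return sup_rule, confidence, lift


A = "Category: Health Supplements"
C = "Subcategory: Bones & Joints"

# Made-up rows in the "Y" / "?" encoding described above (illustration only).
rows = [
    {A: "Y", C: "Y"},
    {A: "Y", C: "Y"},
    {A: "Y", C: "?"},
    {A: "Y", C: "?"},
    {A: "?", C: "?"},
    {A: "?", C: "?"},
    {A: "?", C: "?"},
    {A: "?", C: "?"},
]
sessions = [row_to_itemset(row) for row in rows]

sup, conf, lift = rule_metrics(sessions, [A], [C])
print(f"support={sup}, confidence={conf}, lift={lift}")
# prints: support=0.25, confidence=0.5, lift=2.0
```

A lift above 1 means the consequent appears more often in sessions containing the antecedent than it does overall; in this toy data it appears twice as often.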

Your task in this problem is as follows:

1. Load the data into WEKA and review the distributions in the data (go down the list of attributes and make a note of which pages or products are most frequent and which are least frequent; you can list the top 3 and the bottom 3 in your submission).

2. Perform association rule mining using the Apriori algorithm with a "lowerBoundMinSupport" of 0.05 and using Lift of 2.5 as the minMetric for filtering the rules. Also, set "outputItemsets" to "True" so that you can view the frequent itemsets of different sizes in addition to the rules. Write a short summary of your observations, including any significant or interesting (e.g., unobvious or unexpected) associations you observe in the data based on the results. Save your result set and submit it along with your assignment submission.

3. Next, run the Apriori algorithm with a lower "lowerBoundMinSupport" so that you can identify associations at a more granular level (e.g., the level of individual product brands rather than higher-level categories). You might want to start with 0.025 and go lower if necessary. Experiment with this threshold, as well as the Lift or Confidence metrics in different runs, and pick the result set that seems to provide the most useful information (e.g., not too many obvious or noisy rules and not too few general rules). Again, provide a short summary of your observations, including some examples of rules or associations that you find interesting or useful.
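The effect of the support and lift thresholds in tasks 2 and 3 can be seen in a small level-wise Apriori sketch. This is a simplified illustration, not WEKA's implementation, which has additional options that are not modeled here. The ten sessions and the function names frequent_itemsets and rules_by_lift are hypothetical, and the thresholds are chosen to suit the toy data rather than the assignment's values.

```python
from itertools import combinations


def frequent_itemsets(sessions, min_support):
    """Level-wise Apriori: return {itemset: support} for all frequent itemsets."""
    n = len(sessions)
    current = [frozenset([item]) for item in sorted({i for s in sessions for i in s})]
    frequent = {}
    k = 1
    while current:
        # Count each candidate and keep those meeting the support threshold.
        level = {}
        for candidate in current:
            sup = sum(1 for s in sessions if candidate <= s) / n
            if sup >= min_support:
                level[candidate] = sup
        frequent.update(level)
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that contain exactly k items.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = [
            c for c in joined
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        ]
    return frequent


def rules_by_lift(frequent, min_lift):
    """Generate rules from frequent itemsets and keep those with lift >= min_lift."""
    rules = []
    for itemset, sup in frequent.items():
        for size in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(sorted(itemset), size)):
                consequent = itemset - antecedent
                confidence = sup / frequent[antecedent]
                lift = confidence / frequent[consequent]
                if lift >= min_lift:
                    rules.append((antecedent, consequent, sup, confidence, lift))
    return rules


LEGWEAR, BERKSHIRE = "/Products/Legwear", "/Products/Legwear/Berkshire"
SUPPLEMENTS, BONES = "Category: Health Supplements", "Subcategory: Bones & Joints"

# Made-up sessions for illustration only (not from legs.arff).
sessions = [
    {LEGWEAR, BERKSHIRE}, {LEGWEAR, BERKSHIRE}, {LEGWEAR, BERKSHIRE, SUPPLEMENTS},
    {SUPPLEMENTS}, {SUPPLEMENTS}, {SUPPLEMENTS, BONES}, {BONES},
    {SUPPLEMENTS}, {SUPPLEMENTS, BONES}, {SUPPLEMENTS},
]

frequent = frequent_itemsets(sessions, min_support=0.2)
rules = rules_by_lift(frequent, min_lift=2.5)

for itemset, sup in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), round(sup, 3))
for antecedent, consequent, sup, confidence, lift in rules:
    print(sorted(antecedent), "==>", sorted(consequent),
          f"sup={sup:.2f} conf={confidence:.2f} lift={lift:.2f}")
```

Lowering min_support admits more (and more specific) itemsets at the cost of more noisy rules, which is the trade-off task 3 asks you to explore.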

Attachment:- Assignment.7z



Marking Criteria: Marks will be allocated for correct execution of steps and correct justification of answers. You need to compile all required solutions in a Word file. Provide the necessary screenshots with your answer justifications. Make sure that the answers are numbered properly and do not rewrite the questions.


All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd