Perform association rule mining using apriori algorithm

Reference no: EM132300026

Question 1

An online shopping site has the following primary pages or sections: Home, Products, Search, Prod_A, Prod_B, Prod_C, Cart, Purchase. A user may browse from "Home" to "Products" and then to one of the individual products. The user may also search for a specific product by using the "Search" function. A visit to "Cart" implies that the user has placed an item in the shopping cart, and "Purchase" indicates that the user has completed the purchase of items in the shopping cart. The site has collected some hypothetical session data for 100 sessions. This data is available in the Q_sessions file on Moodle.

Use WEKA's K-means clustering algorithm to cluster these user sessions into segments. Try different clustering runs with various numbers of clusters (e.g., between 4 and 8), and select the result set(s) that seem to best answer the following questions.

• If a new user is observed to access the following pages: Home => Search => Prod_B, according to your clusters, what other product should be recommended to this user? Explain your answer based on your clustering results. What if the new user has accessed the following sequence instead: Products => Prod_C?

• Can clustering help us identify casual browsers ("window shoppers"), focused browsers (those who seem to know what products they are looking for), and searchers (those using the search function to find items they want)? If so, do any of these groups show a higher or lower propensity to make a purchase?

• Do any of the segments show particular interest in one or more products, and if so, can we identify any special characteristics about their navigational behavior or their purchase propensity?

• If we know that, during the time of data collection, independent banner ads had been placed on some popular sites pointing to products A and B, can we identify segments corresponding to visitors who responded to the ads? (Note that such users are likely to enter the site by going directly to product pages rather than navigating from the Home page.) If so, can we determine whether either of these promotional campaigns is having any success?

For this problem, you should submit your clustering result summary (including the cluster centroids), the final data set which shows the final assignment of these sessions to clusters, and your answers to the above questions along with your justification based on the clustering results.
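The clustering and recommendation questions above can be illustrated outside WEKA with a small sketch. The code below is plain Python, not WEKA's implementation, and the six sessions are invented for illustration (they are not the Q_sessions data). The helper names `to_vector`, `kmeans`, and `recommend` are hypothetical. It encodes each session as a 0/1 vector over the eight pages, runs a basic k-means, and recommends the unvisited product with the highest weight in the nearest centroid.

```python
import random

PAGES = ["Home", "Products", "Search", "Prod_A", "Prod_B", "Prod_C", "Cart", "Purchase"]
PRODUCTS = ("Prod_A", "Prod_B", "Prod_C")

def to_vector(visited):
    """Encode a set of visited pages as a 0/1 vector over PAGES."""
    return [1.0 if p in visited else 0.0 for p in PAGES]

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(vec, centroids):
    """Index of the centroid closest to vec."""
    return min(range(len(centroids)), key=lambda i: sq_dist(vec, centroids[i]))

def kmeans(vectors, k, iterations=50, seed=0):
    """Basic k-means: returns (centroids, cluster index per vector)."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iterations):
        assignment = [nearest(v, centroids) for v in vectors]
        new_centroids = []
        for i in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == i]
            if members:
                # Centroid = per-page fraction of member sessions visiting it.
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster in place
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, [nearest(v, centroids) for v in vectors]

def recommend(visited, centroids):
    """Unvisited product with the highest weight in the nearest centroid."""
    centroid = centroids[nearest(to_vector(visited), centroids)]
    candidates = [(centroid[PAGES.index(p)], p) for p in PRODUCTS if p not in visited]
    if not candidates:
        return None
    weight, product = max(candidates)
    return product if weight > 0 else None  # None: the cluster gives no evidence

# Invented toy sessions (assumption, for illustration only).
toy_sessions = [
    {"Home", "Search", "Prod_A", "Prod_B"},
    {"Home", "Search", "Prod_A", "Prod_B"},
    {"Home", "Search", "Prod_A", "Prod_B"},
    {"Products", "Prod_C", "Cart", "Purchase"},
    {"Products", "Prod_C", "Cart", "Purchase"},
    {"Products", "Prod_C", "Cart", "Purchase"},
]
centroids, assignment = kmeans([to_vector(s) for s in toy_sessions], k=2)
print(assignment)
print(recommend({"Home", "Search", "Prod_B"}, centroids))
```

Because each centroid coordinate is the fraction of sessions in the cluster that visited that page, the "Cart" and "Purchase" coordinates can be read directly as a rough purchase propensity for the segment. The same reading applies to the centroids WEKA reports.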

Question 2

For this problem you will use some preprocessed and aggregated clickstream data from a real e-commerce site, and use association rule mining to perform market basket analysis on the visitor session data.

[Note: Please watch the class video Association Rule Mining with WEKA (23 min) demonstrating the use of Apriori algorithm in WEKA for market basket analysis.]

There are two primary types of products sold through the above site: leg care products and legwear products. Each category includes various subcategories and individual products from multiple vendors. There is also a separate categorization of products by specialized "Collections" and by "Assortments." The data collection mechanism, in addition to capturing clickstream page-level data, also captures information on the categories, subcategories, assortments, and collections of products accessed in a given session.

For simplicity, the provided data combines and aggregates visited pages from the log files, category and subcategory names, and product-related content pages/categories. The aggregate data contains a total of 182 attributes corresponding to pages or categories. These attributes are listed in the file Leg-Pages.txt. The session data is provided in ARFF format in the file legs.arff. All datasets are provided on Moodle. This data contains a total of 7296 sessions (one per row in the data). For the purpose of market basket analysis in WEKA, the session data is represented in relational format with unary categorical attributes (a value of "Y" indicates that the corresponding page/category was visited in the session, while a value of "?" indicates that the page/category is missing from the session). Thus, a typical association rule might look similar to the following:

/Products/Legwear=Y /Products/Legwear/Berkshire=Y ==> Collection: Better Than Bare - Queen=Y
or
Category: Health Supplements=Y ==> Subcategory: Bones & Joints=Y
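To make the "Y" / "?" representation and the rule notation concrete, here is a minimal Python sketch, assuming a tiny invented data set with three of the attribute names from the example rule (the four rows are made up and are not taken from legs.arff). The helpers `row_to_itemset`, `support`, and `rule_metrics` are hypothetical names. Each row becomes the set of attributes marked "Y", and the support, confidence, and lift of one rule are computed from those sets.

```python
ATTRIBUTES = [
    "/Products/Legwear",
    "/Products/Legwear/Berkshire",
    "Collection: Better Than Bare - Queen",
]

# Invented rows (assumption): "Y" = visited, "?" = missing from the session.
ROWS = [
    ["Y", "Y", "Y"],
    ["Y", "Y", "?"],
    ["Y", "?", "?"],
    ["?", "?", "?"],
]

def row_to_itemset(attribute_names, row):
    """Keep only the attributes marked "Y" in this session."""
    return frozenset(name for name, value in zip(attribute_names, row) if value == "Y")

def support(itemset, sessions):
    """Fraction of sessions that contain every item in itemset."""
    return sum(1 for s in sessions if itemset <= s) / len(sessions)

def rule_metrics(antecedent, consequent, sessions):
    """Return (support, confidence, lift) for antecedent ==> consequent."""
    sup_a = support(antecedent, sessions)
    sup_c = support(consequent, sessions)
    if sup_a == 0 or sup_c == 0:
        raise ValueError("antecedent and consequent must each occur at least once")
    sup_rule = support(antecedent | consequent, sessions)
    confidence = sup_rule / sup_a   # P(consequent | antecedent)
    lift = confidence / sup_c       # > 1 means positive association
    return sup_rule, confidence, lift

sessions = [row_to_itemset(ATTRIBUTES, row) for row in ROWS]
antecedent = frozenset(["/Products/Legwear", "/Products/Legwear/Berkshire"])
consequent = frozenset(["Collection: Better Than Bare - Queen"])
print(rule_metrics(antecedent, consequent, sessions))
```

On these four invented rows the rule holds in one session out of four, the antecedent occurs in two, and the consequent in one, so the rule has support 0.25, confidence 0.5, and lift 2.0.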

Your task in this problem is as follows:

1. Load the data into WEKA and review the distributions in the data (go down the list of attributes and make a note of which pages or products are most frequent and which are least frequent; you can list the top 3 and the bottom 3 in your submission).

2. Perform association rule mining using the Apriori algorithm with a "lowerBoundMinSupport" of 0.05 and using a Lift of 2.5 as the minMetric for filtering the rules. Also, set "outputItemsets" to "True" so that you can view the frequent itemsets of different sizes in addition to the rules. Write a short summary of your observations, including any significant or interesting (e.g., unobvious or unexpected) associations you observe in the data based on the results. Save your result set and submit it along with your assignment submission.

3. Next, run the Apriori algorithm with a lower "lowerBoundMinSupport" so that you can identify associations at a more granular level (e.g., the level of individual products or brands rather than higher-level categories). You might want to start with 0.025 and go lower if necessary. Experiment with this threshold, as well as the Lift or Confidence metrics, in different runs and pick the result set that seems to provide the most useful information (e.g., not too many obvious or noisy rules and not too few general rules). Again, provide a short summary of your observations, including some examples of rules or associations that you find interesting or useful.
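The effect of the support threshold in tasks 2 and 3 can be seen in a minimal Apriori sketch. This is an illustrative plain-Python version with a single fixed support threshold, not WEKA's implementation (WEKA's Apriori steps the support down from an upper bound toward "lowerBoundMinSupport", which is not modeled here). The eight sessions are invented, and `frequent_itemsets` and `rules_by_lift` are hypothetical helper names.

```python
from itertools import combinations

def frequent_itemsets(sessions, min_support):
    """Level-wise Apriori search; returns {itemset: support}."""
    n = len(sessions)
    current = [frozenset([item]) for item in {i for s in sessions for i in s}]
    frequent = {}
    k = 1
    while current:
        # Count each candidate and keep those meeting the support threshold.
        level = {}
        for cand in current:
            sup = sum(1 for s in sessions if cand <= s) / n
            if sup >= min_support:
                level[cand] = sup
        frequent.update(level)
        k += 1
        # Join step: unions of frequent itemsets that have exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        current = [c for c in candidates
                   if all(frozenset(sub) in level for sub in combinations(c, k - 1))]
    return frequent

def rules_by_lift(frequent, min_lift):
    """All rules A ==> C from frequent itemsets with lift >= min_lift."""
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for ante in combinations(sorted(itemset), r):
                ante = frozenset(ante)
                cons = itemset - ante
                confidence = sup / frequent[ante]
                lift = confidence / frequent[cons]
                if lift >= min_lift:
                    rules.append((ante, cons, sup, confidence, lift))
    return sorted(rules, key=lambda rule: -rule[4])

# Invented sessions (assumption): one rare but tightly linked pair of items.
toy = (
    [frozenset(["Category: Health Supplements", "Subcategory: Bones & Joints"])] * 2
    + [frozenset(["/Products/Legwear", "/Products/Legwear/Berkshire"])] * 4
    + [frozenset(["/Products/Legwear"])] * 2
)

low = rules_by_lift(frequent_itemsets(toy, min_support=0.05), min_lift=2.5)
high = rules_by_lift(frequent_itemsets(toy, min_support=0.3), min_lift=2.5)
for ante, cons, sup, conf, lift in low:
    print(sorted(ante), "==>", sorted(cons), sup, conf, lift)
print(len(low), len(high))
```

In this toy data the Health Supplements and Bones & Joints items each appear in only a quarter of the sessions, so raising the threshold to 0.3 removes them before any rule can be formed, while the lower threshold keeps the high-lift rules between them. The common Legwear and Berkshire rules survive either threshold but fall below the lift cutoff.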

Attachment:- Assignment.7z


Course: ECT 584 - Web Data Mining - DePaul University


Marking Criteria: Marks will be allocated for correct execution of steps and correct justification of answers. You need to compile all required solutions in a Word file. Provide the necessary snapshots with your answer justifications. Make sure that the answers are numbered properly, and do not rewrite the questions.
