Reference no: EM132537106
Assignment: 1. Consider the data set shown in screenshot 1.
(a) Compute the support for item sets {e}, {b, d}, and {b, d, e} by treating each transaction ID as a market basket.
(b) Use the results in part (a) to compute the confidence for the association rules {b, d} → {e} and {e} → {b, d}. Is confidence a symmetric measure?
(c) Use the results in part (c) to compute the confidence for the association rules {b, d} → {e} and {e} → {b, d}.
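The support and confidence definitions used in question 1 can be sketched as follows. This is a minimal illustration only: the `baskets` list is hypothetical data made up for the example (it is not the data set from screenshot 1), and the helper names `support` and `confidence` are assumptions introduced here.

```python
# Hypothetical market baskets for illustration; NOT the data from screenshot 1.
baskets = [
    {"a", "b", "d", "e"},
    {"b", "c", "d"},
    {"a", "b", "d", "e"},
    {"a", "c", "e"},
    {"c", "e"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """support(antecedent ∪ consequent) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)


s_e = support({"e"}, baskets)
s_bd = support({"b", "d"}, baskets)
s_bde = support({"b", "d", "e"}, baskets)

c_bd_to_e = confidence({"b", "d"}, {"e"}, baskets)
c_e_to_bd = confidence({"e"}, {"b", "d"}, baskets)

print(s_e, s_bd, s_bde)
print(c_bd_to_e, c_e_to_bd)
```

Both rules share the same numerator, the support of {b, d, e}, but divide by different antecedent supports, so the two confidence values differ whenever the supports of {b, d} and {e} differ.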
2. Consider the transactions shown in screenshot 2, with an item taxonomy given in screenshot 3.
(a) What are the main challenges of mining association rules with item taxonomy?
(b) Consider the approach where each transaction t is replaced by an extended transaction t′ that contains all the items in t as well as their respective ancestors. For example, the transaction t = {Chips, Cookies} will be replaced by t′ = {Chips, Cookies, Snack Food, Food}. Use this approach to derive all frequent item sets (up to size 4) with support ≥ 70%.
(c) Consider an alternative approach where the frequent item sets are generated one level at a time. Initially, all the frequent item sets involving items at the highest level of the hierarchy are generated. Next, we use the frequent item sets discovered at the higher level of the hierarchy to generate candidate item sets involving items at the lower levels of the hierarchy. For example, we generate the candidate item set {Chips, Diet Soda} only if {Snack Food, Soda} is frequent. Use this approach to derive all frequent item sets (up to size 4) with support ≥ 70%.
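The two approaches in parts (b) and (c) can be sketched as below. The `PARENT` map is a hypothetical fragment of a taxonomy, built only from the item names mentioned in the examples above; it is not the taxonomy from screenshot 3, and treating Soda as a top-level item is an assumption. The function names are likewise introduced here for illustration.

```python
# Hypothetical child -> parent taxonomy fragment; NOT the one in screenshot 3.
PARENT = {
    "Chips": "Snack Food",
    "Cookies": "Snack Food",
    "Snack Food": "Food",
    "Diet Soda": "Soda",  # assumption: Soda has no parent in this sketch
}


def ancestors(item):
    """Return all ancestors of `item`, nearest first."""
    found = []
    while item in PARENT:
        item = PARENT[item]
        found.append(item)
    return found


def extend(transaction):
    """Part (b): add every ancestor of every item to the transaction."""
    extended = set(transaction)
    for item in transaction:
        extended.update(ancestors(item))
    return extended


def parent_itemset(candidate):
    """Map each item in a candidate to its immediate parent."""
    return frozenset(PARENT[item] for item in candidate)


def should_generate(candidate, frequent_higher_level):
    """Part (c): keep a candidate only if its parent item set is frequent."""
    return parent_itemset(candidate) in frequent_higher_level


t_extended = extend({"Chips", "Cookies"})
print(sorted(t_extended))

# Assumed result of mining the higher level, for illustration only.
frequent_higher_level = {frozenset({"Snack Food", "Soda"})}
print(should_generate({"Chips", "Diet Soda"}, frequent_higher_level))
```

After extension, an ordinary frequent item set algorithm can be run on the extended transactions, while the level-wise check prunes lower-level candidates before their support is counted.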
3. Consider a data set consisting of 2^20 data vectors, where each vector has 32 components and each component is a 4-byte value. Suppose that vector quantization is used for compression and that 2^16 prototype vectors are used. How many bytes of storage does that data set take before and after compression and what is the compression ratio?
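The storage arithmetic for question 3 can be sketched as follows, assuming the counts are 2^20 data vectors and 2^16 prototype vectors, and assuming each compressed vector is stored as the index of its nearest prototype. Whether the prototype table (codebook) should be counted in the compressed size is a modeling choice, so the sketch reports both figures; all variable names are introduced here for illustration.

```python
n_vectors = 2**20          # assumed reading of the vector count
n_components = 32
bytes_per_component = 4
n_prototypes = 2**16       # assumed reading of the prototype count

# Uncompressed: every vector stores all of its components.
bytes_before = n_vectors * n_components * bytes_per_component

# Compressed: each vector becomes an index into the prototype table.
index_bits = (n_prototypes - 1).bit_length()   # bits needed per index
index_bytes = (index_bits + 7) // 8            # rounded up to whole bytes
bytes_indices = n_vectors * index_bytes

# The prototype table itself, if it is counted as part of the storage.
bytes_codebook = n_prototypes * n_components * bytes_per_component

ratio_indices_only = bytes_before / bytes_indices
ratio_with_codebook = bytes_before / (bytes_indices + bytes_codebook)

print(bytes_before, bytes_indices, bytes_codebook)
print(ratio_indices_only, ratio_with_codebook)
```

Under these assumptions each index fits in 2 bytes, so the per-vector storage drops from 128 bytes to 2 bytes before the codebook is taken into account.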