What data mining functions does this business need

Reference no: EM133115565

Data Mining for Business Analytics and Cyber Security

Task overview:

Exercise 1

Question 1:
Present an example where data mining is crucial to the success of a business. What data mining functions does this business need? Can they be performed alternatively by data query processing or simple statistical analysis?

Question 2:
Suppose your task as a software engineer at a university is to design a data mining system to examine the university's course database, which contains the following information: the name, address, and status (e.g., undergraduate or postgraduate) of each student, the courses taken, and the student's cumulative grade point average (GPA). Describe the architecture you would choose.

Question 3:
Describe the standard form that data must take to be acceptable for predictive data mining techniques.

Question 4:
What type of variable is each of the following?
a) National ratings of computer science departments.
b) Pulse rate in beats/minute.
c) Adult status.
d) Age in years.
e) Class - Freshman, Sophomore, Junior, Senior, Grad Student.
f) Colors.

Exercise 2

1. What is the difference between classification and regression? How are they similar?

2. What is the difference between supervised and unsupervised learning? Give examples of each.

3. The following table contains training examples that help predict whether a patient is likely to have a heart attack.

[Table: training examples for heart attack prediction (image 2270_Data Mining for Business Analytics.jpg, not reproduced here)]

a) Use the heuristic of "selecting the attribute that best separates the samples into individual classes" to construct a minimal decision tree that predicts whether or not a patient is likely to have a heart attack. Show each step.

b) Do you need all the attributes to construct this minimal decision tree?

c) Translate your decision tree into a collection of decision rules.
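
The attribute-selection heuristic in this question can be made concrete in several ways. The sketch below assumes information gain (entropy reduction) as the measure of how well an attribute separates the classes; the question itself does not name a measure. Because the training table is only available as an image, the rows, the attribute names attr_1 and attr_2, and the label column are hypothetical placeholders, not the assignment's data.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting on attribute."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

# HYPOTHETICAL rows: placeholders only, not the table from the assignment.
rows = [
    {"attr_1": "yes", "attr_2": "no",  "label": "positive"},
    {"attr_1": "yes", "attr_2": "yes", "label": "positive"},
    {"attr_1": "no",  "attr_2": "no",  "label": "negative"},
    {"attr_1": "no",  "attr_2": "yes", "label": "negative"},
]

# The attribute with the highest gain would become the root of the tree.
best = max(["attr_1", "attr_2"], key=lambda a: information_gain(rows, a, "label"))
print(best)
```

Applying the same selection step recursively to each branch, and stopping when a branch contains a single class, yields a tree; each root-to-leaf path then reads as one decision rule.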

Exercise 3
1. What is a perceptron and what is a multilayer perceptron? Illustrate the structure of a perceptron and a multilayer perceptron.
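
As a rough illustration of the two structures, the sketch below models a perceptron as a weighted sum plus bias passed through a step function, and a multilayer perceptron as one hidden layer of such units feeding a single output unit. The function names and the hand-picked XOR weights are assumptions made for this example; they do not come from the assignment.

```python
def perceptron(inputs, weights, bias):
    """Single unit: weighted sum of the inputs plus a bias, then a step activation."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total >= 0 else 0

def mlp(inputs, hidden_layer, output_unit):
    """One hidden layer of perceptrons whose outputs feed one output perceptron."""
    hidden = [perceptron(inputs, w, b) for w, b in hidden_layer]
    w, b = output_unit
    return perceptron(hidden, w, b)

# Hand-picked (not learned) weights: XOR = AND(OR(x1, x2), NAND(x1, x2)).
XOR_HIDDEN = [([1, 1], -0.5),    # behaves like OR
              ([-1, -1], 1.5)]   # behaves like NAND
XOR_OUTPUT = ([1, 1], -1.5)      # behaves like AND

for a in (0, 1):
    for b in (0, 1):
        print(a, b, mlp([a, b], XOR_HIDDEN, XOR_OUTPUT))
```

XOR is used here because a single perceptron cannot represent it, while a network with one hidden layer can.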

2. Given two objects represented by the tuples (22, 1, 42, 10) and (20, 0, 36, 8):

a) Compute the Euclidean distance between the two objects.
b) Compute the Manhattan distance between the two objects.
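
The two distance measures can be sketched in a few lines. The helper names euclidean and manhattan are chosen for this example; the tuples are the ones given in the question.

```python
import math

def euclidean(p, q):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))

x = (22, 1, 42, 10)
y = (20, 0, 36, 8)
print("Euclidean:", euclidean(x, y))
print("Manhattan:", manhattan(x, y))
```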

3. In the following dataset, 4 subjects belong to two different classes (A and B). Classify the new subject (Feature 1 = 3; Feature 2 = 7; Class = ?) using k-nearest-neighbour classification. Use Euclidean distance as the distance function and assign the object to the majority class among its k nearest neighbours.

Perform kNN classification for the following values of k:

(a) k = 1
(b) k = 3

[Table: training dataset of the 4 subjects (image 251_Data Mining for Business Analytics1.jpg, not reproduced here)]

New Subject: Feature 1= 3; Feature 2=7; Class=?
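
The procedure described in this question can be sketched as follows. The training table is only available as an image, so the four training points and the query point below are hypothetical placeholders chosen to show that the predicted class can change with k; they are not the assignment's data. The function name knn_classify is also an assumption of this example.

```python
import math
from collections import Counter

def knn_classify(train, query, k):
    """train is a list of ((feature_1, feature_2), label) pairs.
    Returns the majority label among the k training points closest to query."""
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# HYPOTHETICAL data: placeholders only, not the table from the assignment.
train = [((4, 5), "A"), ((1, 1), "A"), ((6, 7), "B"), ((7, 6), "B")]
query = (5, 5)

for k in (1, 3):
    print("k =", k, "->", knn_classify(train, query, k))
```

This sketch does not break ties in any principled way; with an even k or more than two classes, a tie-breaking rule would have to be chosen.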

Exercise 4

1. Compare the advantages and disadvantages of (a) K-means and (b) K-medoids for clustering.

Discuss a main challenge common to both the K-means and K-medoids algorithms.

2. Explain inter-cluster and intra-cluster distances and how they relate to each other when used to evaluate clustering results.

3. Suppose that the data mining task is to cluster points (with (x, y) representing location) into three clusters, where the points are:

A1(2, 10), A2(2, 5), A3(8, 4), B1(5, 8), B2(7, 5), B3(6, 4), C1(1, 2), C2(4, 9).

The distance function is Euclidean distance. Suppose that A1, B1, and C1 are initially assigned as the centers of the three clusters, respectively. Use the k-means algorithm to show only:

(a) the three cluster centers after the first round of execution.
(b) the final three clusters.
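
A minimal sketch of the iteration, using the points and initial centers given above. It assumes the plain assign-then-update form of k-means, stops when the centers no longer change, and does not handle empty clusters (none occur with this data). The function names assign, update, and kmeans are chosen for this example.

```python
import math

points = {"A1": (2, 10), "A2": (2, 5), "A3": (8, 4), "B1": (5, 8),
          "B2": (7, 5), "B3": (6, 4), "C1": (1, 2), "C2": (4, 9)}

def assign(points, centers):
    """Put each point into the cluster of its nearest center (Euclidean distance)."""
    clusters = [[] for _ in centers]
    for name, p in points.items():
        nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
        clusters[nearest].append(name)
    return clusters

def update(points, clusters):
    """New center of each cluster = coordinate-wise mean of its members."""
    return [tuple(sum(points[n][d] for n in c) / len(c) for d in range(2))
            for c in clusters]

def kmeans(points, centers):
    """Run rounds until the centers stop changing; return (clusters, centers) per round."""
    history = []
    while True:
        clusters = assign(points, centers)
        new_centers = update(points, clusters)
        history.append((clusters, new_centers))
        if new_centers == centers:
            return history
        centers = new_centers

history = kmeans(points, [points["A1"], points["B1"], points["C1"]])
print("centers after round 1:", history[0][1])
print("final clusters:", history[-1][0])
```

The first entry of history corresponds to part (a) and the last entry to part (b).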

Exercise 5

Question 1.
a) Calculate the confidence of the rules A → BCD and ABC → D, given their support.
b) Given the frequent itemset ABCD, generate all association rules with three items on the LHS (body) and one item on the RHS (head).
Note: a rule has the form X → Y, where X is the LHS (left-hand side, or body) and Y is the RHS (right-hand side, or head).
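
Both parts can be sketched in code using the standard definition conf(X → Y) = supp(X ∪ Y) / supp(X). The support values in the question are not reproduced on this page, so the numbers below are hypothetical placeholders; the function names are also assumptions of this example.

```python
from itertools import combinations

def rules_with_single_head(itemset):
    """All rules X -> Y built from itemset where Y has exactly one item."""
    items = sorted(itemset)
    return [(frozenset(body), frozenset(items) - frozenset(body))
            for body in combinations(items, len(items) - 1)]

def confidence(support, body, head):
    """conf(body -> head) = supp(body ∪ head) / supp(body)."""
    return support[frozenset(body) | frozenset(head)] / support[frozenset(body)]

# HYPOTHETICAL support values: placeholders only, not the assignment's figures.
support = {frozenset("A"): 0.5, frozenset("ABC"): 0.25, frozenset("ABCD"): 0.2}

print(confidence(support, "A", "BCD"))
print(confidence(support, "ABC", "D"))
for body, head in rules_with_single_head("ABCD"):
    print("".join(sorted(body)), "->", "".join(head))
```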

Question 2.
Consider a supermarket database with several thousand items, of which 1000 are frequent, and several million transactions. Which part of the Apriori algorithm will be the most expensive to compute? Why?
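
One way to get a feel for the scale involved is to count how many candidate 2-itemsets the join step would produce from 1000 frequent single items, assuming every pair of frequent items becomes a candidate. Each of those candidates then has to be counted against the transactions.

```python
from math import comb

frequent_items = 1000                      # frequent 1-itemsets, as stated in the question
candidate_pairs = comb(frequent_items, 2)  # size of C2 if every pair is a candidate
print(candidate_pairs)                     # 499500
```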

Question 3.
Consider the market basket transactions shown in the following table. Assume that min_support = 40% and min_confidence = 70%. Further, assume that the Apriori algorithm is used to discover strong association rules among transaction items.

[Table: market basket transactions (image 705_Data Mining for Business Analytics2.jpg, not reproduced here)]

a) Show step by step the generated Candidate itemsets (Ck) and the qualified frequent itemsets (Lk) until the largest frequent itemsets are generated.

b) Generate all possible association rules from the frequent itemsets obtained in part (a). Calculate the confidence of each rule and identify all the strong association rules.
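
The level-wise procedure asked for in parts (a) and (b) can be sketched as below. The transaction table is only available as an image, so the transactions here are hypothetical placeholders; only the thresholds (min_support = 40%, min_confidence = 70%) come from the question. This is a compact illustration of the join, prune, and count steps, not an optimized implementation.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frequent itemset: support count}, built level by level (C1, L1, C2, L2, ...)."""
    n = len(transactions)
    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items]                  # candidates C1
    frequent, k = {}, 1
    while level:
        counts = {c: sum(c <= t for t in transactions) for c in level}
        survivors = {c: v for c, v in counts.items() if v / n >= min_support}  # Lk
        frequent.update(survivors)
        k += 1
        # Join step: unions of two frequent itemsets that have exactly k items.
        joined = {a | b for a in survivors for b in survivors if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        level = [c for c in joined
                 if all(frozenset(s) in survivors for s in combinations(c, k - 1))]
    return frequent

def strong_rules(frequent, min_confidence):
    """All rules body -> head with confidence = count(itemset) / count(body) >= threshold."""
    rules = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for body in map(frozenset, combinations(itemset, r)):
                conf = count / frequent[body]
                if conf >= min_confidence:
                    rules.append((body, itemset - body, conf))
    return rules

# HYPOTHETICAL transactions: placeholders only, not the table from the assignment.
transactions = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"},
                {"milk"}, {"bread", "milk"}]

frequent = apriori(transactions, min_support=0.4)
for body, head, conf in strong_rules(frequent, min_confidence=0.7):
    print(sorted(body), "->", sorted(head), conf)
```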

