Data mining

Reference no: EM132518699

Question 1.

Suppose that you are employed as a data mining consultant for an Internet search engine company. Describe how data mining can help the company, giving specific examples of how techniques such as clustering, classification, association rule mining, and anomaly detection can be applied.

Question 2.

Identify at least two advantages and two disadvantages of using color to visually represent information.

Question 3.

Consider the XOR problem where there are four training points: (1, 1, -), (1, 0, +), (0, 1, +), (0, 0, -). Transform the data into the following feature space:

$\Phi = (1, \sqrt{2}x_1, \sqrt{2}x_2, \sqrt{2}x_1x_2, x_1^2, x_2^2)$.

Find the maximum margin linear decision boundary in the transformed space.
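
One way to check a proposed answer is sketched below in Python. It maps the four points through the feature map from the question, and then tests a candidate solution against the hard-margin SVM optimality conditions. The dual coefficients in ALPHA and the bias in BIAS are not from the question: they are a hand-derived candidate that assumes all four points are support vectors, and the helper names (phi, poly_kernel, decision) are illustrative only.

```python
import math

# Training points as (x1, x2, y), with "+" encoded as +1 and "-" as -1.
POINTS = [(1, 1, -1), (1, 0, 1), (0, 1, 1), (0, 0, -1)]


def phi(x1, x2):
    """Feature map given in the question."""
    r2 = math.sqrt(2)
    return (1.0, r2 * x1, r2 * x2, r2 * x1 * x2, float(x1 ** 2), float(x2 ** 2))


def dot(u, v):
    """Plain inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def poly_kernel(u, v):
    """Degree-2 polynomial kernel (1 + u.v)^2, which phi reproduces."""
    return (1 + dot(u, v)) ** 2


# Assumption (hand-derived, not stated in the question): all four points are
# support vectors with these dual coefficients and this bias.
ALPHA = [2.0, 8 / 3, 8 / 3, 10 / 3]
BIAS = -1.0


def weight_vector():
    """w = sum_i alpha_i * y_i * phi(x_i) in the transformed space."""
    w = [0.0] * 6
    for (x1, x2, y), a in zip(POINTS, ALPHA):
        for i, f in enumerate(phi(x1, x2)):
            w[i] += a * y * f
    return w


def decision(x1, x2):
    """Value of w . phi(x) + b; its sign is the predicted class."""
    return dot(weight_vector(), phi(x1, x2)) + BIAS


if __name__ == "__main__":
    # Each functional margin y * f(x) should be 1 for a support vector.
    for x1, x2, y in POINTS:
        print((x1, x2), y, round(y * decision(x1, x2), 6))
```

If the candidate is right, every printed margin is 1 up to rounding, every coefficient in ALPHA is positive, and the coefficients weighted by their labels sum to zero. Together these are the conditions a hard-margin solution has to meet, so the check works for any other candidate as well.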

Question 4.

Consider the following set of candidate 3-itemsets: {1, 2, 3}, {1, 2, 6}, {1, 3, 4}, {2, 3, 4}, {2, 4, 5}, {3, 4, 6}, {4, 5, 6}

Construct a hash tree for the above candidate 3-itemsets. Assume the tree uses a hash function where all odd-numbered items are hashed to the left child of a node, while the even-numbered items are hashed to the right child. A candidate k-itemset is inserted into the tree by hashing on each successive item in the candidate and then following the appropriate branch of the tree according to the hash value. Once a leaf node is reached, the candidate is inserted based on one of the following conditions:

Condition 1: If the depth of the leaf node is equal to k (the root is assumed to be at depth 0), then the candidate is inserted regardless of the number of itemsets already stored at the node.

Condition 2: If the depth of the leaf node is less than k, then the candidate can be inserted as long as the number of itemsets stored at the node is less than maxsize. Assume maxsize = 2 for this question.

Condition 3: If the depth of the leaf node is less than k and the number of itemsets stored at the node is equal to maxsize, then the leaf node is converted into an internal node. New leaf nodes are created as children of the old leaf node. Candidate itemsets previously stored in the old leaf node are distributed to the children based on their hash values. The new candidate is also hashed to its appropriate leaf node.

(a) How many leaf nodes are there in the candidate hash tree? How many internal nodes are there?

(b) Consider a transaction that contains the following items: {1, 2, 3, 5, 6}. Using the hash tree constructed in part (a), which leaf nodes will be checked against the transaction? Which candidate 3-itemsets are contained in the transaction?
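
The insertion rules above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation, and it rests on a few assumptions the question does not spell out: candidates are inserted in the order listed, a split always creates both a left and a right child, and during the transaction lookup an item is hashed at a given depth only if enough later items remain to complete a 3-itemset. The names Node, branch, insert, walk and visited_leaves are hypothetical.

```python
K = 3        # size of each candidate itemset
MAXSIZE = 2  # leaf capacity for leaves above depth K

CANDIDATES = [(1, 2, 3), (1, 2, 6), (1, 3, 4), (2, 3, 4),
              (2, 4, 5), (3, 4, 6), (4, 5, 6)]
TRANSACTION = (1, 2, 3, 5, 6)


class Node:
    def __init__(self, depth):
        self.depth = depth
        self.itemsets = []    # used while the node is a leaf
        self.children = None  # {"L": Node, "R": Node} once internal

    def is_leaf(self):
        return self.children is None


def branch(item):
    """Hash function from the question: odd items go left, even go right."""
    return "L" if item % 2 == 1 else "R"


def insert(node, itemset):
    # Follow the hash of each successive item until a leaf is reached.
    while not node.is_leaf():
        node = node.children[branch(itemset[node.depth])]
    if node.depth == K or len(node.itemsets) < MAXSIZE:
        node.itemsets.append(itemset)  # conditions 1 and 2
        return
    # Condition 3: the full leaf becomes internal and its contents,
    # followed by the new candidate, are re-hashed into the children.
    old, node.itemsets = node.itemsets, []
    node.children = {side: Node(node.depth + 1) for side in "LR"}
    for s in old + [itemset]:
        insert(node, s)


def walk(node, path=""):
    """Yield (path, node) for every node, e.g. path "LR" = left then right."""
    yield (path or "root"), node
    if not node.is_leaf():
        for side in "LR":
            yield from walk(node.children[side], path + side)


def visited_leaves(node, transaction, start=0, path=""):
    """Paths of the leaves reached when hashing a sorted transaction."""
    if node.is_leaf():
        return {path or "root"}
    found = set()
    # Assumption: keep enough later items to finish a K-itemset.
    last = len(transaction) - (K - node.depth)
    for i in range(start, last + 1):
        side = branch(transaction[i])
        found |= visited_leaves(node.children[side], transaction, i + 1, path + side)
    return found


root = Node(0)
for candidate in CANDIDATES:
    insert(root, candidate)

nodes = dict(walk(root))
leaves = {p: n.itemsets for p, n in nodes.items() if n.is_leaf()}
internal = [p for p, n in nodes.items() if not n.is_leaf()]
checked = visited_leaves(root, TRANSACTION)
matched = sorted(s for p in checked for s in leaves[p] if set(s) <= set(TRANSACTION))

if __name__ == "__main__":
    print("leaves:", leaves)
    print("internal nodes:", internal)
    print("leaves checked:", sorted(checked))
    print("candidates in transaction:", matched)
```

Printing the leaves dictionary shows which candidates end up in which leaf, which makes it easy to compare against a tree drawn by hand. Changing the insertion order or the lookup rule may change the result, so both should be stated alongside any answer.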

Question 5.

Consider a group of documents that has been selected from a much larger set of diverse documents so that the selected documents are as dissimilar from one another as possible. If we consider documents that are not highly related (connected, similar) to one another as being anomalous, then all of the documents that we have selected might be classified as anomalies. Is it possible for a data set to consist only of anomalous objects or is this an abuse of the terminology?

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd