Compose a practical case for data mining that could employ clustering

Reference no: EM133238409

Case: Clustering is one of the core techniques in data mining; it lets us identify similarities and patterns in data. Across the existing clustering techniques, the clustering problem can be defined as grouping records into n groups (clusters) so that two records in the same cluster have more in common than two records in different clusters. Some clustering methods define similarity as proximity to a cluster center, while others define it as proximity to other records in the cluster. Methods in the first group therefore tend to find roughly circular (spherical) clusters, while methods in the second group can find clusters of other shapes, as long as there is empty space between the clusters (see the Python docs for more details). Also, some clustering methods require the number of clusters to be specified in advance, while others do not.
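
The two families described above can be contrasted with a minimal pure-Python sketch, assuming 2-D points. `kmeans` stands in for center-based methods (it needs k up front), and `eps_components` is a simplified, DBSCAN-like neighbour-linkage grouping (records closer than `eps` are linked; there is no noise handling), which is not told how many clusters to find. The function names, the farthest-point initialisation and the toy data are illustrative assumptions, not taken from any particular library.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def kmeans(points: List[Point], k: int, iters: int = 50) -> List[int]:
    """Center-based grouping: each point joins the cluster whose center is
    nearest, so the number of clusters k must be chosen in advance."""
    # Deterministic farthest-point initialisation (an assumption made for
    # reproducibility; random or k-means++ starts are common alternatives).
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: the nearest center wins (ties go to the lower index).
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append((sum(x for x, _ in members) / len(members),
                                    sum(y for _, y in members) / len(members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's old center
        if new_centers == centers:
            break  # converged: the assignments can no longer change
        centers = new_centers
    return labels


def eps_components(points: List[Point], eps: float) -> List[int]:
    """Neighbour-based grouping: points within eps of each other are linked and
    each connected group is a cluster; the cluster count is not given."""
    labels = [-1] * len(points)
    cluster = 0
    for start in range(len(points)):
        if labels[start] != -1:
            continue  # already reached from an earlier point
        labels[start] = cluster
        stack = [start]
        while stack:  # depth-first walk over the eps-neighbourhood graph
            i = stack.pop()
            for j in range(len(points)):
                if labels[j] == -1 and math.dist(points[i], points[j]) <= eps:
                    labels[j] = cluster
                    stack.append(j)
        cluster += 1
    return labels


# Toy data: a long thin chain of points plus a small separate blob above it.
chain = [(float(x), 0.0) for x in range(10)]
blob = [(4.0, 3.0), (4.0, 3.5)]
data = chain + blob
print(kmeans(data, k=2))             # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0]
print(eps_components(data, eps=1.0)) # [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
```

On this elongated shape the center-based sketch splits the chain in half and merges the left half with the blob, while the neighbour-based sketch recovers "chain" and "blob" because there is empty space between them.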

Clustering has many applications and can be used with many kinds of data. It may also serve as part of a recommendation mechanism (e.g., "People who bought this also liked that"), which can support more targeted marketing.
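
A minimal sketch of how cluster labels could feed such a recommender, in Python. It assumes the labels were already produced by some clustering method; the function `recommend`, the customers and the items are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set


def recommend(customer: str,
              cluster_of: Dict[str, int],
              purchases: Dict[str, Set[str]],
              top_n: int = 3) -> List[str]:
    """Suggest items popular among the customer's cluster peers that the
    customer has not bought yet."""
    own = purchases.get(customer, set())
    peers = [c for c, lab in cluster_of.items()
             if lab == cluster_of[customer] and c != customer]
    # Count how many peers bought each item the customer does not have.
    counts = Counter(item for p in peers for item in purchases.get(p, set())
                     if item not in own)
    # Most popular first; ties broken alphabetically so the output is stable.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


# Hypothetical cluster labels and shopping baskets.
cluster_of = {"ann": 0, "bob": 0, "cid": 0, "dee": 1}
purchases = {
    "ann": {"tent", "stove"},
    "bob": {"tent", "lantern", "stove"},
    "cid": {"tent", "lantern", "boots"},
    "dee": {"novel", "lamp"},
}
print(recommend("ann", cluster_of, purchases))  # ['lantern', 'boots']
```

Items bought only outside the customer's cluster (here, dee's) are never suggested, which is what makes the recommendation "targeted".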

Depending on the application, clustering methods may be used differently. However, none of the common clustering methods lets the analyst prioritize some variables over others, and none allows special rules for assigning records to clusters beyond the similarity-based ones discussed earlier.

Directions:

Compose a practical case for data mining that could employ clustering with a new set of conditions for grouping records, one that does not fit the existing paradigm of simple similarity with equal treatment of all variables.

For example, a dataset of anonymous commuting rides could be de-anonymized with cluster analysis. The clustering condition could be to group rides with similar departure and arrival points that took place at around the same time of day, with the added rule that no two rides in the same cluster may occur on the same day (one person cannot ride two vehicles on the same route at around the same time on the same day). Clusters of similar rides taken on different days would then suggest the same commuter.
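
The ride example can be sketched as a greedy constrained clustering in Python. Assumptions that are not part of the assignment: each ride has a day index, planar start and end coordinates and a departure minute; the thresholds `max_km=0.5` and `max_minutes=20` are arbitrary; wrap-around at midnight is ignored; and the greedy, order-dependent assignment is only one simple heuristic for enforcing a "cannot be in the same cluster" rule (constrained k-means variants such as COP-KMeans are another). All names and data are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Ride:
    day: int                    # hypothetical day index within the data set
    start: Tuple[float, float]  # departure point (x, y), assumed in km
    end: Tuple[float, float]    # arrival point (x, y), assumed in km
    minute: int                 # departure time in minutes after midnight


def similar(a: Ride, b: Ride, max_km: float, max_minutes: int) -> bool:
    """Similarity condition: nearby endpoints and a similar time of day."""
    return (math.dist(a.start, b.start) <= max_km
            and math.dist(a.end, b.end) <= max_km
            and abs(a.minute - b.minute) <= max_minutes)


def cluster_rides(rides: List[Ride], max_km: float = 0.5,
                  max_minutes: int = 20) -> List[List[Ride]]:
    """A ride joins the first cluster whose members are ALL similar to it and
    which holds no ride from the same day; otherwise it starts a new cluster."""
    clusters: List[List[Ride]] = []
    for ride in sorted(rides, key=lambda r: (r.day, r.minute)):
        for cluster in clusters:
            same_day = any(member.day == ride.day for member in cluster)
            if not same_day and all(similar(ride, m, max_km, max_minutes)
                                    for m in cluster):
                cluster.append(ride)
                break
        else:  # no compatible cluster found
            clusters.append([ride])
    return clusters


rides = [
    Ride(1, (0.0, 0.0), (5.0, 5.0), 480),    # 08:00 on day 1
    Ride(1, (0.1, 0.0), (5.0, 5.1), 485),    # same route, same day: another person
    Ride(1, (10.0, 10.0), (2.0, 2.0), 1020), # unrelated evening ride
    Ride(2, (0.05, 0.05), (5.1, 5.0), 490),
    Ride(3, (0.1, 0.1), (4.9, 5.0), 475),
]
print([[r.day for r in c] for c in cluster_rides(rides)])  # [[1, 2, 3], [1], [1]]
```

The two similar day-1 rides end up in different clusters purely because of the same-day rule, which is exactly the kind of non-similarity condition the assignment asks for. Checking a new ride against every member, rather than against a cluster center, keeps a cluster from drifting along a chain of slightly different routes.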

Course: CIS 606, Park University