Create a scatter plot of resultant clusters for each value

Reference no: EM132230057, Length: 1000 words

Clustering Assignment

The report must be formatted as an R Markdown document.

Data set included: clustering-data.csv

Labeled data is not always available. For such datasets, you can use unsupervised algorithms to extract structure. The k-means clustering algorithm and the k-nearest neighbors algorithm both rely on the Euclidean distance between points: k-means uses it to group points into clusters, while k-nearest neighbors uses it to find the labeled points closest to a query point. The key difference is that k-means does not use labeled data.
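As a small illustration of the distance both algorithms rely on, here is a minimal sketch in Python (the report itself is requested in R Markdown, so treat this only as an illustration of the idea). The helper names `euclidean` and `nearest` and the sample coordinates are made up for this example and do not come from the assignment data.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two points of the same dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def nearest(point, candidates):
    """Index of the candidate closest to `point` (ties go to the first)."""
    return min(range(len(candidates)), key=lambda i: euclidean(point, candidates[i]))


# Hypothetical 2-D coordinates, for illustration only.
print(euclidean((0, 0), (3, 4)))          # 5.0
print(nearest((1, 1), [(0, 0), (5, 5)]))  # 0
```

In k-means the candidates would be the current cluster centers; in k-nearest neighbors they would be the labeled training points.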

In this problem, you will use the k-means clustering algorithm to look for patterns in an unlabeled dataset. The dataset for this problem is found at data/clustering-data.csv.

a. Plot the dataset using a scatter plot.

b. Fit the dataset using the k-means algorithm for each value of k from 2 to 12. Create a scatter plot of the resulting clusters for each value of k.
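One way to picture this step is the sketch below, written in plain Python for illustration (the report itself is requested in R Markdown, where a built-in k-means routine would normally be used). It implements the standard Lloyd iteration rather than any particular library's version. Because the column layout of clustering-data.csv is not shown here, the sketch runs on synthetic points; the function name `kmeans`, its parameters, and the generated data are all assumptions made for this example.

```python
import math
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm. Returns (centers, labels) for a list of point tuples."""
    rng = random.Random(seed)
    # Initialise the centers with k data points chosen at random.
    centers = rng.sample(points, k)
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest center.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                # An empty cluster keeps its previous center.
                new_centers.append(centers[j])
        converged = new_labels == labels and new_centers == centers
        labels, centers = new_labels, new_centers
        if converged:
            break
    return centers, labels


# Synthetic stand-in data (NOT the assignment dataset): three noisy blobs.
data_rng = random.Random(42)
blob_centers = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
points = [
    (data_rng.gauss(cx, 0.5), data_rng.gauss(cy, 0.5))
    for cx, cy in blob_centers
    for _ in range(20)
]

# Fit once for every k from 2 to 12 and keep the results keyed by k.
results = {k: kmeans(points, k) for k in range(2, 13)}
for k, (centers, labels) in results.items():
    print(k, len(set(labels)), "non-empty clusters")
```

Each `labels` list can then be used to color the scatter plot for that value of k, with whichever plotting tool the report uses.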

c. As k-means is an unsupervised algorithm, you cannot compute the accuracy as there are no correct values to compare the output to. Instead, you will use the average distance from the center of each cluster as a measure of how well the model fits the data. To calculate this metric, simply compute the distance of each data point to the center of the cluster it is assigned to and take the average value of all of those distances.

Calculate this average distance for each value of k and plot it as a line chart with k on the x-axis and the average distance on the y-axis.
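The metric described in part c can be sketched as follows, again in Python purely for illustration. The function name `average_distance` and the tiny hand-made example (four points, two centers) are assumptions for this sketch and are not taken from the assignment data.

```python
import math


def average_distance(points, centers, labels):
    """Mean Euclidean distance from each point to its assigned cluster center."""
    total = sum(math.dist(p, centers[lab]) for p, lab in zip(points, labels))
    return total / len(points)


# Hand-made example: two clusters with known centers.
points = [(0, 0), (0, 2), (10, 0), (10, 4)]
centers = [(0, 1), (10, 2)]
labels = [0, 0, 1, 1]  # index of the center each point is assigned to

# Distances are 1, 1, 2, 2, so the average is 1.5.
print(average_distance(points, centers, labels))  # 1.5
```

Repeating this for the fitted model at each k gives the list of values to plot against k.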

d. One way of determining the "right" number of clusters is to look at the graph of k versus average distance and find the "elbow point", the value of k beyond which adding more clusters yields only small reductions in the average distance. Looking at the graph you generated in the previous part, what is the elbow point for this dataset?
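The elbow is usually read off the chart by eye, but a rough numerical check is possible. The sketch below, in Python for illustration, uses one simple heuristic among several: pick the k where the discrete second difference of the curve is largest, that is, where the decrease slows down most abruptly. It assumes evenly spaced values of k and can never select the first or last k. The function name and the distance values are made up for this example and say nothing about the assignment dataset.

```python
def elbow_by_second_difference(ks, avg_distances):
    """Return the k with the largest second difference of the distance curve.

    Assumes `ks` is evenly spaced and both lists have the same length (>= 3).
    """
    best_k, best_bend = None, float("-inf")
    for i in range(1, len(ks) - 1):
        # Large positive value: steep drop before k, flat curve after it.
        bend = avg_distances[i - 1] - 2 * avg_distances[i] + avg_distances[i + 1]
        if bend > best_bend:
            best_k, best_bend = ks[i], bend
    return best_k


# Illustrative values only: steep drop up to k = 4, nearly flat afterwards.
ks = [2, 3, 4, 5, 6]
avg_distances = [5.0, 3.0, 1.0, 0.9, 0.8]
print(elbow_by_second_difference(ks, avg_distances))  # 4
```

A heuristic like this is best treated as a sanity check on the visual reading, since noisy curves can produce a misleading maximum.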

Attachment: Assignment Files.rar
