K-Means Clustering Exercise

Reference no: EM132490070

In this exercise, you will use the RStudio interface to run the k-means clustering method. Unlike classification methods, k-means clustering groups data instances based on common characteristics; the instances do not have predefined labels or classes.

Exercise Instructions -

Part 1 - Complete the exercise in the Word document KMeans Clustering Using R.docx, which uses the seeds.csv data. Save the file on your hard drive, and follow the instructions in the document to load the file into R.

Get familiar with the k-means clustering method, including its input parameters. Note the differences between clustering and classification.

Your output might be slightly different depending on your R and RStudio versions. Overlapping labels on the cluster plot are acceptable since they are hard to fix. You do not need to write a report for this part.

Part 2 - Run the exercise on a vehicle dataset and write a report on your findings, interpreting the results in your own words. The report needs to cover the key points of the exercise below, in order.

Download the vehicle.csv file to your hard drive.

1. Introduction - What do you expect the k-means clustering method to accomplish for the vehicle data?

2. Data pre-processing

Run the set.seed command. Include the command in the report and explain the reason for running this command.

Load the data from the vehicle.csv file into R. Create a copy of the vehicle dataset called myvehicle. Include the command in the report.

Remove the class variable from myvehicle. Include the command in the report, and explain why we remove the class variable.

Run the scale command to scale myvehicle. Include the command in the report, and explain why we scale data.

Discuss any additional data pre-processing that you run. Include the commands and explain what each command does in the report.
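
The pre-processing steps above can be sketched as follows. This is a minimal illustration, not the expected solution: the built-in iris data stands in for vehicle.csv so the snippet runs on its own, the seed value 123 is arbitrary, and the label column is assumed to be named class.

```r
# Stand-in data: the built-in iris data set plays the role of vehicle.csv.
# With the real file, the load step would be: vehicle <- read.csv("vehicle.csv")
set.seed(123)  # fixes the random number stream so later results are reproducible

vehicle <- iris
names(vehicle)[names(vehicle) == "Species"] <- "class"  # mimic a label column named class

myvehicle <- vehicle           # working copy; the original keeps the labels
myvehicle$class <- NULL        # drop the label so the clustering is unsupervised
myvehicle <- scale(myvehicle)  # centre each column to mean 0 and scale to sd 1

round(colMeans(myvehicle), 10)  # column means after scaling
apply(myvehicle, 2, sd)         # column standard deviations after scaling
```

Keeping the untouched vehicle object around is a deliberate choice here: the class labels are needed again later for the evaluation step.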

3. Run the kmeans method with k=4 and store the output in the variable kc.

Include the command in the report and discuss the input parameters you used.

Enter kc at the command prompt and press Enter. Include the command output in the report and answer the following questions.

How many instances are in each cluster?

What information does the cluster means section of the output provide, and how were the numbers obtained?

What is the clustering vector?

What is the sum of squares by cluster, and what does it mean?

Run the kc$iter command, and explain what the output shows. Include the command, the output, and explanation in the report.
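
As a rough sketch of this step, the snippet below runs kmeans with k=4 and pulls out the components that the questions above refer to. The iris measurements again stand in for the scaled vehicle data, and the nstart = 25 argument is an assumption added for stability, not something the exercise requires.

```r
set.seed(123)                    # arbitrary seed, for reproducibility
myvehicle <- scale(iris[, 1:4])  # stand-in for the scaled vehicle data

# centers = 4 asks for k = 4 clusters; nstart = 25 tries 25 random starts
kc <- kmeans(myvehicle, centers = 4, nstart = 25)

kc           # prints sizes, cluster means, clustering vector, sums of squares
kc$size      # number of instances in each cluster
kc$centers   # cluster means: per-cluster average of each scaled variable
kc$cluster   # clustering vector: the cluster index assigned to each row
kc$withinss  # within-cluster sum of squares, one value per cluster
kc$iter      # number of iterations used by the reported run

# The cluster means are the column means of the rows assigned to a cluster
colMeans(myvehicle[kc$cluster == 1, , drop = FALSE])
```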

4. Clustering evaluation

Build a cross-tabulation to compare how the method clustered the vehicles with the actual vehicle class. Include the command and the output in the report. Answer the following questions.

What is the dominant vehicle class in each cluster?

What additional information does the table show?

What percentage of vehicles were clustered in agreement with the actual class?
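
One possible way to build and read the cross-tabulation is sketched below, again with iris as a stand-in (Species plays the role of the vehicle class). The agreement figure is computed here as the share of instances that belong to the dominant class of their cluster; that definition is an assumption, and the exercise may intend a different one.

```r
set.seed(123)                    # arbitrary seed, for reproducibility
vehicle_class <- iris$Species    # stand-in for vehicle$class
myvehicle <- scale(iris[, 1:4])  # stand-in for the scaled vehicle data
kc <- kmeans(myvehicle, centers = 4, nstart = 25)

# Rows are actual classes, columns are cluster numbers
tab <- table(class = vehicle_class, cluster = kc$cluster)
tab

# Dominant class per cluster: the row with the largest count in each column
dominant <- rownames(tab)[apply(tab, 2, which.max)]
names(dominant) <- colnames(tab)
dominant

# Share of instances that fall in the dominant class of their cluster
agreement <- sum(apply(tab, 2, max)) / sum(tab)
round(100 * agreement, 1)
```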

5. Build the cluster plot. Include the command, the plot, and the plot interpretation in the report.
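
A cluster plot can be approximated in base R by projecting the data onto its first two principal components and colouring points by cluster. This is only one option, shown with iris as a stand-in; the Part 1 document may use a different function, such as clusplot from the cluster package.

```r
set.seed(123)                    # arbitrary seed, for reproducibility
myvehicle <- scale(iris[, 1:4])  # stand-in for the scaled vehicle data
kc <- kmeans(myvehicle, centers = 4, nstart = 25)

pca <- prcomp(myvehicle)         # data are already scaled
scores <- pca$x[, 1:2]           # coordinates on the first two components
explained <- 100 * sum(pca$sdev[1:2]^2) / sum(pca$sdev^2)

plot(scores, col = kc$cluster, pch = 19,
     xlab = "Component 1", ylab = "Component 2",
     main = sprintf("k-means clusters (%.1f%% of variance shown)", explained))

# Mark the cluster centres, projected into the same two components
points(predict(pca, kc$centers)[, 1:2], pch = 4, cex = 2, lwd = 2)
```

Points that overlap in this view are not necessarily close in the full data, since the plot shows only two of the available dimensions.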

6. Experiment with 3 different k values, and summarize the findings in tabular format.

Explain the effect of the k value on the method's results.

What is an ideal value of k for the vehicle data? (This is an open-ended question)
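
The comparison across k values could be collected into a table along these lines. The k values 2, 3, and 5, the nstart setting, and the column names are all arbitrary choices for illustration, and iris stands in for the vehicle data.

```r
set.seed(123)                    # arbitrary seed, for reproducibility
myvehicle <- scale(iris[, 1:4])  # stand-in for the scaled vehicle data

k_values <- c(2, 3, 5)           # arbitrary choices for illustration

# Fit one model per k and keep a one-row summary of each fit
results <- do.call(rbind, lapply(k_values, function(k) {
  fit <- kmeans(myvehicle, centers = k, nstart = 25)
  data.frame(
    k = k,
    tot_withinss = fit$tot.withinss,                 # total within-cluster SS
    between_over_total = fit$betweenss / fit$totss,  # share of SS between clusters
    sizes = paste(fit$size, collapse = ", ")         # cluster sizes as text
  )
}))
results
```

The total within-cluster sum of squares normally keeps falling as k grows, so it cannot pick k by itself; the table is a starting point for the discussion rather than an answer.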

7. Summary

What differences between k-means clustering and classification methods did you observe?

Which part of this exercise did you find the most challenging and which approach did you take to resolve the challenge?

Submit the following - the report addressing the key points above, and an R script with the commands you ran and brief comments on each command's purpose.

Attachment:- K-Means Clustering Exercise.rar

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd