Reference no: EM131919047
Assignment: Clustering
Your task for this assignment is to implement and evaluate the k-means clustering algorithm.
1. Implement the k-means clustering algorithm.
a. You can use any programming language that you are familiar with.
b. The program should be executable with at least 3 parameters: the name of the dataset file, k, and the name of the output file.
c. The output file should contain numerical class labels (formatted as one number per row) for all the records in the test dataset and report the sum of squared errors (SSE) in the last row.
d. You only need to handle numerical attributes (categorical attributes are not required).
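As a rough illustration of what item 1 asks for, a minimal Python sketch might look like the following. It is one possible approach under stated assumptions, not a reference solution: it assumes a comma-separated input file, Euclidean distance, random initial centroids with a fixed seed, and that any field that does not parse as a number (such as a class label or a header) can be skipped. The function names `load_numeric_rows`, `squared_distance`, and `kmeans` are hypothetical.

```python
import csv
import random
import sys


def load_numeric_rows(path):
    """Read a comma-separated file, keeping only fields that parse as floats."""
    rows = []
    with open(path, newline="") as handle:
        for record in csv.reader(handle):
            values = []
            for field in record:
                try:
                    values.append(float(field))
                except ValueError:
                    pass  # assumption: non-numeric fields (labels, headers) are skipped
            if values:
                rows.append(values)
    return rows


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows, k, max_iter=100, seed=0):
    """Return (labels, centroids, sse) for a basic k-means run."""
    rng = random.Random(seed)
    # assumption: initial centroids are k distinct records chosen at random
    centroids = [list(r) for r in rng.sample(rows, k)]
    labels = None
    for _ in range(max_iter):
        # assignment step: each record goes to its nearest centroid
        new_labels = [
            min(range(k), key=lambda c: squared_distance(row, centroids[c]))
            for row in rows
        ]
        if new_labels == labels:
            break  # assignments are stable, so the run has converged
        labels = new_labels
        # update step: each centroid becomes the mean of its members
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    sse = sum(squared_distance(row, centroids[lab]) for row, lab in zip(rows, labels))
    return labels, centroids, sse


def main(argv):
    # the three parameters from item 1b: dataset file, k, output file
    dataset_path, k, output_path = argv[1], int(argv[2]), argv[3]
    labels, _, sse = kmeans(load_numeric_rows(dataset_path), k)
    with open(output_path, "w") as out:
        for lab in labels:
            out.write(f"{lab}\n")  # one label per row
        out.write(f"{sse}\n")  # SSE in the last row


if __name__ == "__main__" and len(sys.argv) == 4:
    main(sys.argv)
```

If this were saved as a script named, say, `kmeans.py` (a hypothetical file name), it could be run as `python kmeans.py iris.data 3 labels.txt`. Results depend on the initial centroids, so several runs with different seeds are worth comparing.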
2. Select two datasets from the UCI repository and evaluate the algorithm using SSE and another metric of your choice (e.g. BCubed precision and recall, or the Jaccard score if you have the class labels) with varying k. (I intend to run your implementation on the Fisher Iris dataset without the labels.)
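For the second metric in item 2, one option when class labels are available is the pair-counting form of the Jaccard score: of all record pairs placed together by either the class labels or the clustering, the fraction placed together by both. The sketch below is a hedged illustration of that definition. The function name `pairwise_jaccard` is hypothetical, and returning 1.0 when no pair is grouped together by either labelling is a convention chosen here, not something the assignment specifies.

```python
from itertools import combinations


def pairwise_jaccard(true_labels, cluster_labels):
    """Pair-counting Jaccard score between class labels and cluster labels."""
    both = only_cluster = only_class = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_class = true_labels[i] == true_labels[j]
        same_cluster = cluster_labels[i] == cluster_labels[j]
        if same_class and same_cluster:
            both += 1  # pair grouped together by both labellings
        elif same_cluster:
            only_cluster += 1  # together in the clustering only
        elif same_class:
            only_class += 1  # together in the class labels only
    denominator = both + only_cluster + only_class
    # convention (an assumption): identical all-singleton partitions score 1.0
    return both / denominator if denominator else 1.0
```

The loop visits every pair of records, so the cost grows quadratically with the dataset size. That is acceptable for small UCI datasets such as Iris but may be slow for large ones.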
3. Write a brief report to:
a. Describe the datasets.
b. Describe your implementation and experiment setup, e.g. any preprocessing you performed on the dataset such as normalizing the attributes, distance metrics you used, etc.
c. Present the experiment results with varying k.
d. Discuss the insights and conclusions from your experiments.
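The attribute normalization mentioned in item 3b can be done in several ways. A minimal sketch of one common choice, min-max scaling of each attribute to the range [0, 1], is shown here. The function name `min_max_normalize` is hypothetical, and mapping a constant attribute to 0.0 is an assumption made to avoid division by zero.

```python
def min_max_normalize(rows):
    """Scale every column of a list of numeric rows to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [
            # assumption: a constant column carries no information, so map it to 0.0
            (value - lo) / (hi - lo) if hi > lo else 0.0
            for value, lo, hi in zip(row, lows, highs)
        ]
        for row in rows
    ]
```

Scaling matters for k-means because Euclidean distance is dominated by attributes with large numeric ranges. Whether to normalize, and how, is an experimental choice worth stating in the report.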
4. This is an individual assignment.
5. Submission. You will upload two items to Canvas: your PDF report and a zip or tar file.
This zip/tar file must contain:
Your source files (include your name in commented form at the top of all source files), the executable, a README file explaining how to compile and run your program, and the output files for your test datasets.