Describe your implementation and experiment setup

Reference no: EM131919047

Assignment: Clustering

Your task for this assignment is to implement and evaluate the k-means clustering algorithm.

1. Implement the k-means clustering algorithm.

a. You can use any programming language that you are familiar with.

b. The program should be executable with at least 3 parameters: the name of the dataset file, k, and the name of the output file.

c. The output file should contain numerical class labels (one number per row) for all the records in the test dataset, and report the sum of squared errors (SSE) in the last row.

d. You only need to handle numerical attributes (categorical attributes are not required).
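The requirements in item 1 can be sketched as follows. This is a minimal illustration rather than a reference solution, and it rests on several assumptions that the assignment does not state: the dataset is a comma-separated file, rows containing any non-numeric field (such as a header) are skipped, initial centroids are drawn at random with a fixed seed, and squared Euclidean distance is used. The function names `load_numeric_rows`, `kmeans`, and `main` are hypothetical.

```python
import csv
import random
import sys


def load_numeric_rows(path):
    """Read a comma-separated file, keeping rows whose fields all parse as floats."""
    rows = []
    with open(path, newline="") as handle:
        for record in csv.reader(handle):
            if not record:
                continue  # skip blank lines
            try:
                rows.append([float(value) for value in record])
            except ValueError:
                continue  # assumption: skip headers and rows with non-numeric fields
    return rows


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iterations=100, seed=0):
    """Lloyd's algorithm; returns (labels, centroids, sse)."""
    rng = random.Random(seed)
    # Assumption: initial centroids are k distinct records chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the run has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    sse = sum(squared_distance(p, centroids[label]) for p, label in zip(points, labels))
    return labels, centroids, sse


def main(argv):
    """Expected arguments: dataset file, k, output file."""
    dataset_path, k, output_path = argv[1], int(argv[2]), argv[3]
    points = load_numeric_rows(dataset_path)
    labels, _, sse = kmeans(points, k)
    with open(output_path, "w") as out:
        for label in labels:
            out.write(f"{label}\n")  # one class label per row
        out.write(f"{sse}\n")  # SSE in the last row


if __name__ == "__main__" and len(sys.argv) == 4:
    main(sys.argv)
```

Saved under a hypothetical name such as `kmeans.py`, the sketch would be run as `python kmeans.py dataset.csv 3 output.txt`. Because the result depends on the initial centroids, a fuller implementation might repeat the run with several seeds and keep the lowest SSE.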

2. Select two datasets from the UCI repository and evaluate the algorithm with varying k, using SSE and another metric of your choice (e.g., BCubed precision and recall, or the Jaccard score if you have the class labels). (I intend to run your implementation on the Fisher Iris dataset without the labels.)
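As one possible reading of the Jaccard score mentioned in item 2, the sketch below uses the pair-counting form: among all pairs of records, it divides the number of pairs grouped together by both the class labels and the clustering by the number of pairs grouped together by at least one of them. The function name `pair_jaccard` is hypothetical, and returning 1.0 when no pair is grouped together by either labeling is a convention assumed here, not something the assignment specifies.

```python
from itertools import combinations


def pair_jaccard(true_labels, cluster_labels):
    """Pair-counting Jaccard score between class labels and cluster labels."""
    same_both = same_true_only = same_cluster_only = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_true = true_labels[i] == true_labels[j]
        same_cluster = cluster_labels[i] == cluster_labels[j]
        if same_true and same_cluster:
            same_both += 1
        elif same_true:
            same_true_only += 1
        elif same_cluster:
            same_cluster_only += 1
    denominator = same_both + same_true_only + same_cluster_only
    # Assumed convention: with no co-grouped pairs at all, treat the match as perfect.
    return same_both / denominator if denominator else 1.0
```

The score does not depend on which number k-means gives each cluster, so the labels `[0, 0, 1, 1]` and `[1, 1, 0, 0]` score the same against the true classes. The loop visits every pair, which is quadratic in the number of records but manageable for small UCI datasets such as Iris.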

3. Write a brief report to:

a. Describe the datasets.

b. Describe your implementation and experiment setup, e.g. any preprocessing you performed on the dataset such as normalizing the attributes, distance metrics you used, etc.

c. Present the experiment results with varying k.

d. Discuss the insights and conclusions from your experiments.
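For the preprocessing mentioned in item 3b, one common choice is min-max normalization, which rescales every attribute to the range [0, 1] so that attributes with large numeric ranges do not dominate the distance computation. The sketch below is only an illustration of that choice. The function name `min_max_normalize` is hypothetical, and mapping a constant column to 0.0 is an assumption made to avoid division by zero.

```python
def min_max_normalize(rows):
    """Rescale each column of a list of numeric rows to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [
            # Assumption: a column with a single distinct value maps to 0.0.
            (value - low) / (high - low) if high > low else 0.0
            for value, low, high in zip(row, lows, highs)
        ]
        for row in rows
    ]
```

If normalization is applied, the reported SSE is measured in the rescaled attribute space, which is worth stating in the report so that results are comparable across runs.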

4. This is an individual assignment.

5. Submission. You will upload two items to Canvas: your PDF report and a zip or tar file.

This zip/tar file must contain:

Your source files (include your name(s) in a comment at the top of every source file), the executable, a README file explaining how to compile and run your program, and the output files for your test datasets.






All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd