Implement the k-means clustering algorithm

Assignment Help Computer Engineering
Reference no: EM131900261

Programming Exercise: K-means Clustering and Principal Component Analysis

Introduction

In this exercise, you will implement the K-means clustering algorithm and apply it to compress an image. In the second part, you will use principal component analysis to find a low-dimensional representation of face images. Before starting on the programming exercise, we strongly recommend watch- ing the video lectures and completing the review questions for the associated topics.

To get started with the exercise, you will need to download the starter code and unzip its contents to the directory where you wish to complete the exercise. If needed, use the cd command in Octave/MATLAB to change to this directory before starting this exercise.
You can also find instructions for installing Octave/MATLAB in the "En- vironment Setup Instructions" of the course website.

1 K-means Clustering

In this this exercise, you will implement the K-means algorithm and use it for image compression. You will first start on an example 2D dataset that will help you gain an intuition of how the K-means algorithm works. After that, you wil use the K-means algorithm for image compression by reducing the number of colors that occur in an image to only those that are most common in that image. You will be using ex7.m for this part of the exercise.

You will implement the two phases of the K-means algorithm separately in the next sections.

Finding closest centroids
Your task is to complete the code in findClosestCentroids.m. This function takes the data matrix X and the locations of all centroids inside centroids and should output a one-dimensional array idx that holds the index (a value in 1, ..., K , where K is total number of centroids) of the closest centroid to every training example.

You can implement this using a loop over every training example and every centroid.

Once you have completed the code in findClosestCentroids.m, the script ex7.m will run your code and you should see the output [1 3 2] corresponding to the centroid assignments for the first 3 examples.

Computing centroid means

You should now complete the code in computeCentroids.m. You can implement this function using a loop over the centroids. You can also use a loop over the examples; but if you can use a vectorized implementation that does not use such a loop, your code may run faster.

After you have completed the two functions (findClosestCentroids and computeCentroids), the next step in ex7.m will run the K-means algorithm on a toy 2D dataset to help you understand how K-means works. Your functions are called from inside the runKmeans.m script. We encourage you to take a look at the function to understand how it works. Notice that the code calls the two functions you implemented in a loop.

When you run the next step, the K-means code will produce a visualiza- tion that steps you through the progress of the algorithm at each iteration. Press enter multiple times to see how each step of the K-means algorithm changes the centroids and cluster assignments.

Random initialization
The initial assignments of centroids for the example dataset in ex7.m were designed so that you will see the same figure as in Figure1. In practice, a good strategy for initializing the centroids is to select random examples from the training set.

In this exercise, you will apply K-means to image compression. In a straightforward 24-bit color representation of an image,2 each pixel is repre- sented as three 8-bit unsigned integers (ranging from 0 to 255) that specify the red, green and blue intensity values. This encoding is often refered to as the RGB encoding. Our image contains thousands of colors, and in this part of the exercise, you will reduce the number of colors to 16 colors.

By making this reduction, it is possible to represent (compress) the photo in an efficient way. Specifically, you only need to store the RGB values of the 16 selected colors, and for each pixel in the image you now need to only store the index of the color at that location (where only 4 bits are necessary to represent 16 possibilities).

In this exercise, you will use the K-means algorithm to select the 16 colors that will be used to represent the compressed image. Concretely, you will treat every pixel in the original image as a data example and use the K-means algorithm to find the 16 colors that best group (cluster) the pixels in the 3- dimensional RGB space. Once you have computed the cluster centroids on the image, you will then use the 16 colors to replace the pixels in the original image.

Optional (ungraded) exercise: Use your own image
In this exercise, modify the code we have supplied to run on one of your own images. Note that if your image is very large, then K-means can take a long time to run. Therefore, we recommend that you resize your images to managable sizes before running the code. You can also try to vary K to see the effects on the compression.

2 Principal Component Analysis
In this exercise, you will use principal component analysis (PCA) to perform dimensionality reduction. You will first experiment with an example 2D dataset to get intuition on how PCA works, and then use it on a bigger dataset of 5000 face image dataset.

Implementing PCA
In this part of the exercise, you will implement PCA. PCA consists of two computational steps: First, you compute the covariance matrix of the data.

Then, you use Octave/MATLAB's SVD function to compute the eigenvec- tors U1, U2, . . . , Un. These will correspond to the principal components of variation in the data.
Before using PCA, it is important to first normalize the data by subtract- ing the mean value of each feature from the dataset, and scaling each dimen- sion so that they are in the same range. In the provided script ex7 pca.m, this normalization has been performed for you using the featureNormalize function.

Face Image Dataset

In this part of the exercise, you will run PCA on face images to see how it can be used in practice for dimension reduction. The dataset ex7faces.mat contains a dataset3 X of face images, each 32 32 in grayscale. Each row of X corresponds to one face image (a row vector of length 1024). The next step in ex7 pca.m will load and visualize the first 100 of these face images (Figure7).

PCA on Faces
To run PCA on the face dataset, we first normalize the dataset by subtracting the mean of each feature from the data matrix X. The script ex7 pca.m will do this for you and then run your PCA code. After running PCA, you will obtain the principal components of the dataset. Notice that each principal component in U (each row) is a vector of length n (where for the face dataset, n = 1024). It turns out that we can visualize these principal components by reshaping each of them into a 32 32 matrix that corresponds to the pixels in the original dataset. The script ex7 pca.m displays the first 36 principal components that describe the largest variations (Figure8). If you want, you can also change the code to display more principal components to see how they capture more and more details.

Dimensionality Reduction
Now that you have computed the principal components for the face dataset, you can use it to reduce the dimension of the face dataset. This allows you to use your learning algorithm with a smaller input size (e.g., 100 dimensions) instead of the original 1024 dimensions. This can help speed up your learning algorithm.

The next part in ex7 pca.m will project the face dataset onto only the first 100 principal components. Concretely, each face image is now described by a vector z(i) R100.

In the earlier K-means image compression exercise, you used the K-means algorithm in the 3-dimensional RGB space. In the last part of the ex7 pca.m script, we have provided code to visualize the final pixel assignments in this 3D space using the scatter3 function. Each data point is colored according to the cluster it has been assigned to. You can drag your mouse on the figure to rotate and inspect this data in 3 dimensions.

Attachment:- implement the K-means clustering algorithm.rar

Reference no: EM131900261

Questions Cloud

What is the failure rate if all new businesses : What is the failure rate if all new businesses? what us the failure rate of all new franchises? what inferences can you make from these numbers?
Calculate abby basis for gain : Calculate Abby's basis for gain, loss, and cost recovery for the portion of her personal residence that was converted to business use
Define and give an example of an informal loyalty program : Define and give an example of an informal and formal loyalty program and correlate how it lends to customer satisfaction and/or customer recovery.
Define the discrete values to delineate unique categories : If you are using any ordinal variables, define the values you will use, and how you will delineate order of magnitude between the points on the scale.
Implement the k-means clustering algorithm : Programming Exercise: K-means Clustering and Principal Component Analysis - implement the K-means clustering algorithm and apply it to compress an image
Prepare the entry to record the issuance of the bonds : On May 1, 2017, Shamrock Company issued 1,900 $1,000 bonds at 102. Prepare the entry to record the issuance of the bonds and warrants
What are some different ways the pantheon came : What are some different ways the Pantheon came to be used over time?
Develop a long term philosophy in order to be successful : It is NOT important to design corporate goals and key performance indicators (KPIs) that drive the organization towards common initiatives.
What is the best structure for the sales force : Given the markets the company is operating in, what is the best structure for the sales force? How did you decide?

Reviews

len1900261

3/13/2018 7:28:24 AM

After completing various parts of the assignment, be sure to use the submit function system to submit your solutions to our servers. The following is a breakdown of how each part of this exercise is scored. Part Submitted File Points Find Closest Centroids findClosestCentroids.m 30 points Compute Centroid Means computeCentroids.m 30 points PCA pca.m 20 points Project Data projectData.m 10 points Recover Data recoverData.m 10 points Total Points 100 points You are allowed to submit your solutions multiple times, and we will take only the highest score into consideration.

Write a Review

Computer Engineering Questions & Answers

  In the it field group projects often have strict deadlines

groups will be assigned no later than the end of unit 1. if you wish to be involved in a project-managed group please

  What are the ip fragment offsets and more flags

Given an IP packet of 540 bytes and a maximum packet size of 200 bytes, what are the IP Fragment Offsets and More Flags for the appropriate packet fragments?

  What websites offer information on cloud computing

Write a 1- to 2-page paper discussing what professional organizations you might want to join, what websites offer information on cloud computing?

  Questionuse ms access and create a database think about the

questionuse ms access and create a database. think about the information you would require if you were a travel agent

  Write a paper discussing the article and its relevance

Find a current article about UNIX Networking. Write a 525-word paper discussing the article and its relevance to the course.

  How might you improve your performance in the case

After a computer forensics investigation, you need to meet with your department or group of fellow investigators and critique the case in an effort to improve your work. define how to make a self-evaluation of your work by answering.

  Define the cache and virtual memories

A processor has a direct addressing range of 64 KB. Show two schematics to extend the address range to 512 KB.

  Write down an object oriented program in java

take a rectangular matrix of cells, each of which can contain an organism. If the matrix is viewed as extending indefinitely in both directions, then each cell has eight neighbors, the eight cells surrounding it.

  Discuss the pros and cons of using such specialized solution

Discuss the pros and cons of using such specialized solutions instead of executing the sort algorithm every time the sort command is issued.

  Design program using switch statements in visual basic

design program using switch statements in Visual Basic.

  What is crosstalk and how does it affect a signal

What is white noise and how does it affect a signal? What is impulse noise and why is it the most disruptive? What is crosstalk and how does it affect a signal?

  Remove all branches from the list so that you can run mkdir

Create a directory list from bar1 with find and order it if necessary. Remove all branches from the list so that you can run mkdir only on the leaves.

Free Assignment Quote

Assured A++ Grade

Get guaranteed satisfaction & time on delivery in every assignment order you paid with us! We ensure premium quality solution document along with free turntin report!

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd