Data clustering using k-means

Reference no: EM13889468


Project Title: Data Clustering using K-means

In this project, students are required to cluster Amazon product reviews drawn from four product categories: books, electronic appliances, DVDs, and kitchen appliances. Each category is further divided into positive-sentiment and negative-sentiment reviews, so the attached data file "data.txt" contains reviews from 4 × 2 = 8 categories in total.

The format of the data file is as follows. Each line of the data file corresponds to one review. The first element in the line is the label of the instance (e.g. kitchen-positive indicates a positive-sentiment review of some kitchen appliance). The remaining elements in the line, separated by spaces, are the unigram and bigram features extracted from the review. Note that the two words in a bigram feature are connected by two underscores. Reviews are represented using binary-valued features (i.e. each feature present in a review appears exactly once in its line).


(1) Write a program to load the data instances into memory from the provided file data.txt.
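As a minimal sketch of step (1), assuming the file layout described above (label first, then space-separated features), each review could be held as a label plus a set of feature strings. The function names parse_line and load_instances are illustrative only and are not part of the assignment.

```python
from typing import List, Set, Tuple

Instance = Tuple[str, Set[str]]


def parse_line(line: str) -> Instance:
    """Split one review line into (label, set of feature tokens).

    Assumes the first whitespace-separated token is the label and the
    rest are binary features, so a set is enough to represent them.
    """
    tokens = line.split()
    return tokens[0], set(tokens[1:])


def load_instances(path: str) -> List[Instance]:
    """Read every non-empty line of the data file into memory."""
    instances: List[Instance] = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if line.strip():  # skip blank lines
                instances.append(parse_line(line))
    return instances
```

A call such as load_instances("data.txt") would then return one (label, features) pair per review.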

(2) Implement the k-means clustering algorithm with Euclidean distance to cluster the instances into k clusters. Make sure that you normalize each feature vector to unit L2 length before computing Euclidean distances.
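One possible shape for step (2) is sketched below, using sparse dictionaries because each review has only a few of the many features. The random initialization from the data, the fixed iteration cap, and all function names are assumptions made for illustration, not requirements of the assignment.

```python
import math
import random
from typing import Dict, List, Set

Vector = Dict[str, float]


def normalize(features: Set[str]) -> Vector:
    """Turn a binary feature set into a unit L2 vector.

    With n binary features, every weight becomes 1 / sqrt(n).
    """
    if not features:
        return {}
    weight = 1.0 / math.sqrt(len(features))
    return {name: weight for name in features}


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two sparse vectors."""
    keys = a.keys() | b.keys()
    return sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys)


def mean(vectors: List[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of sparse vectors."""
    total: Vector = {}
    for vector in vectors:
        for key, value in vector.items():
            total[key] = total.get(key, 0.0) + value
    return {key: value / len(vectors) for key, value in total.items()}


def kmeans(vectors: List[Vector], k: int,
           iterations: int = 20, seed: int = 0) -> List[int]:
    """Return a cluster index for every vector."""
    rng = random.Random(seed)
    centers = rng.sample(vectors, k)  # assumption: start from k random instances
    assignment = [-1] * len(vectors)
    for _ in range(iterations):
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(v, centers[c]))
            for v in vectors
        ]
        if new_assignment == assignment:
            break  # no instance changed cluster, so the run has converged
        assignment = new_assignment
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = mean(members)
    return assignment
```

Minimizing the squared distance gives the same nearest center as minimizing the distance itself, so the square root is skipped during assignment.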

(3) Instead of using the cluster mean as the cluster center,

i. select the instance that is closest to the mean as the cluster center when performing k-means clustering, and

ii. use the k-medoids method to perform the clustering.
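The two variants in step (3) differ only in how a center is picked from the members of a cluster, which the sketch below tries to make explicit. It assumes a simple alternating update (reassign, then re-pick each center) rather than a full swap-based k-medoids search, and every name in it is hypothetical.

```python
import math
import random
from typing import Callable, Dict, List

Vector = Dict[str, float]


def distance(a: Vector, b: Vector) -> float:
    """Euclidean distance between two sparse vectors."""
    keys = a.keys() | b.keys()
    return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))


def mean(vectors: List[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of sparse vectors."""
    total: Vector = {}
    for vector in vectors:
        for key, value in vector.items():
            total[key] = total.get(key, 0.0) + value
    return {key: value / len(vectors) for key, value in total.items()}


def closest_to_mean(members: List[Vector]) -> Vector:
    """Variant (i): the member nearest to the cluster mean."""
    center = mean(members)
    return min(members, key=lambda v: distance(v, center))


def medoid(members: List[Vector]) -> Vector:
    """Variant (ii): the member with the smallest total distance to the others."""
    return min(members, key=lambda v: sum(distance(v, u) for u in members))


def cluster(vectors: List[Vector], k: int,
            pick_center: Callable[[List[Vector]], Vector],
            iterations: int = 20, seed: int = 0) -> List[int]:
    """Alternate between assigning instances and re-picking instance centers."""
    rng = random.Random(seed)
    centers = rng.sample(vectors, k)  # assumption: random initial centers
    assignment = [-1] * len(vectors)
    for _ in range(iterations):
        new_assignment = [
            min(range(k), key=lambda c: distance(v, centers[c]))
            for v in vectors
        ]
        if new_assignment == assignment:
            break
        assignment = new_assignment
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = pick_center(members)
    return assignment
```

Calling cluster(vectors, k, closest_to_mean) gives variant (i) and cluster(vectors, k, medoid) gives variant (ii). The two rules can disagree when a cluster contains an outlier, since the mean is pulled toward the outlier while the medoid is not.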

(4) Evaluate the clusters obtained in steps 2, 3(i), and 3(ii) using a cross-validation evaluation method.
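The brief does not say how cross validation should be applied to clustering, so the following is only one possible reading: split the instances into folds, fit cluster centers on the training folds, assign each held-out review to its nearest center, and score the held-out assignment against the known labels. The sketch covers just the two reusable pieces, fold splitting and a purity score. Purity as the metric and the round-robin split are both assumptions.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple


def purity(assignments: Sequence[int], labels: Sequence[str]) -> float:
    """Fraction of instances that carry the majority label of their cluster."""
    by_cluster: Dict[int, Counter] = defaultdict(Counter)
    for cluster_id, label in zip(assignments, labels):
        by_cluster[cluster_id][label] += 1
    majority = sum(c.most_common(1)[0][1] for c in by_cluster.values())
    return majority / len(labels)


def fold_indices(n: int, folds: int) -> List[Tuple[List[int], List[int]]]:
    """Return (train, held_out) index lists, one pair per fold.

    Instance i goes to fold i % folds, which interleaves the instances
    instead of cutting the file into contiguous blocks.
    """
    splits = []
    for fold in range(folds):
        held_out = [i for i in range(n) if i % folds == fold]
        train = [i for i in range(n) if i % folds != fold]
        splits.append((train, held_out))
    return splits
```

Averaging the held-out purity over all folds would then give a single score per clustering method, which can feed the comparison asked for in the final question.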

(5) Briefly discuss which clustering method is best for this data and why.

Submission Instructions

• Submit

(a) the source code for all your programs,

(b) a README file (plain text) describing how to compile/run your code to produce the various results, and

(c) a PDF file providing the answers to all of the above questions.

Compress all of the above files into a single zip/rar file and name it with your registration number.




All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd