Demonstrate your skills for data clustering

Reference no: EM132090027

Assessment: Individual Problem solving task

Learning Outcomes

This assessment assesses the following Unit Learning Outcomes (ULO) and related Graduate Learning Outcomes (GLO):

ULO 1: Apply suitable clustering/dimensionality reduction techniques to perform unsupervised learning of data in a real-world

Purpose
In this assignment, you need to demonstrate your skills in data clustering and dimensionality reduction. The assignment has two parts.

Instructions
This is an individual assessment task of at most 20 pages, including all relevant material, graphs, images and tables. Students are required to respond to a series of problem situations related to their analysis techniques. They must also provide evidence through articulation of the scenario and application of programming skills and analysis techniques, and give a rationale for each response.

Task A - Clustering
Download the BBC Sport dataset from the Cloud. This dataset consists of 737 documents from the BBC Sport website corresponding to sports news articles in five topical areas from 2004-2005. There are 5 class labels: athletics, cricket, football, rugby, tennis. The original dataset and raw text files can be downloaded from here

1. There are 3 files in the dataset, corresponding to the feature matrix, the class labels and the term dictionary. Read these files in a Python notebook and store them in the variables X, trueLabels, and terms.
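The loading step could be sketched as follows. This is only an illustration under stated assumptions: the brief does not give the file names or formats, so the sketch assumes the feature matrix is a Matrix Market file stored terms-by-documents, the class file has one line per document ending in an integer class index (with optional `%` comment lines), and the term file has one term per line. The helper name `load_bbcsport` is hypothetical.

```python
import numpy as np
from scipy.io import mmread


def load_bbcsport(mtx_path, classes_path, terms_path):
    """Load the three dataset files into X, trueLabels and terms."""
    # Assumption: Matrix Market file stored as terms x documents,
    # so it is transposed to get one row per document.
    X = mmread(mtx_path).tocsr().T.tocsr()

    # Assumption: every non-comment line ends with an integer class index.
    labels = []
    with open(classes_path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line and not line.startswith("%"):
                labels.append(int(line.split()[-1]))
    trueLabels = np.array(labels)

    # Assumption: one term per line, in the same order as the matrix rows.
    with open(terms_path, encoding="utf-8") as fh:
        terms = [line.strip() for line in fh if line.strip()]

    return X, trueLabels, terms
```

After loading, it is worth checking that `X.shape[0] == len(trueLabels)` and `X.shape[1] == len(terms)`; a mismatch usually means the matrix orientation assumption is wrong for the files at hand.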

2. Next, perform K-means clustering with 5 clusters using Euclidean distance as the similarity measure. Evaluate the clustering performance using the adjusted Rand index and adjusted mutual information. Report the clustering performance averaged over 50 random initializations of K-means.
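One possible way to run this step with scikit-learn is sketched below. It reads "50 random initializations" as 50 separate single-start runs with different seeds, which is an interpretation rather than something the brief spells out; the function name `evaluate_kmeans` is hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_mutual_info_score, adjusted_rand_score


def evaluate_kmeans(X, true_labels, n_clusters=5, n_runs=50):
    """Average ARI and AMI over n_runs independent random starts of K-means."""
    ari_scores, ami_scores = [], []
    for seed in range(n_runs):
        # n_init=1 so that each run corresponds to exactly one random initialization
        km = KMeans(n_clusters=n_clusters, init="random", n_init=1, random_state=seed)
        predicted = km.fit_predict(X)
        ari_scores.append(adjusted_rand_score(true_labels, predicted))
        ami_scores.append(adjusted_mutual_info_score(true_labels, predicted))
    return float(np.mean(ari_scores)), float(np.mean(ami_scores))
```

Calling `evaluate_kmeans(X, trueLabels)` would return the two averaged scores; reporting the standard deviation alongside the mean may also be useful, since single-start K-means can vary noticeably between runs.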

3. Repeat K-means clustering with 5 clusters using a similarity measure other than Euclidean distance. Evaluate the clustering performance over 50 random initializations of K-means using the adjusted Rand index and adjusted mutual information. Report the clustering performance and compare it with the results obtained in step 2.
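The brief leaves the choice of alternative measure open. As one hedged example, cosine similarity can be approximated with scikit-learn's Euclidean-only `KMeans` by scaling every document vector to unit length first, since for unit vectors the squared Euclidean distance equals 2 - 2·cos. This is an approximation (the centroids themselves are not renormalised, as they would be in spherical K-means), and the function name `evaluate_cosine_kmeans` is hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_mutual_info_score, adjusted_rand_score
from sklearn.preprocessing import normalize


def evaluate_cosine_kmeans(X, true_labels, n_clusters=5, n_runs=50):
    """Average ARI and AMI for K-means on L2-normalised rows (cosine-like)."""
    # Assumption: no document row is all zeros (such rows cannot be normalised)
    X_unit = normalize(X, norm="l2")
    ari_scores, ami_scores = [], []
    for seed in range(n_runs):
        km = KMeans(n_clusters=n_clusters, init="random", n_init=1, random_state=seed)
        predicted = km.fit_predict(X_unit)
        ari_scores.append(adjusted_rand_score(true_labels, predicted))
        ami_scores.append(adjusted_mutual_info_score(true_labels, predicted))
    return float(np.mean(ari_scores)), float(np.mean(ami_scores))
```

Using the same seeds and the same number of runs as in step 2 keeps the comparison between the two measures like-for-like.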

4. For both clustering cases (Euclidean distance and the other similarity measure), visualize the cluster centres as tag clouds using the Python package WordCloud.
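A cluster centre is a vector of term weights, so one way to draw it as a tag cloud is to map the highest-weighted terms to their weights and pass that dictionary to WordCloud. The sketch below assumes the centre comes from something like `km.cluster_centers_[k]` and that `terms` is in the same order as the matrix columns; both helper names are hypothetical, and the `wordcloud` package is imported lazily because it is a separate install.

```python
import numpy as np


def centre_to_frequencies(centre, terms, top_n=50):
    """Map the top_n highest-weighted terms of a cluster centre to their weights."""
    centre = np.asarray(centre, dtype=float).ravel()
    top = np.argsort(centre)[::-1][:top_n]  # indices of the largest weights first
    # Keep strictly positive weights only; they act as the word "frequencies"
    return {terms[i]: float(centre[i]) for i in top if centre[i] > 0}


def render_cluster_cloud(centre, terms, top_n=50):
    """Build a WordCloud object for one cluster centre."""
    from wordcloud import WordCloud  # third-party package named in the task

    freqs = centre_to_frequencies(centre, terms, top_n)
    cloud = WordCloud(width=400, height=300, background_color="white")
    return cloud.generate_from_frequencies(freqs)
```

In a notebook, the returned object can be shown with `plt.imshow(cloud, interpolation="bilinear")` followed by `plt.axis("off")`, once per cluster and per similarity measure.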

Task B - Dimensionality Reduction using PCA/SVD

For the provided BBC sports dataset, perform PCA and plot the captured variance with respect to increasing latent dimensionality. What is the minimum dimension that captures (a) at least 95% variance and (b) at least 98% variance?
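A minimal sketch of this task, assuming X is the document-term matrix from Task A and is small enough to convert to a dense array: fit a full PCA, take the cumulative sum of the explained variance ratios, and find the first dimension at which each threshold is reached. The function name `min_dims_for_variance` is hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA


def min_dims_for_variance(X, thresholds=(0.95, 0.98)):
    """Return the cumulative variance curve and the minimum dimension per threshold."""
    # Assumption: X fits in memory as a dense array (sparse input is converted)
    X_dense = X.toarray() if hasattr(X, "toarray") else np.asarray(X)
    pca = PCA().fit(X_dense)
    cumulative = np.cumsum(pca.explained_variance_ratio_)

    dims = {}
    for t in thresholds:
        # searchsorted gives the first index whose cumulative variance is >= t;
        # min(...) guards against rounding leaving the last value just below t
        dims[t] = int(min(np.searchsorted(cumulative, t) + 1, len(cumulative)))
    return cumulative, dims
```

The curve can then be plotted with matplotlib, for example `plt.plot(range(1, len(cumulative) + 1), cumulative)`, with horizontal lines at 0.95 and 0.98 to make the two answers visible on the figure.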


Unit: SIT720 - Machine Learning, Deakin University

PART 2
For the provided BBC sports dataset:
* Perform PCA
* Plot the captured variance with respect to increasing latent dimensionality.
* What is the minimum dimension that captures (a) at least 95% variance and (b) at least 98% variance?

Excellent (5 marks): Successfully completed all three tasks.
Good (3 marks): Successfully completed any two of the three tasks.
Fair (2 marks): Successfully completed any one of the three tasks.
Unsatisfactory (0 marks): Failed to complete any given task.

For clustering cases (Euclidean distance and the other similarity measure reported in previous two tasks), visualise the cluster centres using Tag cloud using Python package WordCloud

Excellent (5 marks): Successfully used the WordCloud Package to visualise the cluster centres using at least two different similarity measures.
Good (3 marks): Successfully used the WordCloud Package to visualise the cluster centres using at least one similarity measure.
Fair (2 marks): Demonstrated knowledge in WordCloud Package and visualisation, but cannot use them successfully.
Unsatisfactory (0 marks): Failed to show any evidence of knowledge in WordCloud Package and visualisation.

Criteria 3:
* Repeat K-means clustering with 5 clusters using a similarity measure other than Euclidean distance.
* Evaluate the clustering performance over 50 random initializations of K-means using adjusted rand index and adjusted mutual information.
* Report the clustering performance and compare it with the results obtained in step 2.

Excellent (5 marks): Successfully completed all three tasks.
Good (3 marks): Successfully completed any two of the three tasks.
Fair (2 marks): Successfully completed any one of the three tasks.
Unsatisfactory (0 marks): Failed to complete any given task.

Criteria 2:
* Perform K-means clustering with 5 clusters using Euclidean distance as similarity measure.
* Evaluate the clustering performance using adjusted rand index and adjusted mutual information.
* Report the clustering performance averaged over 50 random initializations of K-means.

Excellent (5 marks): Successfully completed all three tasks.
Good (3 marks): Successfully completed any two of the three tasks.
Fair (2 marks): Successfully completed only one of the three tasks.
Unsatisfactory (0 marks): Failed to complete any given task.

Assessment Task 1: Individual problem-solving rubric

Criteria 1: Read the files corresponding to the feature matrix, class labels and the term dictionary, and store them in the variables X, trueLabels and terms using a Python notebook.

Excellent (5 marks): Successfully read all files and stored them in the corresponding variables using a Python notebook.
Good (3 marks): Partially achieved the goal, missing the reading or storing of one file or variable.
Fair (2 marks): Only able to either read files or create variables in Python to store any value.
Unsatisfactory (0 marks): Failed to read and store using a Python notebook.

Medical: To cover medical conditions of a serious nature, e.g. hospitalisation, serious injury or chronic illness. Note: Temporary minor ailments such as headaches, colds and minor gastric upsets are not serious medical conditions and are unlikely to be accepted. However, serious cases of these may be considered.

Compassionate: e.g. death of close family member, significant family and relationship problems.

Hardship/Trauma: e.g. sudden loss or gain of employment, severe disruption to domestic arrangements, victim of crime.

Note: Misreading the timetable, exam anxiety or returning home will not be accepted as grounds for consideration.

This document supplies detailed information on assessment tasks for this unit.

Key information
• Due: 22 by 11.30pm AEST
• Weighting: 25%
• Word count: max 20 pages including all relevant material, graphs, images and tables

ULO 1: Apply suitable clustering/dimensionality reduction techniques to perform unsupervised learning of data in a real-world
GLO 1: Discipline knowledge and capabilities
GLO 3: Digital literacy
GLO 4: Critical thinking
GLO 5: Problem solving
