Cluster analysis project, Database Management System

Assignment Help:

(a) Data Mining Process: In the context of this cluster analysis project, and in your own words, explain how you would execute the first stage of data mining, namely the "Pre-modelling" stage. Be sure to differentiate the sub-tasks in this stage.

(b) Pre-modelling: Describe the potential business problem and data mining problem in the context of this project. Be sure to differentiate these two problems in your description.

(c) Data Preparation: Use the "seeds_dataset_twoClass.csv" file to prepare the dataset for cluster analysis. You can use the following table format to justify the data type (i.e., measurement) and direction (i.e., role) used for each attribute.

 

Attribute | Data Type (or Measurement) | Direction (or Role) (Input, Target or None) | Justification

(d) Data Exploration: Analyse the dataset "seeds_dataset_twoClass.csv" using the following summary statistics in the Data Audit node. Discuss the use of these summary statistics for deciding if further data preparation is required.

a. Mean and Standard Deviation (Std. Dev), Min and Max

b. % Complete and Valid Records

c. Outliers and Extremes
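
The three groups of summary statistics listed above can be reproduced outside the Data Audit node as a cross-check. The following is a minimal Python sketch, not the node's own implementation. The function name `audit`, the thresholds of 3 and 5 standard deviations for outliers and extremes, and the sample column are all assumptions made for illustration; the values are made up and are not taken from "seeds_dataset_twoClass.csv".

```python
import statistics

def audit(values, outlier_sd=3.0, extreme_sd=5.0):
    """Summarise one numeric column; None marks a missing value."""
    present = [v for v in values if v is not None]
    mean = statistics.mean(present)
    sd = statistics.stdev(present)  # sample standard deviation
    # Distance of each value from the mean, measured in standard deviations.
    z = [abs(v - mean) / sd for v in present]
    return {
        "mean": mean,
        "std_dev": sd,
        "min": min(present),
        "max": max(present),
        "pct_complete": 100.0 * len(present) / len(values),
        # Assumed rule: outliers lie between the two thresholds,
        # extremes lie beyond the larger one.
        "outliers": sum(outlier_sd < s <= extreme_sd for s in z),
        "extremes": sum(s > extreme_sd for s in z),
    }

# Hypothetical column: 20 ordinary values, one missing value, one large value.
column = [5.0, 5.2, 4.9, 5.1] * 5 + [None, 12.0]
report = audit(column)
for name, value in report.items():
    print(name, value)
```

A column that is less than 100% complete, or that reports outliers or extremes, is a candidate for further preparation (for example imputing, capping or discarding records) before clustering, because K-Means centroids are means and are pulled towards extreme values.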

(e) Data Preparation: From the scenario and data given, explain why the attribute A3 (compactness) is probably not useful for cluster analysis. Prepare the data (for mining) by filtering out this field using IBM SPSS Modeler.

(f) Executing Clustering Technique: Decide on the number of clusters (i.e., K) and then execute K-Means on the filtered dataset. Assess the appropriateness of applying K-Means on this dataset. Interpret the clustering results.
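
To make the steps that K-Means performs concrete, here is a minimal sketch of the standard iteration (assign each record to its nearest centroid, then move each centroid to the mean of its members). It is an illustration in plain Python, not the algorithm as implemented in IBM SPSS Modeler. The function `kmeans`, its `seed` parameter and the six sample points are assumptions; the points are made up and do not come from the seeds dataset.

```python
import math
import random

def kmeans(points, k, iterations=100, seed=0):
    """Plain K-Means (Lloyd's algorithm) on a list of numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centres: k records picked at random
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids

# Hypothetical two-attribute records forming two well-separated groups.
points = [(1.0, 1.1), (0.9, 1.0), (1.2, 0.8), (5.0, 5.2), (5.1, 4.9), (4.8, 5.0)]
labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```

Because the distance is Euclidean and the centroids are means, this procedure suits numeric attributes and roughly compact clusters, which is one angle from which its appropriateness for a given dataset can be assessed. Cluster numbers are arbitrary: a different seed may swap the labels without changing the grouping.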

(g) Interpreting Clustering Results: Use the Graphboard node to generate a scatter plot based on attributes A4 and A5. The plot should show each data point labelled or coloured based on the cluster number assigned by K-Means. Evaluate the clustering results using this plot (and you may also use the project information given in the Background section of this assignment).

(h) Data Preparation: Having read your preliminary analysis, a colleague gave the following comment: "the dataset should have been normalised before the clustering process." Evaluate the clustering solutions with and without normalisation and then discuss whether normalisation is necessary in this case.
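
The colleague's concern can be illustrated with a small sketch: when attributes sit on very different scales, Euclidean distance is dominated by the attribute with the larger range, and rescaling can change which records count as close. The example below uses z-score standardisation as one possible form of normalisation (min-max scaling is another). The function `zscore_columns` and the four records are assumptions made up for illustration, not values from the seeds dataset.

```python
import math
import statistics

def zscore_columns(rows):
    """Rescale every column to mean 0 and sample standard deviation 1."""
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    sds = [statistics.stdev(col) for col in columns]
    return [
        tuple((v - m) / s for v, m, s in zip(row, means, sds))
        for row in rows
    ]

# Hypothetical records: first attribute in the tens, second below one.
rows = [(15.0, 0.87), (16.0, 0.88), (15.0, 0.91), (12.0, 0.87)]
scaled = zscore_columns(rows)

# Distances from record 0 to records 1 and 2, before and after scaling.
raw_01 = math.dist(rows[0], rows[1])
raw_02 = math.dist(rows[0], rows[2])
scaled_01 = math.dist(scaled[0], scaled[1])
scaled_02 = math.dist(scaled[0], scaled[2])

print("raw:   ", raw_01, raw_02)
print("scaled:", scaled_01, scaled_02)
```

On the raw values record 2 looks closer to record 0 than record 1 does, because the difference in the small-scale attribute barely registers. After standardisation the ordering reverses. Whether this matters for the seeds data depends on how different the attribute ranges are, which the Min, Max and Std. Dev statistics can show.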



All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd