Find out optimal number of clusters

Reference no: EM132358271

Assignment -

New DG Food Agro has been a multinational exporter of various grains from India for nearly 130 years. Since the early 1980s, its main export product has been wheat. It exports wheat to countries such as America, Afghanistan, and Australia.

They have started to see export sales vary from year to year across countries. They attribute this to a range of causes, many of them natural, such as floods, country growth, and population explosion. They now need to decide which countries fall in the same range of exports and which do not. They also need to know which countries' exports are low and can be improved, and which countries have performed very well across the years.

The data currently provided spans 18 years. They need a repeatable solution that is not affected by how much data is added over time, and they should be able to explain the data across years with a smaller number of variables.

Objective: Our objective is to cluster the countries based on the sales data provided across years. We have to apply an unsupervised learning technique such as K-means or hierarchical clustering to reach the final solution. Before that, we have to bring the exports (in tons) of all countries to the same scale across years. In addition, because the solution needs to be repeatable, we will apply PCA to obtain the principal components that explain the maximum variance.

Implementation:

1) Read the data file and check for any missing values.

2) Change the headers to country and year accordingly.

3) Cleanse the data if required and remove null or blank values.

4) After the EDA part is done, consider which algorithm should be applied here.

5) Because the solution must hold across years, PCA needs to be applied first.

6) Apply PCA on the dataset and find the number of principal components which explain nearly all the variance.

7) Plot an elbow chart or scree plot to find the optimal number of clusters.

8) Then apply K-means and hierarchical clustering and showcase the results.

9) You can choose to group the countries based either on the years of data or on the principal components.

10) Then see which countries are consistent and which are the largest importers of the good, based on the scale and position of their cluster.
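The steps above can be sketched roughly as follows. This is a minimal illustration, not the expected solution: it assumes Python with NumPy and scikit-learn, and because the attached data file is not reproduced here, it runs on synthetic placeholder numbers (12 made-up countries over 18 years). The function names, the 95% variance threshold, and the choice of three clusters are all assumptions made for the example.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def reduce_exports(exports, variance_to_keep=0.95):
    """Scale every year column, then keep enough principal components
    to explain the requested share of variance (0.95 is an assumed threshold)."""
    scaled = StandardScaler().fit_transform(exports)
    pca = PCA(n_components=variance_to_keep, svd_solver="full")
    return pca.fit_transform(scaled), pca


def elbow_inertias(components, k_values):
    """Within-cluster sum of squares for each candidate k (y-axis of an elbow chart)."""
    return [
        KMeans(n_clusters=k, n_init=10, random_state=0).fit(components).inertia_
        for k in k_values
    ]


def cluster_both_ways(components, n_clusters):
    """Label each country with K-means and with Ward hierarchical clustering."""
    kmeans_labels = KMeans(
        n_clusters=n_clusters, n_init=10, random_state=0
    ).fit_predict(components)
    hier_labels = AgglomerativeClustering(
        n_clusters=n_clusters, linkage="ward"
    ).fit_predict(components)
    return kmeans_labels, hier_labels


# Synthetic placeholder data, NOT the assignment's file:
# rows are 12 made-up countries, columns are 18 years of exports in tons.
rng = np.random.default_rng(0)
levels = np.repeat([100.0, 500.0, 1000.0], 4)  # low, medium, high exporters
exports = levels[:, None] * rng.uniform(0.95, 1.05, size=(12, 18))

components, pca = reduce_exports(exports)
k_values = list(range(1, 7))
inertias = elbow_inertias(components, k_values)  # plot k_values vs. inertias
kmeans_labels, hier_labels = cluster_both_ways(components, n_clusters=3)

print("components kept:", pca.n_components_)
print("K-means labels:", kmeans_labels)
print("hierarchical labels:", hier_labels)
```

Plotting `k_values` against `inertias` gives the elbow chart from step 7, and plotting `pca.explained_variance_ratio_` gives a scree plot for step 6. With real data, the rows would come from the cleaned file (one row per country, one column per year), and the number of clusters would be read off the elbow rather than fixed in advance.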

Attachment:- Assignment Files.rar
