Objective: to show whether the sizes of the clusters influence the communication cost among a set of nodes.
Write a K-Means program for a set of M points randomly distributed on an N x N plane and K randomly distributed cluster centers. The program should be fully documented.
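A minimal sketch of the clustering step, assuming standard Lloyd-style K-Means (the assignment does not name a variant). The initial K centers are drawn uniformly at random on the [0, side] x [0, side] plane. The function name `kmeans`, its parameters, and the rule of keeping an empty cluster's old center are illustrative choices, not requirements from the assignment.

```python
import math
import random

def kmeans(points, k, side=25.0, max_iter=100, seed=None):
    """Lloyd's K-Means on a list of (x, y) points; returns (centers, labels).

    Initial centers are uniform random positions on [0, side] x [0, side]
    (assumption: 'randomly distributed K cluster centers').
    labels[p] is the index of the center that point p belongs to.
    """
    rng = random.Random(seed)
    centers = [(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(k)]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                      for p in points]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                new_centers.append((sum(x for x, _ in members) / len(members),
                                    sum(y for _, y in members) / len(members)))
            else:
                new_centers.append(centers[c])  # empty cluster keeps its center
        if new_labels == labels and new_centers == centers:
            break  # converged: assignments and centers no longer change
        labels, centers = new_labels, new_centers
    return centers, labels
```

Because the initial centers are random, different seeds can yield different clusterings (and occasionally empty clusters), so it may be worth recording the seed used for each K.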
The points on the plane are identified by the integers 1, 2, 3, ..., M.
The frequency of communication between a pair of points (i, j) is f(i, j) = floor(|i - j| / 2), for 1 <= i, j <= M.
f(i, j) = f(j, i), and the communication path from i to j is the same as the path from j to i.
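The frequency rule can be sketched directly. Since |i - j| is a non-negative integer, Python's floor division `//` gives floor(|i - j| / 2). The helper `total_traffic` is hypothetical and assumes each unordered pair {i, j} is counted once, which follows from f being symmetric and both directions sharing one path.

```python
def frequency(i, j):
    """f(i, j) = floor(|i - j| / 2) for points labeled 1..M; symmetric in i, j."""
    return abs(i - j) // 2

def total_traffic(m):
    """Sum of f over all unordered pairs i < j of points 1..m.

    Assumption: each pair is counted once, because f(i, j) = f(j, i)
    and both directions use the same path.
    """
    return sum(frequency(i, j)
               for i in range(1, m + 1)
               for j in range(i + 1, m + 1))
```

For example, frequency(1, 4) is 1 and frequency(10, 3) is 3, while neighbouring labels such as (5, 6) never communicate because their frequency is 0.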
The transmission distance of a center point is D = sqrt(5) * N / 2 (about 27.95 for N = 25). Any two center points whose distance is less than or equal to D are connected, forming a 'backbone network'. Construct an all-pairs shortest-path table and route all communication between center points along shortest paths in this network.
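One way to build the backbone and the all-pairs shortest-path table is to use the Floyd-Warshall algorithm with a next-hop matrix. This is a sketch under two assumptions. First, "shortest" means minimum total Euclidean length (for minimum hop count, set every edge weight to 1). Second, the backbone may be disconnected for some center layouts, in which case `route` returns None. All function names are hypothetical.

```python
import math

def build_backbone(centers, n):
    """Edges (a, b), a < b, between centers within D = sqrt(5) * n / 2."""
    d_max = math.sqrt(5) * n / 2
    k = len(centers)
    return {(a, b) for a in range(k) for b in range(a + 1, k)
            if math.dist(centers[a], centers[b]) <= d_max}

def all_pairs_shortest_paths(centers, edges):
    """Floyd-Warshall with Euclidean edge weights (assumption).

    dist[a][b] is the shortest backbone length from a to b;
    nxt[a][b] is the next hop from a toward b (None if unreachable).
    """
    k = len(centers)
    dist = [[0.0 if a == b else math.inf for b in range(k)] for a in range(k)]
    nxt = [[a if a == b else None for b in range(k)] for a in range(k)]
    for a, b in edges:
        w = math.dist(centers[a], centers[b])
        dist[a][b] = dist[b][a] = w
        nxt[a][b], nxt[b][a] = b, a
    for m in range(k):              # allow center m as an intermediate hop
        for a in range(k):
            for b in range(k):
                if dist[a][m] + dist[m][b] < dist[a][b]:
                    dist[a][b] = dist[a][m] + dist[m][b]
                    nxt[a][b] = nxt[a][m]
    return dist, nxt

def route(nxt, a, b):
    """Sequence of centers from a to b, or None if b is unreachable."""
    if nxt[a][b] is None:
        return None
    path = [a]
    while a != b:
        a = nxt[a][b]
        path.append(a)
    return path
```

When two routes tie in length, route(a, b) and route(b, a) might not be mirror images. Always calling route(nxt, min(a, b), max(a, b)) keeps the path identical in both directions, as the assignment requires.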
There are three types of communication that can affect the workload of a cluster center.
a. Intra-cluster communication: two points i, j that belong to the same cluster communicate via their cluster center.
b. Inter-cluster communication: two points i, j that belong to two different clusters communicate as follows: point i sends to its cluster center, which forwards the message along a shortest path in the backbone network to the center of j's cluster, which then delivers it to point j.
c. The 'doormat effect': a cluster center carries extra load when it serves as a stepping stone (an intermediate relay) in a communication between two points, neither of which belongs to its cluster.
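The three load types can be accumulated per center in one pass over all point pairs. This is a sketch under assumptions the assignment leaves open: every unordered pair {i, j} is counted once, and every center on its path is charged f(i, j). The end centers are charged under the cross-cluster category and the intermediate ones under "doormat". Traffic that cannot be routed because the backbone is disconnected is tallied separately rather than dropped silently. The names `center_workloads`, `freq`, and `route` are hypothetical.

```python
def center_workloads(k, labels, freq, route):
    """Per-center load split into same-cluster, cross-cluster and doormat parts.

    k: number of centers; labels: dict point -> center index;
    freq(i, j): pair frequency; route(a, b): list of centers a..b or None.
    Assumption: each unordered pair is counted once and every center on
    its path is charged f(i, j).
    """
    load = {c: {"same_cluster": 0, "cross_cluster": 0, "doormat": 0}
            for c in range(k)}
    unreachable = 0
    pts = sorted(labels)
    for x, i in enumerate(pts):
        for j in pts[x + 1:]:
            f = freq(i, j)
            if f == 0:
                continue
            a, b = labels[i], labels[j]
            if a == b:
                load[a]["same_cluster"] += f   # type (a): shared center relays
                continue
            path = route(min(a, b), max(a, b))  # same path in both directions
            if path is None:
                unreachable += f                # backbone is disconnected
                continue
            load[a]["cross_cluster"] += f      # type (b): end centers
            load[b]["cross_cluster"] += f
            for c in path[1:-1]:
                load[c]["doormat"] += f        # type (c): stepping stones
    return load, unreachable
```

Keeping the three parts separate makes it possible to check later whether cluster size tracks one component (for example, same-cluster load) more closely than another.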
Run your program for each of the following values of K:
K = 3, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 25, 30
N = 25 (a 25 x 25 plane)
M = 100; the 100 points are randomly distributed once and remain the same for all runs.
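One way to keep the 100 points identical across all 14 runs is to generate them once from a fixed seed (or to generate them once and save them as the program's input). A minimal sketch; the seed value 2024 and the name `make_points` are arbitrary assumptions.

```python
import random

K_VALUES = [3, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 25, 30]
N = 25    # side length of the square plane
M = 100   # number of points

def make_points(m=M, n=N, seed=2024):
    """Points 1..m placed uniformly at random on [0, n] x [0, n].

    A fixed seed (arbitrary value, an assumption) makes every call
    return exactly the same points, so all K runs share one input.
    """
    rng = random.Random(seed)
    return {p: (rng.uniform(0, n), rng.uniform(0, n)) for p in range(1, m + 1)}
```

Each run would then call make_points() once and cluster the same coordinates for every value in K_VALUES.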
For each run, calculate the communication cost of each of the K centers. Tabulate the results of each run and plot them, i.e., 14 plots, one per run.
Use the tables or the plots to compare the workloads of the K centers.
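A sketch of a per-run table showing each center's cluster size next to its load components and total. It assumes loads are stored as dictionaries with the keys "same_cluster", "cross_cluster", and "doormat"; those key names and `format_table` itself are hypothetical.

```python
def format_table(k, sizes, loads):
    """Render one run's per-center results as a fixed-width text table.

    sizes: dict center -> number of points in its cluster;
    loads: dict center -> {"same_cluster", "cross_cluster", "doormat"} counts.
    """
    rows = [f"K = {k}",
            f"{'center':>6} {'size':>5} {'same':>6} {'cross':>6} "
            f"{'doormat':>8} {'total':>7}"]
    for c in sorted(loads):
        parts = loads[c]
        total = sum(parts.values())
        rows.append(f"{c:>6} {sizes.get(c, 0):>5} {parts['same_cluster']:>6} "
                    f"{parts['cross_cluster']:>6} {parts['doormat']:>8} {total:>7}")
    return "\n".join(rows)
```

The same rows could be passed to a plotting library (for instance a bar chart in matplotlib) to produce the plot for each run.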
What I am looking for is the correlation (if any) between the workloads of the centers and the sizes of their clusters.
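One way to quantify the relationship in each run is the Pearson correlation coefficient r between the K cluster sizes and the K total workloads. This is a minimal sketch. Choosing Pearson r (rather than, say, a rank correlation) is an assumption. The function returns None when either sample is constant, because r is undefined in that case.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient r of two equal-length samples.

    Returns None when either sample has zero variance (r undefined).
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)
```

Computing one r per value of K gives 14 numbers that can be compared directly. On Python 3.10 or later, `statistics.correlation` computes the same coefficient. It may also be informative to correlate size with each load component separately.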
Submit your program, its input, the 14 backbone networks, the 14 tables, and the 14 plots.
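A possible text format for submitting each backbone network is the Graphviz DOT language, which Graphviz tools can render as a drawing. In this sketch, node labels carry each center's coordinates and edge labels carry edge lengths. The function name and the naming scheme c0, c1, ... are hypothetical.

```python
import math

def backbone_to_dot(k, centers, edges):
    """Describe one backbone network as an undirected Graphviz DOT graph.

    centers: list of (x, y); edges: set of (a, b) index pairs with a < b.
    """
    lines = [f"graph backbone_K{k} {{"]
    for c, (x, y) in enumerate(centers):
        lines.append(f'  c{c} [label="c{c}\\n({x:.1f}, {y:.1f})"];')
    for a, b in sorted(edges):
        length = math.dist(centers[a], centers[b])
        lines.append(f'  c{a} -- c{b} [label="{length:.1f}"];')
    lines.append("}")
    return "\n".join(lines)
```

Writing one such file per K produces the 14 backbone networks as plain, inspectable text.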
Also write a comprehensive report with your observations, justifications and comments.