Assignment: Clustering with NASA Webserver Log Data
Choose any two clustering algorithms covered in class and apply them to the NASA Webserver Log data set. Plan your experiment as follows:
a. Determine the data preprocessing methods and the distance metric to apply for each of your clustering algorithms.
b. For each clustering algorithm, compare the accuracy of the resulting clustering model under at least two different sets of input parameters, if applicable. Experiment with feature selection using PCA tools, or with an experiment of your own design.
c. Compare the accuracy of the two clustering algorithms.
d. Discuss your results:
e. Why is your induced model different for the same training data when you change the parameter values?
f. Why does a certain parameter setting show better accuracy than the others that you tried?
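Item b leaves the feature-selection method open. A minimal sketch of PCA with NumPy, assuming the log data has already been turned into a numeric feature matrix (one row per host or per time window). The 95% cumulative-variance threshold and the helper names `pca`, `n_components_for` and `project` are hypothetical choices, not part of the assignment; scikit-learn's `sklearn.decomposition.PCA` exposes the same information through `explained_variance_ratio_`.

```python
import numpy as np

def pca(X):
    """Return (explained_variance_ratio, components, mean) for X with one sample per row."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    # SVD of the centered data: the rows of Vt are the principal axes
    _, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    var = s ** 2 / (len(X) - 1)            # variance captured along each axis
    return var / var.sum(), Vt, mean

def n_components_for(ratio, threshold=0.95):
    """Smallest number of components whose cumulative explained variance reaches threshold."""
    k = int(np.searchsorted(np.cumsum(ratio), threshold)) + 1
    return min(k, len(ratio))

def project(X, Vt, mean, k):
    """Coordinates of X on the first k principal axes."""
    return (np.asarray(X, dtype=float) - mean) @ Vt[:k].T

# Illustrative matrix: the second feature is exactly twice the first (redundant)
X = [[1, 2], [2, 4], [3, 6], [4, 8]]
ratio, Vt, mean = pca(X)
k = n_components_for(ratio)               # one component keeps essentially all the variance
Z = project(X, Vt, mean, k)               # shape (4, 1)
```

Features are normally standardized before PCA; otherwise a large-scale feature such as bytes transferred dominates the first component.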
Question 1: Determine the data preprocessing methods to apply for each of your clustering algorithms.
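A minimal preprocessing sketch, assuming the log lines follow the Common Log Format (host, ident, user, bracketed timestamp, quoted request, status code, bytes, with `-` when no byte count is recorded). Grouping by host and the four features (request count, mean bytes per request, error rate, distinct paths) are hypothetical choices. Z-score scaling is included because Euclidean distance would otherwise be dominated by the byte counts. The sample lines are illustrative only.

```python
import re
import statistics
from collections import defaultdict

# host ident authuser [timestamp] "request" status bytes
LOG_RE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "([^"]*)" (\d{3}) (\S+)\s*$')

def parse_line(line):
    """Return (host, path, status, bytes), or None for a malformed line."""
    m = LOG_RE.match(line)
    if m is None:
        return None
    host, _timestamp, request, status, size = m.groups()
    parts = request.split()
    path = parts[1] if len(parts) >= 2 else request
    nbytes = 0 if size == "-" else int(size)   # "-" means no byte count was logged
    return host, path, int(status), nbytes

def host_features(lines):
    """Per host: [requests, mean bytes per request, error rate, distinct paths]."""
    acc = defaultdict(lambda: {"n": 0, "bytes": 0, "errors": 0, "paths": set()})
    for line in lines:
        rec = parse_line(line)
        if rec is None:
            continue                                  # drop malformed lines
        host, path, status, nbytes = rec
        a = acc[host]
        a["n"] += 1
        a["bytes"] += nbytes
        a["errors"] += status >= 400                  # 4xx/5xx responses count as errors
        a["paths"].add(path)
    hosts = sorted(acc)
    rows = [[acc[h]["n"], acc[h]["bytes"] / acc[h]["n"],
             acc[h]["errors"] / acc[h]["n"], len(acc[h]["paths"])] for h in hosts]
    return hosts, rows

def zscore(rows):
    """Standardize each column to mean 0 and standard deviation 1 (constant columns -> 0)."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) for c in cols]
    return [[(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, stds)]
            for row in rows]

sample = [  # illustrative lines in the log's format
    '199.72.81.55 - - [01/Jul/1995:00:00:01 -0400] "GET /history/apollo/ HTTP/1.0" 200 6245',
    'unicomp6.unicomp.net - - [01/Jul/1995:00:00:06 -0400] "GET /shuttle/countdown/ HTTP/1.0" 200 3985',
    '199.72.81.55 - - [01/Jul/1995:00:00:09 -0400] "GET /missing.html HTTP/1.0" 404 -',
    'this line is malformed',
]
hosts, rows = host_features(sample)
X = zscore(rows)
```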
Question 2: Apply two different versions of a clustering method of your choice. Design your data analytics experiment.
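One possible reading of "two different versions" is agglomerative hierarchical clustering with two linkage criteria. A minimal sketch, assuming scaled numeric feature vectors and Euclidean distance. `agglomerative` is a hypothetical helper, and its naive O(n^3) loop only suits a small sample of hosts; SciPy's `scipy.cluster.hierarchy.linkage` with `method="single"` or `method="complete"` is the practical route for the full log.

```python
import math
from itertools import combinations

def agglomerative(points, k, linkage="single"):
    """Naive bottom-up clustering until k clusters remain; returns one label per point.

    single linkage:   cluster distance = closest pair of members
    complete linkage: cluster distance = farthest pair of members
    """
    if linkage not in ("single", "complete"):
        raise ValueError("linkage must be 'single' or 'complete'")
    agg = min if linkage == "single" else max
    clusters = [[i] for i in range(len(points))]

    def cluster_dist(a, b):
        return agg(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > k:
        # merge the two clusters with the smallest linkage distance
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda ab: cluster_dist(clusters[ab[0]], clusters[ab[1]]))
        clusters[a].extend(clusters[b])
        del clusters[b]                      # b > a, so index a stays valid

    labels = [0] * len(points)
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    return labels

# Illustrative 1-D points with slowly widening gaps (a "chain")
pts = [(0.0,), (1.0,), (2.1,), (3.3,), (4.6,), (6.0,)]
print(agglomerative(pts, 2, "single"))    # → [0, 0, 0, 0, 0, 1]
print(agglomerative(pts, 2, "complete"))  # → [0, 0, 0, 0, 1, 1]
```

On this chain, single linkage peels off only the last point while complete linkage splits it into two compact groups: the same data yields a different model when only the linkage parameter changes, which is the situation item e asks about.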
Question 3: Experiment to find the best parameter setting for your clustering methods:
- Measure
- Different parameters/thresholds
- The number of clusters, K
NOTE: If you choose to run your own experiment (item b), proceed as follows:
A simple experiment to choose the best K, the number of clusters.
Q3.1 Pick the best parameter setting from Phase 2.
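A minimal sketch of the elbow method for choosing K, assuming Euclidean k-means on already-scaled features. It uses Lloyd's algorithm with a deterministic farthest-first initialization, chosen here only so the example is reproducible (random restarts or k-means++ are more common in practice). The helper names and the toy points are hypothetical.

```python
import math

def farthest_first(points, k):
    """Deterministic initialization: start at points[0], then repeatedly add the point
    farthest from every centroid chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids

def kmeans(points, k, max_iter=100):
    """Lloyd's k-means with Euclidean distance; returns (labels, centroids, sse)."""
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        # assignment step: each point goes to its nearest centroid
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break                                    # assignments stable: converged
        labels = new_labels
        # update step: each centroid moves to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:                              # an empty cluster keeps its old centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    sse = sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
    return labels, centroids, sse

def elbow(points, k_values):
    """SSE (within-cluster sum of squares) for each candidate K."""
    return {k: kmeans(points, k)[2] for k in k_values}

# Illustrative data: three well-separated groups of three points
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (20, 1), (21, 0)]
for k, sse in elbow(pts, range(1, 6)).items():
    print(k, round(sse, 2))
```

On the toy data the SSE falls steeply from K = 1 to K = 3 and only slightly afterwards, so K = 3 sits at the elbow.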
Question 4: Apply your clustering algorithm with the best parameter setting to each different number of clusters, and check whether there is any significant difference between the results of the iterations.
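One way to quantify whether two runs differ is to compare their label vectors with the adjusted Rand index (ARI). ARI is not named in the lecture list, so treat it as an assumed, optional measure. The sketch follows the standard Hubert-Arabie formula, the same quantity scikit-learn exposes as `sklearn.metrics.adjusted_rand_score`; the function name is hypothetical.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """Chance-corrected agreement between two partitions of the same items.
    1.0 = identical up to relabeling; values near 0 = no more agreement than chance."""
    if len(labels_a) != len(labels_b):
        raise ValueError("partitions must label the same items")
    n_pairs = comb(len(labels_a), 2)
    # pairs of items placed together in both partitions (contingency-table cells)
    same_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    same_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    same_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = same_a * same_b / n_pairs
    max_index = (same_a + same_b) / 2
    if max_index == expected:          # degenerate, e.g. both partitions are one cluster
        return 1.0
    return (same_both - expected) / (max_index - expected)

print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # → 1.0 (same partition, labels swapped)
print(adjusted_rand_index([0, 0, 1, 1], [0, 0, 1, 2]))  # about 0.571 (= 4/7)
```

Comparing labels rather than raw cluster IDs matters because k-means and similar methods number their clusters arbitrarily from run to run.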
Question 5: Validate the clustering result of each clustering method under different parameter settings. For each clustering result in your experiment, apply any method discussed in the lecture notes (Ward's method, silhouette score, elbow method, entropy/purity, etc.) to measure the quality of the clustering result.
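A minimal sketch of three of the listed measures, assuming Euclidean distance for the silhouette. Entropy and purity need a reference label per item; the NASA log has no ground-truth classes, so a derived label (for example the HTTP status class) is a hypothetical stand-in. Function names are hypothetical; scikit-learn's `sklearn.metrics.silhouette_score` computes the same mean silhouette.

```python
import math
from collections import Counter

def silhouette(points, labels):
    """Mean silhouette coefficient with Euclidean distance (needs at least two clusters)."""
    if len(set(labels)) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for p, own_label in zip(points, labels):
        # distances from p to every point, grouped by cluster label
        by_cluster = {}
        for q, lab in zip(points, labels):
            by_cluster.setdefault(lab, []).append(math.dist(p, q))
        own = by_cluster[own_label]           # includes the zero distance from p to itself
        if len(own) == 1:
            scores.append(0.0)                # singleton cluster: s = 0 by convention
            continue
        a = sum(own) / (len(own) - 1)         # cohesion: mean distance to the rest of its cluster
        b = min(sum(d) / len(d) for lab, d in by_cluster.items() if lab != own_label)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

def purity(labels, truth):
    """Fraction of items belonging to the majority reference class of their cluster."""
    total = 0
    for c in set(labels):
        members = [t for lab, t in zip(labels, truth) if lab == c]
        total += Counter(members).most_common(1)[0][1]
    return total / len(labels)

def entropy(labels, truth):
    """Size-weighted entropy (bits) of reference classes inside clusters; 0 = all clusters pure."""
    n, h = len(labels), 0.0
    for c in set(labels):
        members = [t for lab, t in zip(labels, truth) if lab == c]
        m = len(members)
        h_c = -sum(k / m * math.log2(k / m) for k in Counter(members).values())
        h += m / n * h_c
    return h

# Illustrative data: two tight pairs far apart, and a hypothetical reference label per item
pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
good, bad = [0, 0, 1, 1], [0, 1, 0, 1]
truth = ["ok", "ok", "ok", "error"]
print(silhouette(pts, good), silhouette(pts, bad))   # close to 1 vs. negative
print(purity(good, truth), entropy(good, truth))     # → 0.75 0.5
```

Higher silhouette and purity, and lower entropy, indicate a better clustering; running these on every parameter setting from Question 3 gives the numbers to discuss in Question 6.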
Question 6: Discuss your results: discuss the quality measures of the clustering result for each clustering method under different parameter settings.