Assessment Description
In this assignment, you will implement k-means clustering in Python, then evaluate the use of clustering in a research article and assess whether or not that research is correct.
To demonstrate completion of this assignment, create a Word document with your working code, screenshots of program results, and written answers to questions. Writing should be professional and rigorous, and include scientific/mathematical justification, where appropriate, for all conclusions reached. Upload your final Jupyter notebook and Word document to the LMS when complete.
Part 1: Operational Tasks
For the following exercises, work with the Framingham_training and Framingham_test data sets. Use only the Sex and Age fields. Standardize Age.
1. Run k-means clustering on the Framingham_training data set, requesting k = 2 clusters.
2. Construct a table of statistics summarizing your clusters. Describe what these two clusters consist of.
3. Perform k-means clustering on the Framingham_test data set, requesting k = 2 clusters.
4. Report the results from your test set. Are your clusters validated?
5. Again run k-means clustering on the Framingham_training data set, this time specifying k = 3 clusters.
6. Construct a table of statistics summarizing your clusters. Describe which records belong to each cluster.
7. Perform k-means clustering on the Framingham_test data set, specifying k = 3 clusters.
8. Report the results from your test set. Are your clusters validated?
9. Run k-means clustering on the Framingham_training data set. Specify k = 4 clusters.
10. Construct a table of statistics summarizing your four clusters. Clearly describe your four clusters.
11. Perform k-means clustering on the Framingham_test data set, requesting k = 4 clusters.
12. Report the results from your test set. Are your clusters validated?
13. Which of the clustering solutions, k = 2, 3, or 4, do you prefer, and why?
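The workflow in tasks 1 to 12 can be sketched as follows. This is a minimal illustration under stated assumptions, not a model answer: it uses randomly generated stand-in data rather than the real Framingham files, assumes Sex is coded 0/1, and implements a plain version of Lloyd's k-means algorithm with the standard library only. The helper names (standardize, kmeans, summarize, make_fake_sample, cluster_sample) are hypothetical. The test sample is standardized with its own mean and standard deviation, which is one possible reading of the instructions.

```python
import random
from statistics import mean, pstdev


def standardize(values):
    """Return z-scores: (x - mean) / population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initial centers: k random records
    labels = None
    for _ in range(n_iter):
        # Assignment step: nearest center by squared Euclidean distance.
        new_labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        if new_labels == labels:  # no record changed cluster: converged
            break
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(mean(dim) for dim in zip(*members))
    return labels, centers


def summarize(sex, age, labels, k):
    """Per-cluster size, mean raw age, and share of records with Sex == 1."""
    rows = []
    for c in range(k):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        if idx:
            rows.append({
                "n": len(idx),
                "mean_age": round(mean(age[i] for i in idx), 1),
                "share_sex_1": round(mean(sex[i] for i in idx), 2),
            })
    # Sort so that training and test tables line up for comparison.
    return sorted(rows, key=lambda r: (r["share_sex_1"], r["mean_age"]))


def make_fake_sample(n, seed):
    """Stand-in data (ASSUMPTION): Sex coded 0/1, Age roughly normal."""
    rng = random.Random(seed)
    sex = [rng.randint(0, 1) for _ in range(n)]
    age = [rng.gauss(50, 9) for _ in range(n)]
    return sex, age


def cluster_sample(sex, age, k):
    """Standardize Age, cluster on (Sex, standardized Age), summarize."""
    points = list(zip(sex, standardize(age)))
    labels, _ = kmeans(points, k)
    return summarize(sex, age, labels, k)


train_sex, train_age = make_fake_sample(300, seed=1)
test_sex, test_age = make_fake_sample(150, seed=2)

for k in (2, 3, 4):
    print(f"k = {k}")
    print("  train:", cluster_sample(train_sex, train_age, k))
    print("  test: ", cluster_sample(test_sex, test_age, k))
```

One informal way to judge validation is to compare the training and test tables for each k: clusters are plausibly validated when the matching rows have similar mean ages, Sex shares, and relative sizes. With the real data, the two stand-in samples would be replaced by the Sex and Age columns read from the Framingham_training and Framingham_test files.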
Part 2: Mathematical and Statistical Basis
1. Read Liu and Yang (2018). Discuss the clustering issues described in Section 2, including variable versus data clustering, hierarchical clustering, and oblique principal component clustering.
2. Continuing with Liu and Yang (2018), evaluate the self-organizing network discussed in Section 3.2, and in particular the force equations, for its applicability to the clustering issue discussed in the paper. Do the force equations support the experimental design and results outlined in Section 4?
3. Finally, how do these model parameters affect the model-driven predictive model of space-time vectorcardiogram (VCG) signals described in Section 5 of Liu and Yang? Does the multiscale basis function model of VCG signals described in Section 5.1 follow logically from these results? Why or why not?
Include references to all theoretical concepts and works cited. Show all your steps with explanations. Explain major components of complex solutions, code, and any output. Include captions for tables, images, and diagrams. Use formal and detailed mathematical and scientific notation throughout the document.
While APA style is not required for the body of this assignment, solid academic writing is expected, and documentation of sources should be presented using APA formatting guidelines, which can be found in the APA Style Guide, located in the Student Success Center.