Reference no: EM132354305
Project - Using Smart Watch Sensors for Activity Recognition
Introduction - In this case we continue our investigation of the sensor data from smart watches to build a supervised learning model to distinguish among different physical activities. After training some alternative models and validating them with a test set of data, you'll then use your best model with some "new" observations to classify the activities that an individual is engaging in.
Unlike the earlier analyses, this case calls upon you to work in a fairly unstructured fashion to create a classification model that can recognize human activities. By now, you have seen several modeling strategies for a categorical classifier: decision trees, logistic regression, random forest, and SVM. You should investigate and evaluate these approaches and implement the model that, in your judgment, best performs the task.
Background Reading: The data for this project was provided by the authors of an article entitled "Smart Devices are Different: Assessing and Mitigating Mobile Sensing Heterogeneities for Activity Recognition." The article is posted on the course website and explains the research as well as defining the available variables in the datasets. Based on the article, it appears that the LG watches have more consistent measurements, so we will just look at the two LG models for this analysis.
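As a rough illustration of restricting the analysis to the LG watches, the sketch below filters a toy data frame in base R. The column names `Model` and `Device` and the values `"lgwatch"` and `"gear"` are assumptions made for illustration only; check the article and the actual file for the real names before adapting it.

```r
# Toy stand-in for the raw watch data. Column names and values are
# hypothetical, not taken from the assignment files.
watch <- data.frame(
  Model  = c("lgwatch", "gear", "lgwatch", "gear", "lgwatch"),
  Device = c("lgwatch_1", "gear_1", "lgwatch_2", "gear_2", "lgwatch_1"),
  a_x    = c(0.1, 0.4, -0.2, 0.3, 0.0),
  stringsAsFactors = FALSE
)

# Keep only the rows recorded by an LG watch.
lg <- watch[watch$Model == "lgwatch", ]

# Count the remaining rows per physical device as a sanity check.
device_counts <- table(lg$Device)
print(device_counts)
```

Checking the per-device counts after filtering is a quick way to confirm that both LG watches are still present before any modeling begins.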
Software and Data: The accelerometer file contains approximately 3.5 million rows. Your analysis and investigations should rely on R code.
In the raw data file, note that the data were collected from 9 users (identified only as a, b, c, d, e, f, g, h, and i) who wore the different watches while engaging in the activities for a period of time. Each user wore 4 different watches, which recorded 3 dimensions of accelerometer data while the activities were ongoing.
Because the data are generated continuously through time, we do not want to randomly select rows for our Training and Test subsamples. Instead, for this project you should randomly choose 6 of the 9 users (67%) as the Training subjects and the other 3 as the Test subjects.
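One possible way to carry out the user-level split is sketched here in base R. The data frame `har`, its `User` column, and the seed value are assumptions for illustration; the toy data simply stand in for the real accelerometer file.

```r
# Toy stand-in for the accelerometer data: 9 users, 4 rows each.
# The data frame name and the User column are hypothetical.
set.seed(42)  # arbitrary seed so the split is reproducible
users <- letters[1:9]
har <- data.frame(
  User = rep(users, each = 4),
  a_x  = rnorm(36),
  a_y  = rnorm(36),
  a_z  = rnorm(36),
  stringsAsFactors = FALSE
)

# Randomly choose 6 users for training; the remaining 3 form the test set.
train_users <- sample(users, 6)
test_users  <- setdiff(users, train_users)

# Partition by user, so every row from a given user lands on one side only.
train <- har[har$User %in% train_users, ]
test  <- har[har$User %in% test_users, ]

# Sanity check: no user appears in both partitions.
stopifnot(length(intersect(train$User, test$User)) == 0)
```

Splitting on users rather than rows keeps time-adjacent readings from the same person out of both partitions at once, which would otherwise make test performance look better than it is.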
Your model may rely on any combination of the appropriate available variables.
In addition, there is a CSV file called "HARnew". This file contains 311 time-sequenced observations of one individual engaging in several (fewer than six) activities. After you develop your model, run these new observations through the model to identify the activity corresponding to each row of data.
The columns marked "a_x", "a_y", "a_z" are the 3 dimensions of accelerometer data.
The Challenge: Using the observations from the two LG watch models, build a "Human Activity Recognition" (HAR) model that uses some or all of the measurement variables to classify the activities of the person. You should develop at least four models and evaluate them by creating a confusion matrix and calculating the misclassification rate for each; your report should present the performance evaluations and explain why you chose your final model. Finally, identify the activities that you believe the individual was engaging in, and indicate which of the 311 rows of the new data table correspond to each activity (e.g. "rows 100-150 indicate that the subject was dancing").
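A confusion matrix and misclassification rate can be computed in base R along the following lines. The activity labels and the two vectors are made-up placeholders; in practice `actual` would come from the Test subjects and `predicted` from whichever model is being evaluated.

```r
# Placeholder activity labels, not taken from the data set.
activities <- c("bike", "sit", "stand", "walk")

# Hypothetical true labels and model predictions for 8 test rows.
actual <- factor(
  c("sit", "sit", "walk", "walk", "walk", "bike", "bike", "stand"),
  levels = activities
)
predicted <- factor(
  c("sit", "stand", "walk", "walk", "bike", "bike", "bike", "stand"),
  levels = activities
)

# Rows are predictions, columns are the true activities.
conf_mat <- table(Predicted = predicted, Actual = actual)
print(conf_mat)

# Misclassification rate = 1 - (correct predictions / all predictions).
misclass_rate <- 1 - sum(diag(conf_mat)) / sum(conf_mat)
print(misclass_rate)
```

Giving both factors the same `levels` keeps the table square even when a model never predicts one of the activities, so `diag()` lines up with the correct classifications. Repeating the calculation for each candidate model gives the rates to compare in the report.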
Deliverable: You should prepare a Word document created with R Markdown providing the analysis and discussion.
The R Markdown document should be a technical document including:
- Relevant R code that you used. NOTE: no doubt you explored models that you quickly rejected and/or wrote code that failed with error messages. BE SELECTIVE in reporting your code: show us the code that was actually informative for you. Use comments and prose liberally to annotate the code as well as the output.
- Selected graphs, tables, or statistics summarizing the models you investigated and supporting your final choice. You should include confusion matrices and a comparison of misclassification rates.
- Output that shows how you used the model to classify the 311 "new" observations.
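To report which rows of the new data correspond to each activity, one option is to collapse the row-by-row predictions into contiguous runs with `rle()`. The vector `pred` below is a made-up stand-in for the model's predictions on the 311 new rows, shortened to 9 rows for illustration.

```r
# Hypothetical predictions, in row order. If the model returns a factor,
# convert it with as.character() first, since rle() needs an atomic vector.
pred <- c(rep("sit", 3), rep("walk", 4), rep("sit", 2))

# Run-length encode the predictions to find stretches of the same activity.
runs <- rle(pred)

# Convert run lengths into first and last row numbers for each stretch.
last_row  <- cumsum(runs$lengths)
first_row <- last_row - runs$lengths + 1

segments <- data.frame(
  activity  = runs$values,
  first_row = first_row,
  last_row  = last_row,
  stringsAsFactors = FALSE
)
print(segments)
```

Because the new observations are time-sequenced, very short runs surrounded by a different activity are worth a second look, since they may be misclassifications rather than genuine changes in activity.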
Other points to keep in mind:
1. Based on the article, it appears that the 2 LG watches have sensors with similar accuracy and sensitivity. Therefore, in creating your models you may pool all of the data and not concern yourself with different predictive models for the two different watches.
2. How did you handle the task of variable selection in building your models? Discuss the method by which you decided which predictors to rely upon.
3. Comment on the extent to which the accelerometer measurements appear to be more or less useful in identifying different activities.
Attachment: Assignment Files.rar