Reference no: EM133765618
Project: Dissimilarities between data objects
This project demonstrates how to measure similarities between data objects. The topics are mostly described in Chapter 6, 'Statistical Machine Learning', of 'Practical Statistics for Data Scientists'. Cover the following in the project:
Find some data examples and use them to show how to calculate:
Euclidean distance
L1 distance
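As a starting point for the two calculations above, here is a minimal sketch in plain Python. The function names `euclidean` and `manhattan` and the two sample points are illustrative assumptions, not part of the brief; real project data would normally be standardized first so that no single feature dominates the distance.

```python
import math


def euclidean(x, y):
    """Euclidean (L2) distance: square root of the summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x, y):
    """L1 (Manhattan) distance: sum of the absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


# Made-up two-feature records, used only to illustrate the arithmetic.
p = (1.0, 2.0)
q = (4.0, 6.0)

print(euclidean(p, q))  # sqrt(3^2 + 4^2) = 5.0
print(manhattan(p, q))  # |3| + |4| = 7.0
```

The L1 distance is never smaller than the Euclidean distance for the same pair of points, which the example values (7.0 versus 5.0) illustrate.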
Prove or disprove that the Euclidean and L1 distances satisfy:
Positivity
d(x,y) >= 0 for all x and y,
d(x,y) == 0 if and only if x == y.
Symmetry
d(x,y) == d(y,x) for all x and y.
Triangle Inequality
d(x,z) <= d(x,y) + d(y,z) for all points x, y, and z
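A numerical spot-check can support the written proof of the three properties listed above, although it can only disprove a property (by finding a counterexample), never prove one. The sketch below is one possible way to do this; the helper `find_violation`, the random sample, and the tolerance `tol` are assumptions introduced for illustration.

```python
import itertools
import math
import random


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))


def find_violation(d, points, tol=1e-9):
    """Return a tuple describing the first violated property, or None if none is found."""
    for x, y in itertools.product(points, repeat=2):
        if d(x, y) < 0:
            return ("positivity", x, y)
        # d must be zero exactly when the two points coincide.
        if (d(x, y) == 0) != (x == y):
            return ("identity", x, y)
        if abs(d(x, y) - d(y, x)) > tol:
            return ("symmetry", x, y)
    for x, y, z in itertools.product(points, repeat=3):
        # tol absorbs floating-point rounding in the comparison.
        if d(x, z) > d(x, y) + d(y, z) + tol:
            return ("triangle inequality", x, y, z)
    return None


random.seed(0)  # fixed seed so the check is repeatable
points = [tuple(random.uniform(-10, 10) for _ in range(3)) for _ in range(15)]

for name, d in [("Euclidean", euclidean), ("L1", manhattan)]:
    print(name, find_violation(d, points))  # None means no counterexample was found
```

Finding no violation on a sample is consistent with both distances being metrics, but the project still needs an algebraic argument that covers all points x, y, and z.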
1. Explain why it is or is not possible to
   1. rearrange the data so that Euclidean distance gives the same meaning as Hamming distance
   2. show that the measure d = 1 - cos(x,y) satisfies positivity, symmetry, and the triangle inequality
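Both questions above can be explored with small experiments before writing the explanation. The sketch below assumes the data can be encoded as 0/1 vectors for the Hamming comparison, and it uses three hand-picked 2-D vectors to probe d = 1 - cos(x,y); the function names and all sample vectors are illustrative assumptions, and the output is evidence to reason from rather than a complete answer.

```python
import math


def hamming(x, y):
    """Number of positions at which two equal-length sequences differ."""
    return sum(a != b for a, b in zip(x, y))


def squared_euclidean(x, y):
    """Squared Euclidean distance (no square root taken)."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def cosine_distance(x, y):
    """1 - cos(angle between x and y); undefined if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / (norm_x * norm_y)


# On 0/1 vectors every differing position contributes exactly 1 to the squared sum.
u = (1, 0, 1, 1, 0)
v = (0, 0, 1, 0, 1)
print(hamming(u, v), squared_euclidean(u, v))  # 3 3

# Probe the triangle inequality for d = 1 - cos(x, y).
x, y, z = (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)
lhs = cosine_distance(x, z)
rhs = cosine_distance(x, y) + cosine_distance(y, z)
print(lhs <= rhs)  # False: 1.0 is greater than roughly 0.586

# Probe "d == 0 only when x == y" with two parallel but different vectors.
print(cosine_distance((1.0, 0.0), (2.0, 0.0)))  # 0.0
```

On binary data the Euclidean distance is the square root of the Hamming distance, so the two rank pairs of objects identically even though their values differ.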
2. Draw conclusions about what is important when choosing the distance measure for the evaluation of dissimilarities between data objects.