Reference no: EM133245287
1) Classify each of the following attributes as continuous or discrete, and as qualitative (nominal or ordinal) or quantitative (interval or ratio): (a) age (in years), (b) salary, (c) ZIP code, (e) height, and (f) intensity of rain.
2) An analyst sets up a sensor network to measure the temperature at different locations over a period of time. What is the type of the attribute collected (temperature)? What is the type of the dataset?
3) It is desired to partition customers into similar groups on the basis of their demographic profile.
a. What features could we use? Provide 3 examples. Would you describe such data as heterogeneous?
b. Which data mining problem is best suited to this task?
4) Suppose that you had a set of arbitrary objects, each representing different characteristics of gadgets. A domain expert gave you the similarity value between every pair of objects. How would you convert these objects into a multidimensional data set for clustering the gadgets?
5) Suppose that you had a data set in which each data point corresponds to sea-surface temperatures over a square mile at a resolution of 10×10. In other words, each data record contains a 10×10 grid of temperature values with spatial locations. You also have some text associated with each 10×10 grid. How would you convert this data into a multidimensional data set? How many features will each data point have?
6) Compute the cosine similarity, Jaccard coefficient (where applicable, i.e. for binary vectors), Euclidean distance, and correlation coefficient for the following pairs of vectors x and y:
a. x = (0, -1, 1, 2, -2), y = (0, -2, 2, 4, -4)
b. x = (0, 1, 0, 0, 0), y = (0, 1, 0, 0, 1)
c. x = (-1, -1, -1, -1, -1), y = (1, 1, 1, 1, 1)
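For checking hand calculations, the four measures in question 6 could be sketched in Python as below. This is a minimal sketch that uses the common textbook definitions and only the standard library; the function names are illustrative. It assumes that a measure is reported as undefined (`None`) when its denominator is zero, for example the correlation of a constant vector, and that the Jaccard coefficient is only computed for 0/1 vectors.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between x and y; None if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        return None
    return dot / (norm_x * norm_y)


def euclidean_distance(x, y):
    """Straight-line (L2) distance between x and y."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def correlation(x, y):
    """Pearson correlation; None if either vector has zero variance."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dev_x = [a - mean_x for a in x]
    dev_y = [b - mean_y for b in y]
    spread_x = math.sqrt(sum(d * d for d in dev_x))
    spread_y = math.sqrt(sum(d * d for d in dev_y))
    if spread_x == 0 or spread_y == 0:
        return None  # assumption: treat the undefined case as None
    return sum(a * b for a, b in zip(dev_x, dev_y)) / (spread_x * spread_y)


def jaccard_binary(x, y):
    """Jaccard coefficient f11 / (f01 + f10 + f11); None if not applicable."""
    if any(v not in (0, 1) for v in list(x) + list(y)):
        return None  # only defined here for binary (0/1) vectors
    both = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    either = sum(1 for a, b in zip(x, y) if a == 1 or b == 1)
    if either == 0:
        return None
    return both / either


if __name__ == "__main__":
    # Illustrative vectors, not taken from the question.
    u = (1, 0, 1, 0)
    v = (1, 1, 0, 0)
    print(cosine_similarity(u, v), jaccard_binary(u, v),
          euclidean_distance(u, v), correlation(u, v))
```

Returning `None` rather than raising an error is a design choice made here so that cases where a measure does not apply can be reported explicitly.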
7) Compute the cosine similarity and the Jaccard coefficient between the two sets {A, B, C} and {A, C, D, E}. Hint: how will you represent each set?
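One possible representation for the hint in question 7, sketched in Python: treat each set as a binary indicator vector over the union of all elements. Under that assumption the dot product equals the size of the intersection and each squared norm equals the size of the set, so both measures can be computed directly from set sizes. The helper names and the example sets below are hypothetical.

```python
import math


def to_indicator(s, universe):
    """Binary vector with a 1 for each element of `universe` that is in `s`."""
    return [1 if item in s else 0 for item in universe]


def set_jaccard(s, t):
    """|s ∩ t| / |s ∪ t|; None if both sets are empty."""
    union = s | t
    if not union:
        return None
    return len(s & t) / len(union)


def set_cosine(s, t):
    """Cosine of the indicator vectors: |s ∩ t| / sqrt(|s| * |t|)."""
    if not s or not t:
        return None
    return len(s & t) / math.sqrt(len(s) * len(t))


if __name__ == "__main__":
    # Illustrative sets, not the ones from the question.
    p = {"P", "Q"}
    q = {"Q", "R", "S"}
    universe = sorted(p | q)
    print(to_indicator(p, universe), to_indicator(q, universe))
    print(set_jaccard(p, q), set_cosine(p, q))
```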
8) Create three documents, A, B, and C, such that the Euclidean distance between A and B is smaller than the Euclidean distance between A and C, even though documents A and B have no words in common whereas documents A and C share some words.
9) Are the following similarity measures good or bad for finding similarity in document-term data? Provide a one-line justification for each answer you provide.
a. correlation
b. cosine
c. Euclidean