Reference no: EM132425126
1. As the Big Data ecosystem takes shape, there are four main groups of players within this interconnected web. List and explain those groups.
2. How does the data science team evaluate whether the model is sufficiently robust to solve the problem? What questions should they ask?
3. Explain the differences between a Hexbinplot and a Scatterplot, and when to use each of them.
4. Why does k-means not handle categorical data?
5. A local retailer has a database that stores 10,000 transactions from last summer. After analyzing the data, a data science team has identified the following statistics:
- {battery} appears in 6,000 transactions.
- {sunscreen} appears in 5,000 transactions.
- {sandals} appears in 4,000 transactions.
- {bowls} appears in 2,000 transactions.
- {battery, sunscreen} appears in 1,500 transactions.
- {battery, sandals} appears in 1,000 transactions.
- {battery, bowls} appears in 250 transactions.
- {battery, sunscreen, sandals} appears in 600 transactions.
Answer the following questions:
a. What are the support values of the preceding itemsets?
b. Assuming the minimum support is 0.05, which itemsets are considered frequent?
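For orientation, the support of an itemset is the fraction of all transactions that contain it, and an itemset counts as frequent when its support meets the minimum threshold. The following is a minimal sketch of that arithmetic in Python, using only the counts stated in the question; the variable and function names (`counts`, `support`, `frequent`) are illustrative assumptions, not part of the assignment.

```python
# Sketch: support = (transactions containing the itemset) / (total transactions).
# All counts come from the question text; names here are illustrative only.
TOTAL_TRANSACTIONS = 10_000
MIN_SUPPORT = 0.05

counts = {
    frozenset({"battery"}): 6000,
    frozenset({"sunscreen"}): 5000,
    frozenset({"sandals"}): 4000,
    frozenset({"bowls"}): 2000,
    frozenset({"battery", "sunscreen"}): 1500,
    frozenset({"battery", "sandals"}): 1000,
    frozenset({"battery", "bowls"}): 250,
    frozenset({"battery", "sunscreen", "sandals"}): 600,
}


def support(count, total=TOTAL_TRANSACTIONS):
    """Return the fraction of transactions that contain the itemset."""
    return count / total


# Support value for every itemset listed in the question.
supports = {itemset: support(count) for itemset, count in counts.items()}

# An itemset is frequent when its support is at least the minimum support.
frequent = [itemset for itemset, s in supports.items() if s >= MIN_SUPPORT]

for itemset, s in supports.items():
    name = "{" + ", ".join(sorted(itemset)) + "}"
    label = "frequent" if s >= MIN_SUPPORT else "not frequent"
    print(f"{name}: support = {s:.3f} ({label})")
```

With a minimum support of 0.05, an itemset needs to appear in at least 500 of the 10,000 transactions to qualify as frequent.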
6. Linear regression is an analytical technique used to model the relationship between several input variables and a continuous outcome variable. Linear regression can be used in business, government, and medical domains. Explain, with an example for each, how it can be used in those domains.
7. Which classifier is considered computationally efficient for high-dimensional problems? Why?
8. Define the following time series components:
- Trend
- Seasonality
- Cyclic
- Random