Difference between dimensionality reduction and feature selection

Reference no: EM132911243

Intro to Data Mining

Chapter 2.

1. What's an attribute? What's a data instance?

2. What's noise? How can noise be reduced in a dataset?

3. Define an outlier. Describe 2 different approaches to detect outliers in a dataset.

4. Describe 3 different techniques to deal with missing values in a dataset. Explain when each of these techniques would be most appropriate.

5. Given a sample dataset with missing values, apply an appropriate technique to deal with them.

6. Give 2 examples in which aggregation is useful.

7. Given a sample dataset, aggregate its data values.

8. What's sampling?

9. What's simple random sampling? Is it possible to sample data instances using a distribution other than the uniform distribution? If so, give an example of a probability distribution over the data instances that is not uniform (i.e., one in which the instances do not all have equal probability of being selected).

10. What's stratified sampling?

11. What's "the curse of dimensionality"?

12. Provide a brief description of what Principal Components Analysis (PCA) does. [Hint: See Appendix A and your lecture notes.] State what the input and the output of PCA are.

13. What's the difference between dimensionality reduction and feature selection?

14. Describe in detail 2 different techniques for feature selection.

15. Given a sample dataset (represented by a set of attributes, a correlation matrix, a covariance matrix, ...), apply feature selection techniques to select the best attributes to keep (or equivalently, the best attributes to remove).

16. What's the difference between feature selection and feature extraction?

17. Give two examples of data in which feature extraction would be useful.

18. Given a sample dataset, apply feature extraction.

19. What's data discretization and when is it needed?

20. What's the difference between supervised and unsupervised discretization?

21. Given a sample dataset, apply unsupervised (e.g., equal width, equal frequency) discretization, or supervised discretization (e.g., using entropy).

22. Describe 2 approaches to handle nominal attributes with too many values.

23. Given a dataset, apply a variable transformation: either a given simple function, normalization, or standardization.

24. Define correlation and covariance, and explain how to use them in data preprocessing.
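
Question 3 asks for two approaches to outlier detection. A minimal Python sketch of two common ones follows, assuming numeric data held in a plain list: a z-score rule (distance from the mean measured in standard deviations) and the interquartile-range (IQR) rule. The function names, thresholds, and sample values are illustrative choices, not part of the course material.

```python
import statistics

def zscore_outliers(values, threshold=3.0):
    """Flag values more than `threshold` population standard deviations from the mean."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return []  # all values identical: nothing can be an outlier
    return [v for v in values if abs(v - mean) / std > threshold]

def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical sample with one suspicious value.
data = [10, 12, 11, 13, 12, 11, 95]
print(zscore_outliers(data, threshold=2.0))  # [95]
print(iqr_outliers(data))                    # [95]
```

With only seven values, no population z-score can exceed the square root of 6 (about 2.45), so the default threshold of 3 flags nothing here. The IQR rule relies on quartiles, which the outlier itself barely shifts, so it behaves more predictably on small samples.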
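
For question 5, one possible sketch handles missing values (marked here as None) with two of the standard techniques: discarding incomplete instances, and replacing a missing value with the mean of that attribute's observed values. The dataset, attribute names, and helper functions are hypothetical.

```python
import statistics

# Hypothetical sample dataset: None marks a missing value.
rows = [
    {"age": 25, "income": 40000},
    {"age": None, "income": 52000},
    {"age": 35, "income": None},
    {"age": 45, "income": 61000},
]

def drop_incomplete(rows):
    """Technique 1: discard every instance that has at least one missing value."""
    return [r for r in rows if all(v is not None for v in r.values())]

def impute_mean(rows):
    """Technique 2: replace each missing value with the mean of that attribute's observed values.

    Assumes every attribute has at least one observed (non-None) value.
    """
    attrs = rows[0].keys()
    means = {a: statistics.mean(r[a] for r in rows if r[a] is not None) for a in attrs}
    return [{a: (means[a] if r[a] is None else r[a]) for a in attrs} for r in rows]

print(len(drop_incomplete(rows)))   # 2
print(impute_mean(rows)[1]["age"])  # 35, the mean of 25, 35 and 45
```

Deleting instances is reasonable when only a few rows are affected; mean imputation keeps every instance but reduces the attribute's variance, so the median or a per-class mean may be a better fill value for skewed data.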
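
Question 7 asks for aggregation. The sketch below, using made-up sales records, combines daily records into one total per store (or per date), which reduces the number of instances and changes the scale at which the data is viewed. The record layout and the helper `aggregate_by` are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical daily sales records: (store, date, amount).
sales = [
    ("Store A", "2020-01-01", 120.0),
    ("Store A", "2020-01-02", 80.0),
    ("Store B", "2020-01-01", 200.0),
    ("Store B", "2020-01-02", 50.0),
]

def aggregate_by(records, key_index, value_index):
    """Replace many fine-grained records with one summed value per key."""
    totals = defaultdict(float)
    for rec in records:
        totals[rec[key_index]] += rec[value_index]
    return dict(totals)

print(aggregate_by(sales, 0, 2))  # {'Store A': 200.0, 'Store B': 250.0}
print(aggregate_by(sales, 1, 2))  # {'2020-01-01': 320.0, '2020-01-02': 130.0}
```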
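
Questions 9 and 10 contrast simple random sampling, non-uniform sampling, and stratified sampling. The following Python sketch uses the standard `random` module: `sample` draws without replacement with equal probability, `choices` draws with replacement according to weights, and the stratified helper samples the same fraction from each class. The dataset (10 "pos" and 90 "neg" instances) and the helper names are hypothetical.

```python
import random

def simple_random_sample(data, k, rng):
    """Every instance has the same probability of selection (without replacement)."""
    return rng.sample(data, k)

def weighted_sample(data, weights, k, rng):
    """Non-uniform sampling with replacement: item i is drawn with probability weights[i] / sum(weights)."""
    return rng.choices(data, weights=weights, k=k)

def stratified_sample(data, label_of, fraction, rng):
    """Draw the same fraction (at least one instance) from each stratum of instances sharing a label."""
    strata = {}
    for item in data:
        strata.setdefault(label_of(item), []).append(item)
    sample = []
    for members in strata.values():
        k = max(1, round(fraction * len(members)))
        sample.extend(rng.sample(members, k))
    return sample

rng = random.Random(0)
data = [("pos", i) for i in range(10)] + [("neg", i) for i in range(90)]

# Non-uniform example: later instances are more likely to be chosen (weight i + 1).
biased = weighted_sample(data, [i + 1 for i in range(len(data))], 5, rng)

strat = stratified_sample(data, lambda x: x[0], 0.1, rng)
print(sum(1 for x in strat if x[0] == "pos"), sum(1 for x in strat if x[0] == "neg"))  # 1 9
```

Stratification guarantees that the rare "pos" class appears in proportion (1 of the 10 sampled instances here), whereas a simple random sample of 10 instances could easily contain none.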
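
For the unsupervised methods in question 21, a minimal sketch of equal-width and equal-frequency binning follows. Both functions return a bin index for each value; the price list is invented, and entropy-based (supervised) discretization is not shown.

```python
def equal_width_bins(values, k):
    """Split [min, max] into k intervals of equal width and return a bin index per value."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    if width == 0:
        return [0] * len(values)
    # min(..., k - 1) places the maximum value in the last bin rather than in a bin k.
    return [min(int((v - lo) / width), k - 1) for v in values]

def equal_frequency_bins(values, k):
    """Assign roughly len(values) / k values to each bin, by rank in sorted order."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * k // len(values)
    return bins

prices = [5, 10, 11, 13, 15, 35, 50, 55, 72, 92]
print(equal_width_bins(prices, 3))      # [0, 0, 0, 0, 0, 1, 1, 1, 2, 2]
print(equal_frequency_bins(prices, 3))  # [0, 0, 0, 0, 1, 1, 1, 2, 2, 2]
```

Equal-width binning crowds five of the ten prices into the lowest bin because a few large prices stretch the range, while equal-frequency binning spreads the instances almost evenly (4, 3, and 3 per bin). One caveat of this sketch: tied values can be split across neighboring equal-frequency bins.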
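
For question 23, here is a sketch of the two standard transformations: min-max normalization to a target range (by default [0, 1]) and z-score standardization (subtract the mean, divide by the standard deviation). It assumes a plain list of numbers that are not all equal; the population standard deviation is used here, although some texts use the sample version.

```python
import statistics

def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly map [min, max] onto [new_min, new_max]; assumes the values are not all equal."""
    lo, hi = min(values), max(values)
    return [new_min + (v - lo) * (new_max - new_min) / (hi - lo) for v in values]

def standardize(values):
    """Z-score: (v - mean) / population standard deviation; assumes the values are not all equal."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

x = [2, 4, 4, 4, 5, 5, 7, 9]
print(min_max_normalize(x))
print(standardize(x))  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```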
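
Questions 15 and 24 connect covariance and correlation to preprocessing. One common use is a filter for feature selection: if two attributes are very highly correlated, one of them is largely redundant. The sketch below computes sample covariance and Pearson correlation directly from their definitions and greedily keeps an attribute only when its absolute correlation with every attribute already kept is below a threshold. The attribute names, values, and the 0.9 threshold are hypothetical.

```python
import math

def covariance(x, y):
    """Sample covariance: sum((x_i - mean_x) * (y_i - mean_y)) / (n - 1)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)

def correlation(x, y):
    """Pearson correlation: covariance divided by the product of the standard deviations."""
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))

def drop_correlated(columns, threshold=0.9):
    """Greedy filter: keep an attribute only if |corr| with every kept attribute is below threshold."""
    kept = []
    for name in columns:
        if all(abs(correlation(columns[name], columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical attributes: height in centimeters, the same height in inches, and shoe size.
columns = {
    "height_cm": [150, 160, 170, 180, 190],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],
    "shoe_size": [38, 37, 42, 40, 43],
}
print(drop_correlated(columns))  # ['height_cm', 'shoe_size']
```

The inch column is essentially a rescaled copy of the centimeter column, so it is dropped, while shoe size (correlation of about 0.81 with height) is kept. Unlike covariance, correlation does not depend on the attributes' units, which is why the threshold is applied to correlation rather than to covariance.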

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd