Introduction to data mining: define outlier

Reference no: EM132441834

Intro to Data Mining

1. What's an attribute? What's a data instance?

2. What's noise? How can noise be reduced in a dataset?

3. Define outlier. Describe 2 different approaches to detect outliers in a dataset.

4. Describe 3 different techniques to deal with missing values in a dataset. Explain when each of these techniques would be most appropriate.

5. Given a sample dataset with missing values, apply an appropriate technique to deal with them.

6. Give 2 examples in which aggregation is useful.

7. Given a sample dataset, apply aggregation of data values.

8. What's sampling?

9. What's simple random sampling? Is it possible to sample data instances using a distribution different from the uniform distribution? If so, give an example of a probability distribution of the data instances that is different from uniform (i.e., equal probability).

10. What's stratified sampling?

11. What's "the curse of dimensionality"?

12. Provide a brief description of what Principal Components Analysis (PCA) does. [Hint: See Appendix A and your lecture notes.] State what the input and the output of PCA are.

13. What's the difference between dimensionality reduction and feature selection?

14. Describe in detail 2 different techniques for feature selection.

15. Given a sample dataset (represented by a set of attributes, a correlation matrix, a covariance matrix, ...), apply feature selection techniques to select the best attributes to keep (or, equivalently, the best attributes to remove).

16. What's the difference between feature selection and feature extraction?

17. Give two examples of data in which feature extraction would be useful.

18. Given a sample dataset, apply feature extraction.

19. What's data discretization and when is it needed?

20. What's the difference between supervised and unsupervised discretization?

21. Given a sample dataset, apply unsupervised (e.g., equal width, equal frequency) discretization, or supervised discretization (e.g., using entropy).

22. Describe 2 approaches to handle nominal attributes with too many values.

23. Given a dataset, apply a variable transformation: either a simple given function, normalization, or standardization.

24. Define correlation and covariance, and explain how to use them in data pre-processing.
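
Several of the questions above (3, 4, 21 and 23) ask for a technique to be applied to a sample dataset. The following is a minimal sketch of how such an answer might be worked in Python using only the standard library. It is not a model answer: the function names, the choice of mean imputation, the 1.5 x IQR outlier rule, and the binning conventions are all assumptions made for illustration, and each function assumes a non-empty list of numbers that are not all identical.

```python
import statistics


def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = statistics.fmean(observed)
    return [mean if v is None else v for v in values]


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (a common rule of thumb)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def equal_width_bins(values, n_bins):
    """Unsupervised discretization: split the value range into equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def equal_frequency_bins(values, n_bins):
    """Unsupervised discretization: give each bin (roughly) the same count."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins


def min_max_normalize(values):
    """Rescale values linearly onto the interval [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]
```

Mean imputation is only one of the options question 4 asks about; dropping the instance or predicting the value from other attributes may suit a given dataset better. Likewise, the supervised (entropy-based) discretization mentioned in question 21 is not covered by this sketch.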

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd