Different approaches to detect outliers in a dataset

Reference no: EM132718010

1. What's an attribute? What's a data instance?

2. What's noise? How can noise be reduced in a dataset?

3. Define outlier. Describe 2 different approaches to detect outliers in a dataset.

4. Describe 3 different techniques to deal with missing values in a dataset. Explain when each of these techniques would be most appropriate.

5. Given a sample dataset with missing values, apply an appropriate technique to deal with them.

6. Give 2 examples in which aggregation is useful.

7. Given a sample dataset, apply aggregation of data values.

8. What's sampling?

9. What's simple random sampling? Is it possible to sample data instances using a distribution different from the uniform distribution? If so, give an example of a probability distribution of the data instances that is different from uniform (i.e., equal probability).

10. What's stratified sampling?

11. What's "the curse of dimensionality"?

12. Provide a brief description of what Principal Components Analysis (PCA) does. [Hint: See Appendix A and your lecture notes.] State what the input and the output of PCA are.

13. What's the difference between dimensionality reduction and feature selection?

14. Describe in detail 2 different techniques for feature selection.

15. Given a sample dataset (represented by a set of attributes, a correlation matrix, a covariance matrix, ...), apply feature selection techniques to select the best attributes to keep (or equivalently, the best attributes to remove).

16. What's the difference between feature selection and feature extraction?

17. Give two examples of data in which feature extraction would be useful.

18. Given a sample dataset, apply feature extraction.

19. What's data discretization and when is it needed?

20. What's the difference between supervised and unsupervised discretization?

21. Given a sample dataset, apply unsupervised (e.g., equal width, equal frequency) discretization, or supervised discretization (e.g., using entropy).

22. Describe 2 approaches to handle nominal attributes with too many values.

23. Given a dataset, apply variable transformation: Either a simple given function, normalization, or standardization.

24. Define correlation and covariance, and explain how to use them in data pre-processing (see pp. 76-78).
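
For question 3, here is a minimal sketch of two common outlier-detection approaches: a z-score rule and an interquartile-range (IQR) rule. The function names, thresholds, and sample data are illustrative assumptions, not taken from the course material.

```python
import statistics

def zscore_outliers(values, threshold=3.0):
    """Return values lying more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    if stdev == 0:
        return []  # all values identical, nothing can be flagged
    return [v for v in values if abs(v - mean) / stdev > threshold]

def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical sample: seven typical readings and one extreme value.
data = [10, 12, 11, 13, 12, 11, 10, 95]

# With only 8 points, no population z-score can exceed sqrt(7) (about 2.65),
# so a threshold below the usual 3.0 is used for this small sample.
print(zscore_outliers(data, threshold=2.5))
print(iqr_outliers(data))
```

The z-score rule assumes roughly bell-shaped data and is itself distorted by extreme values, whereas the IQR rule relies on quartiles and is less sensitive to them.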
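
For question 21, the two unsupervised schemes named there (equal width and equal frequency) can be sketched as follows. The helper names and the sample ages are hypothetical; bins are numbered from 0.

```python
def equal_width_bins(values, k):
    """Assign each value to one of k intervals of equal width over [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    if width == 0:
        return [0] * len(values)  # all values identical
    # The maximum value would land in bin k, so clamp it into the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]

def equal_frequency_bins(values, k):
    """Assign values to k bins holding (as nearly as possible) equal counts."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * k // len(values)
    # Note: tied values may be split across neighbouring bins in this simple version.
    return bins

# Hypothetical sample of ages.
ages = [18, 22, 25, 31, 40, 47, 52, 60, 90]

print(equal_width_bins(ages, 3))      # [0, 0, 0, 0, 0, 1, 1, 1, 2]
print(equal_frequency_bins(ages, 3))  # [0, 0, 0, 1, 1, 1, 2, 2, 2]
```

The single large value (90) stretches the equal-width intervals so that most ages fall into the first bin, while equal-frequency binning keeps the counts balanced.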
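
For question 23, a minimal sketch of the two named transformations, assuming min-max scaling to [0, 1] for normalization and z-scores for standardization. Function names and sample values are illustrative.

```python
import statistics

def min_max_normalize(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)  # constant attribute: no spread to rescale
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Transform values to have mean 0 and (population) standard deviation 1."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return [0.0] * len(values)
    return [(v - mean) / stdev for v in values]

# Hypothetical attribute values.
x = [2, 4, 6, 10]

print(min_max_normalize(x))  # [0.0, 0.25, 0.5, 1.0]
print(standardize(x))
```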
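
For question 24 (and the correlation-based part of question 15), the sketch below computes sample covariance and Pearson correlation, then lists attribute pairs whose absolute correlation is high enough that one of them may be redundant. The 0.9 threshold, the helper names, and the toy columns are assumptions for illustration only.

```python
import math
from itertools import combinations

def covariance(xs, ys):
    """Sample covariance (divides by n - 1)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    sx = math.sqrt(covariance(xs, xs))
    sy = math.sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (sx * sy)

def redundant_pairs(columns, threshold=0.9):
    """Return pairs of attribute names whose |correlation| is at least `threshold`."""
    return [
        (a, b)
        for a, b in combinations(columns, 2)
        if abs(correlation(columns[a], columns[b])) >= threshold
    ]

# Hypothetical dataset: the same height recorded in two units, plus an unrelated score.
columns = {
    "height_cm": [150, 160, 170, 180],
    "height_in": [59.1, 63.0, 66.9, 70.9],
    "score": [3, 9, 1, 7],
}

print(redundant_pairs(columns))  # [('height_cm', 'height_in')]
```

Covariance depends on the units of the attributes, while correlation is unit-free and always lies in [-1, 1], which is why the redundancy check uses correlation.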

 






All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd