Different approaches to detect outliers in a dataset

Reference no: EM132899644

Chapter 2

1. What's an attribute? What's a data instance?

2. What's noise? How can noise be reduced in a dataset?

3. Define outlier. Describe 2 different approaches to detect outliers in a dataset.

4. Describe 3 different techniques to deal with missing values in a dataset. Explain when each of these techniques would be most appropriate.

5. Given a sample dataset with missing values, apply an appropriate technique to deal with them.

6. Give 2 examples in which aggregation is useful.

7. Given a sample dataset, apply aggregation of data values.

8. What's sampling?

9. What's simple random sampling? Is it possible to sample data instances using a distribution different from the uniform distribution? If so, give an example of a probability distribution of the data instances that is different from uniform (i.e., equal probability).

10. What's stratified sampling?

11. What's "the curse of dimensionality"?

12. Provide a brief description of what Principal Components Analysis (PCA) does. [Hint: See Appendix A and your lecture notes.] State what the input and the output of PCA are.

13. What's the difference between dimensionality reduction and feature selection?

14. Describe in detail 2 different techniques for feature selection.

15. Given a sample dataset (represented by a set of attributes, a correlation matrix, a covariance matrix, ...), apply feature selection techniques to select the best attributes to keep (or equivalently, the best attributes to remove).

16. What's the difference between feature selection and feature extraction?

17. Give two examples of data in which feature extraction would be useful.

18. Given a sample dataset, apply feature extraction.

19. What's data discretization and when is it needed?

20. What's the difference between supervised and unsupervised discretization?

21. Given a sample dataset, apply unsupervised (e.g., equal width, equal frequency) discretization, or supervised discretization (e.g., using entropy).

22. Describe 2 approaches to handle nominal attributes with too many values.

23. Given a dataset, apply variable transformation: Either a simple given function, normalization, or standardization.

24. Define correlation and covariance, and explain how to use them in data pre-processing.
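
Questions 21, 23, and 24 name specific techniques: equal-width and equal-frequency discretization, normalization, standardization, covariance, and correlation. The following is a minimal Python sketch of those techniques, intended only as a study aid. It assumes "normalization" means min-max scaling to [0, 1] and "standardization" means z-scores computed with the sample standard deviation; the course material may define them differently. All function names and the sample data are hypothetical.

```python
import statistics


def min_max_normalize(values):
    """Rescale values linearly to [0, 1] (assumed meaning of 'normalization')."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant attribute: no spread to rescale
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Convert values to z-scores using the sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def equal_width_bins(values, k):
    """Unsupervised discretization: k intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    if width == 0:
        return [0 for _ in values]
    # The maximum value would fall in bin k, so clamp it into bin k - 1.
    return [min(int((v - lo) / width), k - 1) for v in values]


def equal_frequency_bins(values, k):
    """Unsupervised discretization: k bins with (nearly) equal counts."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * k // len(values)
    return bins


def covariance(xs, ys):
    """Sample covariance of two equally long attribute columns."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def correlation(xs, ys):
    """Pearson correlation: covariance divided by both standard deviations."""
    return covariance(xs, ys) / (statistics.stdev(xs) * statistics.stdev(ys))


# Hypothetical sample attribute with one unusually large value.
ages = [20, 25, 30, 35, 40, 90]
print(equal_width_bins(ages, 3))
print(equal_frequency_bins(ages, 3))
```

On this sample, equal-width binning places the first five ages in bin 0 and the value 90 alone in bin 2, while equal-frequency binning places two ages in each bin. The contrast shows how a single extreme value can dominate equal-width intervals.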

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd