Define a problem on the dataset

Reference no: EM132358882

1. Choose a data mining problem and data set:

For this project, you must choose your own dataset. It can be one found from an online source, one of your own, or one from the UCI repository. A list of additional dataset sources is provided at the end of this document. If you would like to use a data collection API to curate data, the attached document provides an example of using R to collect Twitter data.

Some rules/tips about choosing data sets:

a. Do not choose a dataset that we have already analyzed in class.

b. It should not be a small or made-up dataset. For this semester, "small" is defined as fewer than 1000 examples in the dataset.

c. Choose a data set that does not require excessive data preprocessing.
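As a quick check against rule (b), the number of examples in a candidate dataset can be counted before committing to it. The following is a minimal sketch, assuming the data is a CSV file with one example per row and a header line; the helper names `count_examples` and `is_large_enough` and the sample columns are hypothetical, chosen only for illustration.

```python
import csv
import io

MIN_EXAMPLES = 1000  # rule (b): fewer than 1000 examples counts as "small"


def count_examples(csv_file, has_header=True):
    """Count data rows in an open CSV file, skipping blank lines."""
    reader = csv.reader(csv_file)
    rows = [row for row in reader if any(cell.strip() for cell in row)]
    if has_header and rows:
        rows = rows[1:]  # drop the header line
    return len(rows)


def is_large_enough(n_examples, minimum=MIN_EXAMPLES):
    """Return True when the dataset is not "small" under rule (b)."""
    return n_examples >= minimum


# Tiny in-memory stand-in for a real file (hypothetical columns).
# For a real dataset, pass open("your_data.csv", newline="") instead.
sample = io.StringIO("age,income,label\n34,52000,yes\n51,61000,no\n")
n = count_examples(sample)
print(n, is_large_enough(n))
```

The two-row sample is deliberately too small, so the check reports that it does not meet the size requirement.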

2. Experiment design:

Define a problem on the dataset and describe it in terms of its real-world organizational or business application. The complexity level of the problem should be at least comparable to one homework assignment. The investigation of the analytics solution to the problem should use at least TWO different types of data mining algorithms that we have studied this semester, such as Classification, Clustering, and Association Rules.

This investigation must include some aspects of experimental comparison: depending on the problem, you may choose to experiment with different types of algorithms (e.g., different types of classifiers) and with tuning the parameters of those algorithms. Alternatively, if your problem is suitable, you may combine multiple algorithms (Clustering + Classification, etc.). If there is a large number of attributes, you can try some type of feature selection to reduce it. You may use summary statistics and visualization techniques to help you explain your findings.
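One possible shape for such an experimental comparison is sketched below: a majority-class baseline is compared against a k-nearest-neighbour classifier at several values of k, using 5-fold cross-validation. This is only an illustration under stated assumptions. The data is synthetic (two Gaussian clusters generated by the hypothetical `make_toy_data` helper), all function names are made up for this sketch, and a real project would normally run the same comparison on its chosen dataset with an established data mining library.

```python
import random
from collections import Counter


def make_toy_data(n=200, seed=0):
    """Synthetic 2-D data: class 0 centred at (0, 0), class 1 at (3, 3)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.choice([0, 1])
        center = 0.0 if label == 0 else 3.0
        x = (rng.gauss(center, 1.0), rng.gauss(center, 1.0))
        data.append((x, label))
    return data


def majority_classifier(train):
    """Baseline: always predict the most frequent training label."""
    most_common = Counter(label for _, label in train).most_common(1)[0][0]
    return lambda x: most_common


def knn_classifier(train, k):
    """k-nearest-neighbour classifier using squared Euclidean distance."""
    def predict(x):
        nearest = sorted(
            train,
            key=lambda ex: (ex[0][0] - x[0]) ** 2 + (ex[0][1] - x[1]) ** 2,
        )[:k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]
    return predict


def cross_val_accuracy(data, build, folds=5):
    """Mean accuracy over `folds` train/test splits."""
    scores = []
    for i in range(folds):
        test = data[i::folds]  # every folds-th example, starting at i
        train = [ex for j, ex in enumerate(data) if j % folds != i]
        predict = build(train)
        correct = sum(predict(x) == y for x, y in test)
        scores.append(correct / len(test))
    return sum(scores) / len(scores)


data = make_toy_data()
results = {"majority baseline": cross_val_accuracy(data, majority_classifier)}
for k in (1, 5, 15):  # the tuning parameter being compared
    results[f"{k}-NN"] = cross_val_accuracy(
        data, lambda train, k=k: knn_classifier(train, k)
    )

for name, acc in sorted(results.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {acc:.3f}")
```

Reporting every configuration side by side, together with a simple baseline, lets readers see how much each algorithm and parameter setting actually contributes.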

3. Final project paper:

To complete this project, write a final report that conforms to the general research paper format. See (Pang, Lee, and Vaithyanathan, 2002) for an example. Your report should be no longer than 6 pages, with 1-inch margins on all sides and a font of at least 12-point Arial or Times New Roman. Remember that your project paper serves as a tour guide that lets your readers repeat your data mining process and discover the same patterns you did. It is very important to cite and paraphrase relevant work appropriately.






All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd