Define a problem on the dataset

Reference no: EM132352626

Question: 1. Choose a data mining problem and data set:

For this project, you must choose your own dataset. It can be one found from an online source, one of your own, or one from the UCI repository. A list of additional dataset sources is provided at the end of this document. If you would like to use a data collection API to curate data, the attached document provides an example of using R to collect Twitter data.

Some rules/tips about choosing data sets:

a. Do not choose a dataset that we have already analyzed in class.

b. It should not be a small or made-up dataset. For this semester, "small" is defined as fewer than 1000 examples in the dataset.

c. Choose a data set that does not require excessive data preprocessing.
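
The size and preprocessing rules above can be checked quickly before committing to a data set. The following is a minimal sketch in Python, assuming the data is available as a CSV file. The function name screen_dataset, the 5% missing-value cutoff, the missing-value markers, and the synthetic sample data are all illustrative assumptions and are not part of the assignment; only the 1000-example minimum comes from rule b.

```python
import csv
import io

MIN_EXAMPLES = 1000  # rule b: fewer than 1000 examples counts as "small"


def screen_dataset(csv_file, max_missing_fraction=0.05):
    """Count rows and missing cells in a CSV file-like object.

    max_missing_fraction is a hypothetical cutoff used here as a rough
    proxy for "excessive preprocessing"; the assignment gives no number.
    """
    reader = csv.reader(csv_file)
    header = next(reader)
    n_rows = 0
    n_missing = 0
    for row in reader:
        n_rows += 1
        # Treat empty strings, "?" and "NA" as missing values (an assumption).
        n_missing += sum(1 for cell in row if cell.strip() in ("", "?", "NA"))
    total_cells = n_rows * len(header)
    missing_fraction = n_missing / total_cells if total_cells else 0.0
    return {
        "n_examples": n_rows,
        "large_enough": n_rows >= MIN_EXAMPLES,
        "missing_fraction": missing_fraction,
        "low_preprocessing": missing_fraction <= max_missing_fraction,
    }


# Synthetic stand-in for a real file: 1200 rows, every 100th row has a gap.
lines = ["age,income,label"]
for i in range(1200):
    income = "" if i % 100 == 0 else str(30000 + i)
    lines.append(f"{20 + i % 50},{income},{i % 2}")
report = screen_dataset(io.StringIO("\n".join(lines)))
print(report)
```

For a real data set, the io.StringIO object would be replaced by an opened file, and the missing-value markers adjusted to whatever the source uses.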

2. Experiment design:

Define a problem on the dataset and describe it in terms of its real-world organizational or business application. The complexity level of the problem should be at least comparable to one homework assignment. The problem may use at least TWO different types of data mining algorithms that we have studied this semester, such as classification, clustering, and association rules, in an investigation of the analytics solution to the problem.

This investigation must include some aspects of experimental comparison. Depending on the problem, you may choose to experiment with different types of algorithms (e.g., different types of classifiers) and with tuning the parameters of those algorithms. Alternatively, if your problem is suitable, you may use multiple algorithms (Clustering + Classification, etc.). If there is a large number of attributes, you can try some type of feature selection to reduce the number of attributes. You may use summary statistics and visualization techniques to help you explain your findings.
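
One possible shape for such an experimental comparison is sketched below in Python, using only the standard library. It compares a majority-class baseline against a k-nearest-neighbour classifier under 5-fold cross-validation and varies k as the tuned parameter. The synthetic two-cluster data, the choice of these two classifiers, and all function names are assumptions made for illustration; a real project would substitute the chosen dataset and the algorithms studied in class.

```python
import random
from collections import Counter


def make_data(n=300, seed=0):
    """Synthetic two-class data: two Gaussian blobs in 2-D (illustrative only)."""
    rng = random.Random(seed)
    data = []
    for i in range(n):
        label = i % 2
        center = 0.0 if label == 0 else 3.0
        point = (rng.gauss(center, 1.0), rng.gauss(center, 1.0))
        data.append((point, label))
    rng.shuffle(data)
    return data


def majority_predict(train, point):
    """Baseline: always predict the most common training label."""
    return Counter(label for _, label in train).most_common(1)[0][0]


def knn_predict(train, point, k):
    """k-nearest-neighbour vote using squared Euclidean distance."""
    nearest = sorted(
        train,
        key=lambda ex: (ex[0][0] - point[0]) ** 2 + (ex[0][1] - point[1]) ** 2,
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def cross_val_accuracy(data, predict, folds=5):
    """Mean accuracy over `folds` train/test splits of the data."""
    scores = []
    for f in range(folds):
        test = data[f::folds]  # every folds-th example, starting at index f
        train = [ex for i, ex in enumerate(data) if i % folds != f]
        correct = sum(predict(train, x) == y for x, y in test)
        scores.append(correct / len(test))
    return sum(scores) / len(scores)


data = make_data()
results = {"majority baseline": cross_val_accuracy(data, majority_predict)}
for k in (1, 5, 15):  # tuning one parameter of one algorithm
    results[f"kNN (k={k})"] = cross_val_accuracy(
        data, lambda train, x, k=k: knn_predict(train, x, k)
    )
for name, acc in results.items():
    print(f"{name}: {acc:.3f}")
```

Evaluating every configuration on the same folds keeps the comparison fair, and including a trivial baseline shows how much of the accuracy is due to the algorithm rather than the class distribution.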

3. Final project paper:

To complete this project, write a final report that conforms to the general research paper format. See (Pang, Lee, and Vaithyanathan, 2002) as an example. Your report should be no longer than 6 pages, with 1-inch margins on all sides and a font of at least 12-point Arial or Times New Roman. Remember that your project paper serves as a tour guide that lets your readers repeat your data mining process and discover the same patterns you did. It is very important to cite and paraphrase relevant work appropriately.






All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd