Perform data mining steps on the given dataset

Reference no: EM13934402

Data Mining Project

In this project you will use the Sentiment Labelled Sentences dataset, available at https://archive.ics.uci.edu/ml/datasets/Sentiment+Labelled+Sentences. The dataset contains review sentences labeled (classified) as positive or negative, such as the following two sentences from IMDb movie reviews:

Wasted two hours. 0
Saw the movie today and thought it was a good effort, good messages for kids. 1

A sentence labeled 0 is a negative comment; a sentence labeled 1 is a positive comment. There are three files (imdb_labelled.txt, amazon_cells_labelled.txt, yelp_labelled.txt), each containing 500 positive and 500 negative sentences. (The Amazon and Yelp datasets contain more instances, but only those labeled 0 or 1 should be considered.) This data is used in the following paper: Dimitrios Kotzias, Misha Denil, Nando de Freitas, Padhraic Smyth: From Group to Individual Labels Using Deep Features. KDD 2015: 597-606
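As a starting point, the labelled sentences can be read into (sentence, label) pairs. The sketch below is only an illustration: it assumes each line holds a sentence and its label separated by a tab character, and the function name `parse_labelled_lines` is hypothetical. Rows whose last field is not 0 or 1 are skipped, as the description above requires.

```python
from typing import Iterable, List, Tuple


def parse_labelled_lines(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Parse 'sentence<TAB>label' lines, keeping only rows labelled 0 or 1."""
    examples = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        # Assumption: sentence and label are separated by a tab character.
        parts = line.rsplit("\t", 1)
        if len(parts) != 2 or parts[1].strip() not in ("0", "1"):
            continue  # unlabelled or malformed row
        examples.append((parts[0].strip(), int(parts[1])))
    return examples


# The two labelled sentences quoted above, plus one unlabelled row.
sample = [
    "Wasted two hours.\t0\n",
    "Saw the movie today and thought it was a good effort, good messages for kids.\t1\n",
    "A sentence without any label\n",
]
examples = parse_labelled_lines(sample)
print(examples)
```

With the real files, the same function could be applied to an open file object, for example `parse_labelled_lines(open("imdb_labelled.txt", encoding="utf-8"))`, provided the tab-separated assumption holds.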

You will perform data mining steps (data preprocessing, classification) on this dataset and write up your results in a project report in the form of an IEEE conference paper.

Steps:

a. Literature review: Read the following paper to learn what has been done before on this problem: https://www.cs.cornell.edu/home/llee/papers/sentiment.pdf. Summarize this work in your own words; the summary will go in the "Related Work" section of your paper.

b. Dataset characteristics: Describe the data: size, training and test sets, number of attributes, attribute list, attribute types, attribute ranges, etc. In this dataset, each distinct word should be considered an attribute/feature.

c. Data preprocessing: Normalization, missing values, outlier detection, smoothing, attribute reduction/attribute selection, sampling, etc.

d. Data mining tasks (classification): Use Weka (preferred) or any other data mining tool. Perform classification experiments using different algorithms, including at least decision trees, naïve Bayes, and rule learning. Analyze performance with the measures covered in the lecture, and discuss the results.
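Steps b to d can be illustrated end to end with a small sketch. This is not the Weka workflow the assignment prefers; it is a plain-Python illustration under stated assumptions: the tokenizer, the short stop-word list, the `min_doc_freq` threshold, and the names `select_attributes` and `NaiveBayes` are all hypothetical choices. It treats each distinct word as an attribute, applies a simple attribute reduction, and trains a multinomial naive Bayes classifier with add-one smoothing. Decision trees and rule learners would need a separate implementation or tool.

```python
import math
import re
from collections import Counter, defaultdict

# Illustrative stop-word list (an assumption, not prescribed by the assignment).
STOP_WORDS = {"a", "an", "and", "the", "it", "was", "for", "of", "to", "is"}


def tokenize(sentence):
    """Lower-case a sentence and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


def select_attributes(sentences, min_doc_freq=1):
    """Steps b/c: one attribute per distinct word, minus stop words and rare words."""
    doc_freq = Counter()
    for s in sentences:
        doc_freq.update(set(tokenize(s)) - STOP_WORDS)
    return sorted(w for w, n in doc_freq.items() if n >= min_doc_freq)


class NaiveBayes:
    """Step d: multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, sentences, labels, attributes):
        self.attributes = set(attributes)
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for s, y in zip(sentences, labels):
            self.word_counts[y].update(
                w for w in tokenize(s) if w in self.attributes
            )
        return self

    def predict(self, sentence):
        total = sum(self.class_counts.values())
        n_attrs = len(self.attributes)
        best_label, best_score = None, -math.inf
        for y in sorted(self.class_counts):
            n_words = sum(self.word_counts[y].values())
            score = math.log(self.class_counts[y] / total)  # log prior
            for w in tokenize(sentence):
                if w in self.attributes:  # words outside the attribute set are ignored
                    score += math.log(
                        (self.word_counts[y][w] + 1) / (n_words + n_attrs)
                    )
            if score > best_score:
                best_label, best_score = y, score
        return best_label


# The first two sentences come from the assignment text; the last two are made up.
train = [
    ("Wasted two hours.", 0),
    ("Saw the movie today and thought it was a good effort, good messages for kids.", 1),
    ("Terrible and boring.", 0),
    ("A good and fun movie.", 1),
]
sentences = [s for s, _ in train]
labels = [y for _, y in train]
attributes = select_attributes(sentences)
model = NaiveBayes().fit(sentences, labels, attributes)
print(len(attributes), model.predict("good movie"), model.predict("wasted hours"))
```

Raising `min_doc_freq` shrinks the attribute set by dropping words that occur in few sentences, which is one simple form of the attribute reduction mentioned in step c.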

Project Paper: Write your project report in the form of a conference paper. (January 22, 2016) Follow the IEEE template here: https://www.ieee.org/publications_standards/publications/conferences/2014_04_msw_usltr_format.doc.

Your paper should contain the following sections:

1. Abstract: A one-paragraph summary of your paper.

2. Introduction: Describe the sentiment classification problem and explain why classifying sentiment is important (give the motivation). Finally, state the contributions of your work in this paper.

3. Related work: Write the summary of the paper from step a (https://www.cs.cornell.edu/home/llee/papers/sentiment.pdf).

4. Sentiment Classification: Describe the data mining steps that you performed in steps b, c, and d, except the classification results.

5. Experimental Results: Report the classification results using the measures covered in the lecture, and discuss the results in this section.

6. Conclusion: Briefly summarize the paper and give your opinion on what could be done to improve classification accuracy further.
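The results reported in the Experimental Results section depend on which measures the lecture covered, which this description does not list. As an assumption, the sketch below computes four common ones (accuracy, precision, recall, F1) from a binary confusion matrix. The function name `binary_metrics` and the label vectors are made up for illustration and are not results from the dataset.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute confusion-matrix counts and common measures for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up labels and predictions, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
result = binary_metrics(y_true, y_pred)
print(result)
```

Reporting the raw counts alongside the derived measures makes it easier to compare classifiers that reach similar accuracy through different error patterns.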
