Reference no: EM13934402
Data Mining Project
In this project you will use the Sentiment Labelled Sentences dataset (https://archive.ics.uci.edu/ml/datasets/Sentiment+Labelled+Sentences). The dataset contains review sentences labeled (classified) as positive or negative, such as the following two sentences from IMDb movie reviews:
Wasted two hours. 0
Saw the movie today and thought it was a good effort, good messages for kids. 1
A label of 0 marks a negative comment and a label of 1 marks a positive comment. There are three files (imdb_labelled.txt, amazon_cells_labelled.txt, yelp_labelled.txt), each containing 500 positive and 500 negative sentences. (The Amazon and Yelp datasets contain more instances, but only those labeled 0 or 1 should be considered.) The data was used in the following paper: Dimitrios Kotzias, Misha Denil, Nando de Freitas, Padhraic Smyth: From Group to Individual Labels Using Deep Features. KDD 2015: 597-606.
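The file format described above can be read with a few lines of Python. This is a minimal sketch, not part of the assignment: the function name `parse_labelled_lines` is hypothetical, and it assumes the label is the last whitespace-separated token on each line (the two example sentences above suggest this layout). Lines that do not end in 0 or 1 are skipped, matching the instruction to consider only labeled instances.

```python
from typing import Iterable, List, Tuple


def parse_labelled_lines(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Return (sentence, label) pairs, skipping lines without a 0/1 label."""
    examples = []
    for raw in lines:
        # Assumption: the label is the last whitespace-separated token.
        parts = raw.strip().rsplit(None, 1)
        if len(parts) != 2 or parts[1] not in ("0", "1"):
            continue  # unlabeled or malformed line: ignore it
        examples.append((parts[0], int(parts[1])))
    return examples


# The two example sentences from the text, plus one unlabeled line.
sample = [
    "Wasted two hours.\t0",
    "Saw the movie today and thought it was a good effort, good messages for kids.\t1",
    "A line with no label",
]
examples = parse_labelled_lines(sample)
```

To read a real file, the same function could be called as `parse_labelled_lines(open("imdb_labelled.txt", encoding="utf-8"))`, assuming the file sits in the working directory.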
You will perform data mining steps (data preprocessing, classification) on this dataset and write up your results in a project report in the form of an IEEE conference paper.
Steps:
a. Literature review: Read the following paper to learn what has been done before on this problem: https://www.cs.cornell.edu/home/llee/papers/sentiment.pdf. Summarize this work in your own words; the summary will form the "Related Work" section of your paper.
b. Dataset characteristics: Data description, size, training, test, number of attributes, attribute lists, type of attributes, range of attributes, etc. In this dataset, each distinct word should be considered as an attribute/feature.
c. Data preprocessing: Normalization, missing values, outlier detection, smoothing, attribute reduction/attribute selection, sampling etc.
d. Data mining tasks (Classification): Use Weka (preferred) or any other data mining tool. Perform classification experiments using different algorithms, including at least decision trees, Naive Bayes, and rule learning. Analyze performance with the measures covered in the lecture, and discuss the results.
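Steps b to d can be sketched in plain Python for readers who want to see the mechanics before running them in Weka. This is an illustrative sketch under stated assumptions, not the required tool or method: the tokenizer regex, the `min_count` threshold used for attribute reduction, and all function names are hypothetical choices, and only Naive Bayes (multinomial, with add-one smoothing) is shown out of the three required algorithm families.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase the sentence and split it into word tokens (assumed rule)."""
    return re.findall(r"[a-z']+", sentence.lower())


def build_vocabulary(sentences, min_count=1):
    """One attribute per distinct word; drop words seen in fewer than
    min_count sentences as a simple form of attribute reduction."""
    doc_freq = Counter()
    for s in sentences:
        doc_freq.update(set(tokenize(s)))
    return sorted(w for w, c in doc_freq.items() if c >= min_count)


def to_vector(sentence, vocabulary):
    """Binary presence/absence vector over the vocabulary."""
    present = set(tokenize(sentence))
    return [1 if w in present else 0 for w in vocabulary]


def train_naive_bayes(examples, vocabulary):
    """Count classes and per-class word occurrences."""
    vocab = set(vocabulary)
    class_counts = Counter()
    word_counts = {0: Counter(), 1: Counter()}
    for sentence, label in examples:
        class_counts[label] += 1
        word_counts[label].update(w for w in tokenize(sentence) if w in vocab)
    return class_counts, word_counts, vocab


def predict(model, sentence):
    """Return the label (0 or 1) with the higher smoothed log-probability."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in (0, 1):
        # Add-one smoothing on the prior and on the word likelihoods.
        score = math.log((class_counts[label] + 1) / (total + 2))
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in tokenize(sentence):
            if w in vocab:
                score += math.log((word_counts[label][w] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training set: the two example sentences quoted in the assignment.
train = [
    ("Wasted two hours.", 0),
    ("Saw the movie today and thought it was a good effort, good messages for kids.", 1),
]
vocabulary = build_vocabulary([s for s, _ in train])
model = train_naive_bayes(train, vocabulary)
```

With this toy model, `predict(model, "a good effort")` yields 1 and `predict(model, "wasted hours")` yields 0. On the real data, raising `min_count` shrinks the attribute list, which is one way to approach the attribute reduction mentioned in step c.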
Project Paper: Write your project report in the form of a conference paper. (January 22, 2016) Follow the IEEE template available here: https://www.ieee.org/publications_standards/publications/conferences/2014_04_msw_usltr_format.doc.
Your paper should contain the following sections:
1. Abstract: one paragraph summary of your paper
2. Introduction: Describe the sentiment classification problem and explain why classifying sentiment is important (give the motivation). Finally, state the contributions of your work in this paper.
3. Related work: You should write the summary of the paper from step a (https://www.cs.cornell.edu/home/llee/papers/sentiment.pdf).
4. Sentiment Classification: Describe the data mining steps you performed in steps b, c, and d, excluding the classification results.
5. Experimental Results: Report the classification results using the measures covered in the lecture, and discuss the results in this section.
6. Conclusion: Briefly summarize the paper and state your opinions about what can be done to improve classification accuracy further.
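The assignment does not list which measures the lecture covered, so the following sketch assumes the common ones for binary classification: accuracy, precision, recall, and F1, all derived from the confusion matrix with label 1 treated as the positive class. The function names and the toy label lists are hypothetical and serve only to illustrate the arithmetic.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for a binary problem."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def classification_measures(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 (0.0 when undefined)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels for illustration only: 3 TP, 1 FP, 3 TN, 1 FN.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
measures = classification_measures(y_true, y_pred)
```

For these made-up labels all four measures come out to 0.75. Weka reports the same quantities per class in its classifier output, so a sketch like this is mainly useful for checking how each number is derived.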