Compare two different measures of computing sentiment

Reference no: EM131687420

Introduction -

In this assignment, you will download annual reports for Apple, Inc. from EDGAR and analyse sentiment. In Task 1 you will download and structure the annual reports. In the remaining tasks you will analyse sentiment measures. The main goal is to critically understand what parameters and choices influence the level of computed sentiment.

Except for Task 1, all tasks use the data gathered. If you are unable to complete the initial data gathering and cleaning tasks, you can use the data we provide, apple_10k.Rdata. The file contains a vector in which each element is an individual annual report with no cleaning applied. The names of the vector indicate the submission date. In this assignment, several steps can yield different results depending on the actual coding. Hence, as long as you fulfil all steps, it does not matter if your downloaded data is not exactly the same as the data provided by us.

Task 1: Download annual reports

Download all annual reports for Apple, Inc. from 1994 to 2016 using EDGAR. Save the raw text file without any cleaning.

Note: If you download an annual report (10-K) from EDGAR, the obtained text file includes several additional filings other than the annual report. Make sure that you only use the annual report for later analysis. Hint: use the structure of the downloaded file to identify the annual report.
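The hint above can be illustrated with a minimal Python sketch (the assignment itself is done in R). It assumes the usual layout of an EDGAR full-submission text file, in which each filing sits in its own <DOCUMENT> ... </DOCUMENT> block that starts with a <TYPE> line. The function name extract_10k and the form_types parameter are hypothetical, and older filings may use form-type variants that would have to be added to the tuple.

```python
import re

def extract_10k(submission_text, form_types=("10-K",)):
    """Return the first <DOCUMENT> block whose <TYPE> is in form_types, or None."""
    # Each filing in the submission file is wrapped in <DOCUMENT> ... </DOCUMENT>.
    blocks = re.findall(r"<DOCUMENT>(.*?)</DOCUMENT>", submission_text, flags=re.DOTALL)
    for block in blocks:
        type_match = re.search(r"<TYPE>([^\s<]+)", block)
        if type_match and type_match.group(1).upper() in form_types:
            return block
    return None

# Synthetic stand-in for a downloaded submission file (not real EDGAR data).
sample = (
    "<SEC-HEADER>header</SEC-HEADER>\n"
    "<DOCUMENT>\n<TYPE>10-K\n<SEQUENCE>1\n<TEXT>Annual report body</TEXT>\n</DOCUMENT>\n"
    "<DOCUMENT>\n<TYPE>EX-21\n<SEQUENCE>2\n<TEXT>Subsidiaries</TEXT>\n</DOCUMENT>\n"
)
report = extract_10k(sample)
print(report)
```

Only the block typed 10-K is returned; the exhibit block is ignored.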

Task 2: Sentiment and document cleaning

In this task, you evaluate the impact of document cleaning tasks on sentiment computation. We focus only on the Harvard-IV and Loughran-McDonald dictionaries. In the SentimentAnalysis package, these correspond to sentimentGI and sentimentLM. Please use all documents in the whole corpus for this task.

Compute sentiment for each individual annual report using...

  • the raw text files.
  • the text files after removing html tags.
  • the text files after removing html tags, excess white space, numbers, and punctuation, removing words with fewer than 4 or more than 30 characters, and removing stopwords.
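The cleaning levels listed above can be sketched as follows. This is a minimal Python illustration, not the assignment's R solution: the stopword set is a tiny placeholder chosen for the example, the regular expression for tags is a rough approximation of html stripping, and the helper names strip_html and clean are hypothetical.

```python
import re
import string

# Tiny illustrative stopword list (an assumption; a real run would use a full list).
STOPWORDS = {"this", "that", "with", "from", "have", "were", "which"}

# Translation table that turns every ASCII punctuation character into a space.
PUNCT_TABLE = str.maketrans(string.punctuation, " " * len(string.punctuation))

def strip_html(text):
    """Second cleaning level: replace anything that looks like an html tag."""
    return re.sub(r"<[^>]+>", " ", text)

def clean(text, min_len=4, max_len=30):
    """Third cleaning level: tags, numbers, punctuation, word length, stopwords."""
    text = strip_html(text).lower()
    text = re.sub(r"\d+", " ", text)      # remove numbers
    text = text.translate(PUNCT_TABLE)    # remove punctuation
    words = text.split()                  # splitting also collapses white space
    kept = [w for w in words
            if min_len <= len(w) <= max_len and w not in STOPWORDS]
    return " ".join(kept)

raw = "<p>Net sales increased 12% in 2016, which was   a strong result.</p>"
print(strip_html(raw))
print(clean(raw))   # sales increased strong result
```

Because each level removes a different share of the tokens, the word count that a sentiment score is divided by changes from level to level, which is one thing to keep in mind when comparing the three series.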

Make two time series plots:

  • Plot a time-series of the three sentimentGI measures computed in the previous step.
  • Plot a time-series of the three sentimentLM measures computed in the previous step.

Why do the cleaning steps affect the level of sentiment?

What explains the one visible drop in one sentiment measure?

Task 3: Terms defining sentiment

In this task, you evaluate the terms defining the individual sentiment measures. Please use all documents in the whole corpus for this task, with html tags removed and cleaning steps of your choice. Answer the following questions:

  • What are the 25 most common positive terms based on the Harvard-IV and Loughran-McDonald dictionaries using the whole corpus?
  • How many of them are present in the other dictionary? (Harvard-IV versus Loughran-McDonald)
  • Interpret your findings.
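The counting and overlap check asked for above can be sketched in Python as follows. The two word sets are hypothetical miniature stand-ins for the positive entries of the two dictionaries, and the token list stands in for the cleaned corpus; the real dictionaries are far longer.

```python
from collections import Counter

# Hypothetical miniature word lists standing in for the positive entries
# of the Harvard-IV (GI) and Loughran-McDonald (LM) dictionaries.
positive_gi = {"good", "great", "share", "gain", "strong"}
positive_lm = {"great", "gain", "strong", "improve"}

# Stand-in for the tokens of the whole cleaned corpus.
tokens = "strong gain in share price good gain strong strong improve".split()

def top_terms(tokens, dictionary, n=25):
    """Return the n most frequent tokens that appear in the dictionary."""
    counts = Counter(t for t in tokens if t in dictionary)
    return [term for term, _ in counts.most_common(n)]

top_gi = top_terms(tokens, positive_gi)
top_lm = top_terms(tokens, positive_lm)

# Top GI terms that are also LM positive terms, and the reverse direction.
shared_gi = [t for t in top_gi if t in positive_lm]
shared_lm = [t for t in top_lm if t in positive_gi]
print(top_gi, shared_gi)
print(top_lm, shared_lm)
```

In this toy data "share" counts as positive only under the general-purpose list, which is the kind of term worth looking for when interpreting the real overlap.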

Task 4: Comparison between different sentiment methods - advanced

In this task, you will compare two different measures of computing sentiment. In particular, you will evaluate whether the correlation between the two measures differs for sentences with high or low sentiment. We focus only on the Harvard-IV and Loughran-McDonald dictionaries. In the SentimentAnalysis package, these correspond to sentimentGI and sentimentLM. Furthermore, we consider only the most recent annual report by Apple, Inc.

Split the annual report into sentences. Exclude sentences that are unreasonably short or long. Compute sentiment for each sentence. Then complete the following tasks:
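A minimal Python sketch of the splitting and filtering step is shown here. The splitting rule (sentence-ending punctuation followed by white space) is deliberately naive, and the thresholds of 5 and 100 words are assumptions, since the assignment leaves "unreasonably short and long" to the reader. The function name split_sentences is hypothetical.

```python
import re

def split_sentences(text, min_words=5, max_words=100):
    """Split after ., ! or ? followed by white space, then filter by word count."""
    candidates = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in candidates if min_words <= len(s.split()) <= max_words]

text = ("Item 7. Net sales rose during the year because of higher demand. "
        "See Note 3. The Company expects gross margin to remain under pressure.")
sentences = split_sentences(text)
for s in sentences:
    print(s)
```

Fragments such as "Item 7." and "See Note 3." are dropped by the lower threshold, which is the main purpose of the length filter in a 10-K.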

Normalize each sentiment measure across documents by subtracting the mean and dividing by the standard deviation: (x - mean(x))/std(x). After this step, each sentiment measure has mean zero and standard deviation one, so the measures are comparable in magnitude and variation. Use the standardised version for later computations.
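As a small worked example of this standardisation, the Python sketch below applies the formula to a made-up list of sentence scores. It uses the sample standard deviation (dividing by n - 1), which is also what R's sd() computes; the function name standardise and the scores are illustrative only.

```python
from statistics import mean, stdev

def standardise(values):
    """Return (x - mean(x)) / sd(x) for each x, using the sample standard deviation."""
    m = mean(values)
    s = stdev(values)
    return [(v - m) / s for v in values]

# Made-up sentence-level sentiment scores.
scores = [0.10, 0.05, -0.02, 0.20, 0.00]
z = standardise(scores)
print([round(v, 3) for v in z])
print(round(mean(z), 6), round(stdev(z), 6))
```

The second print line confirms, up to rounding, that the standardised values have mean zero and standard deviation one.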

Sentiment and sentence length:

  • Split all sentences into five bins based on sentence length. Set the breakpoints at the quintiles so that each bin includes a fifth of the data.
  • For each bin, compute the average sentimentLM and sentimentGI, and visualize.
  • For each bin, compute the correlation between sentimentGI and sentimentLM. Visualize with a plot.
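The quintile binning, per-bin averages, and per-bin correlations can be sketched in Python as follows. The data are synthetic and generated by simple formulas, so the resulting numbers carry no meaning; the helper names pearson and quintile_bins are hypothetical, and plotting is left out.

```python
from bisect import bisect_left
from statistics import mean, quantiles

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def quintile_bins(lengths):
    """Assign each length to bin 0..4 using the 20/40/60/80 percent breakpoints."""
    breaks = quantiles(lengths, n=5)   # four cut points
    return [bisect_left(breaks, n) for n in lengths]

# Synthetic stand-ins for sentence lengths and standardised scores.
lengths = list(range(6, 46, 2))                       # 20 sentences
gi = [((i * 7) % 11 - 5) / 5 for i in range(20)]
lm = [0.5 * g + ((i * 3) % 5 - 2) / 10 for i, g in enumerate(gi)]

bins = quintile_bins(lengths)
by_bin = {b: [] for b in range(5)}
for b, g, l in zip(bins, gi, lm):
    by_bin[b].append((g, l))

summary = {}
for b, pairs in by_bin.items():
    g_vals = [p[0] for p in pairs]
    l_vals = [p[1] for p in pairs]
    summary[b] = (mean(g_vals), mean(l_vals), pearson(g_vals, l_vals))

for b, (avg_gi, avg_lm, corr) in summary.items():
    print(b, round(avg_gi, 3), round(avg_lm, 3), round(corr, 3))
```

With real sentences, many lengths are tied, so the bins may not hold exactly a fifth of the data each; the correlation is also undefined for a bin in which one measure is constant.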

Sentiment and its level:

  • Split the sentences into six bins based on sentimentGI falling into one of the following intervals: (-inf : -0.5], (-0.5 : 0], (0 : 0.5], (0.5 : 1.5], (1.5 : inf). How many observations are in each individual bin?
  • For each bin, compute the average sentimentLM, and visualize.
  • For each bin, compute the correlation between sentimentGI and sentimentLM. Visualize with a plot.
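Assigning sentences to level bins and counting them can be sketched in Python as below. The sketch assumes the interval list reads as the five right-closed intervals (-inf, -0.5], (-0.5, 0], (0, 0.5], (0.5, 1.5], (1.5, inf), and the standardised scores are made up. The names BREAKS, LABELS, and level_bin are hypothetical.

```python
from bisect import bisect_left
from collections import Counter

# Upper bounds of the first four intervals; the last interval is open-ended.
BREAKS = [-0.5, 0.0, 0.5, 1.5]
LABELS = ["(-inf, -0.5]", "(-0.5, 0]", "(0, 0.5]", "(0.5, 1.5]", "(1.5, inf)"]

def level_bin(z):
    """Return the label of the right-closed interval that contains z."""
    # bisect_left places a value equal to a breakpoint in the lower interval.
    return LABELS[bisect_left(BREAKS, z)]

# Made-up standardised sentimentGI values.
gi_z = [-1.2, -0.5, -0.1, 0.0, 0.3, 0.5, 0.9, 1.5, 2.4]
counts = Counter(level_bin(z) for z in gi_z)
for label in LABELS:
    print(label, counts[label])
```

The same bin labels can then be used to group sentimentLM for the per-bin average and the per-bin correlation with sentimentGI.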

Interpret your findings.

Assignment Files -

https://www.dropbox.com/s/zbvl4uulp6nudtl/Assignment%20Files%20-%20Task%203%20and%20Task%204.rar?dl=0

Reviews

len1687420

10/23/2017 5:18:00 AM

I have some updates. I have done task 1 and task 2, so I won't need any help on those two. I do, however, need help with task 3 and task 4. For task 3, you need to use the file that I sent called dtm, and for task 4, you need to use the file that I sent called apple_10ks.Rdata. (Do not use the previous file called apple_10k that I sent in the rar folder.) Since I only need task 3 and task 4 now, remember, as I said in the last mail, that the dtm file is for task 3 and the apple_10ks file is for task 4. Also, here is the coding for tasks 1-2 in R if it's needed.

len1687420

10/23/2017 5:17:54 AM

Note: for computation of the sentiment measure you do not need to use a for-loop; just supply the sentiment function with a vector of text (or a document-term matrix), e.g. analyzeSentiment(text.vector).
