Create a program in python that analyzes twitter information

Assignment Help Python Programming
Reference no: EM132356971

Learning Outcome:

By completing this assignment, you will gain skills relating to using functions,

complex data structures,

nested loops,

text processing,

file input and output,

exceptions in Python.

Task:

In this assignment, you will write a complete program in Python that analyzes twitter information. With the emergence of Internet companies such as Google, Facebook and Twitter, more and more data accessible online is comprised of text. Textual data and the computational means of processing it and extracting information is also increasingly more important in areas such as business, humanities,
social sciences, etc.

In this assignment, you will deal with textual analysis. Twitter has become very popular, with many people "tweeting" aspects of their daily lives. This "flow of tweets" has recently become a way to study or guess how people feel about various aspects of the world or their own life.

For example, analysis of tweets has been used to try to determine how certain geographical regions may be voting - this is done by analyzing the content, the words and phrases, in tweets. Similarly, analysis of key words or phrases in tweets can be used to determine how popular or unpopular a movie might be. This is often referred to as sentiment analysis.

In this assignment, you will build a program that will perform simple sentiment analysis on Twitter data. The Twitter data contains comments from individuals about how they feel about their lives and comes from individuals across the continental United States. The objective is to determine which timezone (Eastern, Central, Mountain, Pacific; see below for more information on how to do this) is the "happiest". To do this, your program will need to:

Analyze each individual tweet to determine a score - a "happiness score".

The "happiness score" for a single tweet is found by looking for certain keywords (given) in a tweet and for each keyword found in that tweet totaling their "sentiment values". In this assignment, each value is an integer from 1 to 10. The score for the tweet is simply the total of the "sentiment values" divided by the number of keywords found. If there are none of the given keywords in a tweet, it is just ignored.

The "happiness score" for a timezone is just the total of the scores for all the tweets in that region divided by the number of tweets.

A file called tweets.txt contains the tweets and a file called keywords.txt contains keywords and scores for determining the "sentiment" of an individual tweet. These files are described in  more detail below.

The file tweets.txt contains the tweets; one per line (some lines are quite long). The format of a tweet is:
[lat, long] value date time text
where:

[lat, long] - the latitude and longitude of where the tweet originated. You will need these values to determine the timezone in which the tweet originated.

value - not used; this can be skipped.

date - the date of the tweet; not used, this can be skipped.

time - the time of day that the tweet was sent; not used this can be skipped.

text - the text in the tweet.

The file keywords.txt contains sentiment keywords and their "happiness scores"; one per line.

The format of a line is:

keyword, value

where:

keyword - the keyword to look for.

value - the value of the keyword; values are limited to 1, 5 and 10, where 1 represents very "unhappy" and 10 represents "very happy".

(you are free to explore different sets of keywords and values at your leisure for the sheer fun of it!).

Determining timezones across the continental United States

Given a latitude and longitude the task of determining exactly the location that it corresponds to can be very challenging given the geographical boundaries of the United States. For this assignment, we simply approximate the regions corresponding to the timezones by rectangular areas defined by latitude and longitude points. Our approximation looks like:

So the Eastern timezone, for example, is defined by latitude-longitude points p1, p2, p3 and p4. To determine the origin of a tweet, then, one simply has to determine in which region the latitude and longitude of the tweet belongs. The values of the points are:

p1 = (49.189787, -67.444574)
p2 = (24.660845, -67.444574)
p3 = (49.189787, -87.518395)
p4 = (24.660845, -87.518395)
p5 = (49.189787, -101.998892)
p6 = (24.660845, -101.998892)
p7 = (49.189787, -115.236428)
p8 = (24.660845, -115.236428)
p9 = (49.189787, -125.242264)
p10 = (24.660845, -125.242264)

Functional Specifications:

Part A: Developing code for the processing of the tweets and sentiment analysis.

1. Your program should prompt the user for the name of the file containing the keywords. It should check to make sure that the file exists and if does not exist your program should generate an exception and exit.

2. Your program should then input the keywords and their "happiness values" and store them in a data structure in your program (the data structure is of your choice, but you might consider a list).

3. Your program should then prompt the user for the name of the file with tweets; again, your program should generate an exception if the file does not exist. Suggestion: you might wish to have a small file of tweets, just a portion of the full file, for the testing of your program.

4. You should then process the file of tweets, computing the "happiness score" for each tweet and computing the "happiness score" for each timezone. You will need to read the file of tweets line by line as text and break it apart. The string processing functions in Python (see Chapter 7) are very useful for doing this. Your code should not duplicate code. Determine places that code can be reused and create functions. Your program should ignore tweets with no keywords and also ignore tweets from outside the time zones.

5. Once you have completed processing the entire file, you should print out:

The "happiness score" for each timezone.

The number of tweets found in that timezone.

Part B (mostly for fun):

Once you have implemented your sentiment analysis program, you can now extend the program to include a simple (but fun) graphical display. This part of the assignment will make use of the given Python code in graphics.py and happy_histogram.py; these can be found on the course web site for Assignment #3.

The Python code graphics.py is the graphics package described in the book; you can look in the text for examples (see Chapter 2, section 2.6).

The Python code happy_histogram.py does simple graph plotting using graphics.py. You will need to import happy_histogram.py into your program. It makes a number of assumptions, so make sure to read the comments in the code about the parameters and limitations of the
functions.

It contains three functions:

- drawSimpleHistogram which will create a histogram of four values in a graphics window; it assumes that the values are in the range of 0-10.

- drawHappyFace which draws a happy face on the histogram as determined by drawSimpleHistogram.

- drawSadFace which draws a sad face on the histogram as determined by drawSimpleHistogram.

Additional Information

For both files it is advised that when you read in the files you the line below to avoid encoding errors.

open("fileName.txt","r",encoding="utf-8") or open('fileName.txt', encoding='utf-8', errors='ignore')

Non-functional Specifications:

1. Include brief comments in your code identifying yourself, describing the program, and describing key portions of the code.

2. Assignments are to be done individually and must be your own work. Software may be used to detect cheating.

3. Use Python coding conventions and good programming techniques, for example:

Meaningful variable names

Conventions for naming variables and constants

Use of constants where appropriate

Readability: indentation, white space, consistency

The name of the file you submit should be your UWO userid_Assign3.py. For instance, my assignment would be oola_Assign3.py. Make sure you attach your python file to your assignment; DO NOT put the code inline in the textbox.

Make sure that you develop your code with Python 3.5 as the interpreter. TAs will not endeavor to fix code that uses earlier versions of Python.

Reference no: EM132356971

Questions Cloud

What is the probability that an 18-to 24 year : What is the probability that an 18-to 24 year old spends less than 112 minutes watching video on his or her smartphone per month?
Make your own algorithmic drawing with whatever combination : Make your own algorithmic drawing with whatever combination of color, line, and sequence you choose. Drawings should be uniquely yours.
Organization operate effectively to protect : you are the owner of an IT Consulting Business. You have been contracted to help an organization operate effectively to protect their client's information
Turn research questions into testable hypotheses : HPS301 - Research Methods in Psychology - Deakin University - Write your own participants section nd the rest can be ‘cut-and-pasted' from the ‘Method' document
Create a program in python that analyzes twitter information : CS1026: Write a complete program in Python that analyzes twitter information. With the emergence of Internet companies such as Google, Facebook and Twitter.
Willie primary source of power : Willie's primary source of power is based on his ability to influence others with something of value to them and, in some cases
Identify methods for evaluating operational processes : HA499- Identify methods for evaluating operational processes. Analyze healthcare organization quality improvement methods.
How art contributes to happiness-reduction of stress : Identify how art contributes to happiness, reduction of stress, and increased morale in the workplace while synthesizing these elements within personal life
Two types of health economics are positive and normative : The three types of efficiencies are allocative, production, and technical. The two types of health economics are positive and normative.

Reviews

Write a Review

Python Programming Questions & Answers

  Write a python program to implement the diff command

Without using the system() function to call any bash commands, write a python program that will implement a simple version of the diff command.

  Write a program for checking a circle

Write a program for checking a circle program must either print "is a circle: YES" or "is a circle: NO", appropriately.

  Prepare a python program

Prepare a Python program which evaluates how many stuck numbers there are in a range of integers. The range will be input as two command-line arguments.

  Python atm program to enter account number

Write a simple Python ATM program. Ask user to enter their account number, and print their initail balance. (Just make one up). Ask them if they wish to make deposit or withdrawal.

  Python function to calculate two roots

Write a Python function main() to calculate two roots. You must input a,b and c from keyboard, and then print two roots. Suppose the discriminant D= b2-4ac is positive.

  Design program that asks user to enter amount in python

IN Python Design a program that asks the user to enter the amount that he or she has budget in a month. A loop should then prompt the user to enter his or her expenses for the month.

  Write python program which imports three dictionaries

Write a Python program called hours.py which imports three dictionaries, and uses the data in them to calculate how many hours each person has spent in the lab.

  Write python program to create factors of numbers

Write down a python program which takes two numbers and creates the factors of both numbers and displays the greatest common factor.

  Email spam filter

Analyze the emails and predict whether the mail is a spam or not a spam - Create a training file and copy the text of several mails and spams in to it And create a test set identical to the training set but with different examples.

  Improve the readability and structural design of the code

Improve the readability and structural design of the code by improving the function names, variables, and loops, as well as whitespace. Move functions close to related functions or blocks of code related to your organised code.

  Create a simple and responsive gui

Please use primarily PHP or Python to solve the exercise and create a simple and responsive GUI, using HTML, CSS and JavaScript.Do not use a database.

  The program is to print the time

The program is to print the time in seconds that the iterative version takes, the time in seconds that the recursive version takes, and the difference between the times.

Free Assignment Quote

Assured A++ Grade

Get guaranteed satisfaction & time on delivery in every assignment order you paid with us! We ensure premium quality solution document along with free turntin report!

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd