Reference no: EM132917315
Final Assignment
Part 1:
Step 1: Read the Tripadvisor hotel reviews dataset
Step 2: Create a diagram of the variable "Score" to see whether the majority of the customer ratings are positive or negative.
Step 3: Create wordclouds to see the most frequently used words in the reviews and save them.
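Steps 1 to 3 could be sketched roughly as follows. This is only an illustration: the helper names, the output file names and the toy scores are assumptions, and the `wordcloud` and `matplotlib` packages are imported lazily so that the counting part runs without them.

```python
import pandas as pd


def load_reviews(path):
    """Read the reviews CSV into a DataFrame (the path is supplied by the caller)."""
    return pd.read_csv(path)


def score_counts(df, score_col="Score"):
    """Count reviews per score, ordered from the lowest to the highest score."""
    return df[score_col].value_counts().sort_index()


def plot_score_counts(counts, out_path="score_distribution.png"):
    """Save a bar chart of the score counts (hypothetical output file name)."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file without needing a display
    import matplotlib.pyplot as plt

    ax = counts.plot(kind="bar")
    ax.set_xlabel("Score")
    ax.set_ylabel("Number of reviews")
    ax.figure.tight_layout()
    ax.figure.savefig(out_path)
    plt.close(ax.figure)


def save_wordcloud(texts, out_path="wordcloud.png"):
    """Save one word cloud built from all texts (needs the wordcloud package)."""
    from wordcloud import WordCloud

    cloud = WordCloud(width=800, height=400, background_color="white")
    cloud.generate(" ".join(str(t) for t in texts))
    cloud.to_file(out_path)


# Toy example with made-up scores, not taken from the real dataset.
toy = pd.DataFrame({"Score": [5, 4, 5, 1, 3, 5]})
print(score_counts(toy))
```

With the real data, `plot_score_counts(score_counts(df))` and `save_wordcloud(df["Review"])` would produce the two saved figures.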
Step 4: Do Sentiment analysis with VADER
• Apply the model to the dataset
• Label reviews with compound > 0 as positive sentiment and compound < 0 as negative sentiment, and remove reviews whose compound score is 0
• Export CSV files
• Now that we have classified reviews into positive and negative, let's build wordclouds for each!
• Take a look at the distribution of reviews with sentiment across the dataset and save the diagram
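A minimal sketch of the labelling rule in Step 4, assuming NLTK's VADER implementation is used. The helper names and the "compound" column are illustrative choices, not part of the assignment; the scorer is passed in as a function so the labelling logic can be checked without downloading the VADER lexicon.

```python
import pandas as pd


def label_from_compound(compound):
    """Map a compound score to a label; None marks a review to be removed."""
    if compound > 0:
        return "positive"
    if compound < 0:
        return "negative"
    return None


def add_sentiment(df, compound_fn, text_col="Review"):
    """Score each review with compound_fn, label it, and drop compound == 0 rows."""
    out = df.copy()
    out["compound"] = out[text_col].astype(str).map(compound_fn)
    out["sentiment"] = out["compound"].map(label_from_compound)
    return out.dropna(subset=["sentiment"]).reset_index(drop=True)


def vader_compound_fn():
    """Build a compound scorer from NLTK's VADER (downloads the lexicon if missing)."""
    import nltk
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    nltk.download("vader_lexicon", quiet=True)
    analyzer = SentimentIntensityAnalyzer()
    return lambda text: analyzer.polarity_scores(text)["compound"]
```

On the real data this might be used as `labelled = add_sentiment(df, vader_compound_fn())`, followed by `labelled[labelled["sentiment"] == "positive"].to_csv("positive.csv", index=False)` and the same for the negative reviews.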
Step 5: Building the classification model
Build the sentiment analysis model. It takes reviews as input and predicts whether each review is positive or negative.
Because this is a classification task, you will train a simple logistic regression model to do it.
Step 6: Split the Dataframe
The new data frame should only have two columns - "Review", and "sentiment" (the target variable).
Training the sentiment analysis model
80% of the data will be used for training, and 20% will be used for testing.
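The split described in Step 6 might look like the sketch below. The ten rows are made-up stand-ins for the labelled reviews, and `random_state` and `stratify` are optional choices added here for reproducibility and class balance, not requirements of the assignment.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up rows standing in for the VADER-labelled reviews.
df = pd.DataFrame({
    "Review": [
        "great room", "lovely staff", "clean and quiet", "superb view",
        "excellent breakfast", "dirty room", "rude staff", "noisy at night",
        "awful smell", "terrible breakfast",
    ],
    "compound": [0.6, 0.6, 0.4, 0.6, 0.6, -0.4, -0.5, -0.2, -0.5, -0.5],
    "sentiment": ["positive"] * 5 + ["negative"] * 5,
})

# Keep only the text and the target variable.
model_df = df[["Review", "sentiment"]]

# 80% for training, 20% for testing.
train_df, test_df = train_test_split(
    model_df,
    test_size=0.2,
    random_state=42,
    stratify=model_df["sentiment"],
)

print(len(train_df), len(test_df))
```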
Step 7: Create a bag of words
Use a count vectorizer from the Scikit-learn library.
Convert the text into a bag-of-words model since the logistic regression algorithm cannot understand text.
Step 8: Logistic Regression
Split target and independent variables
Fit model on data
Make predictions
Step 9: Test the accuracy of your model
Find accuracy, precision, recall
Create the classification report
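Steps 8 and 9 could be put together along these lines. The reviews and labels are invented for the example, `max_iter=1000` is an arbitrary setting to avoid convergence warnings, and treating "positive" as the positive class for precision and recall is an assumption.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    precision_score,
    recall_score,
)

# Made-up training and test data (independent variable: text, target: label).
train_reviews = [
    "great room and friendly staff", "lovely view and clean room",
    "excellent breakfast, great location", "friendly staff, excellent service",
    "dirty room and rude staff", "terrible breakfast and noisy room",
    "rude service, awful location", "noisy, dirty and awful",
]
train_labels = ["positive"] * 4 + ["negative"] * 4
test_reviews = ["clean room and great service", "awful room, terrible staff"]
test_labels = ["positive", "negative"]

# Bag-of-words features, fitted on the training text only.
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_reviews)
X_test = vectorizer.transform(test_reviews)

# Fit the model and make predictions.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, train_labels)
predictions = model.predict(X_test)

# Accuracy, precision and recall, then the full report.
accuracy = accuracy_score(test_labels, predictions)
precision = precision_score(test_labels, predictions, pos_label="positive", zero_division=0)
recall = recall_score(test_labels, predictions, pos_label="positive", zero_division=0)
report = classification_report(test_labels, predictions, zero_division=0)

print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
print(report)
```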
Part 2: Topic Modelling
LDA
Step 1: Import the positive.csv dataset you have created in Part 1
Step 2: Apply LDA on the "Review" column
Step 3: Define number of topics as 5
Step 4: Create topics along with the probability distribution for each word in our vocabulary for each topic.
Step 5: Print the 10 words with the highest probabilities for each of the five topics
Step 6: Add a column to the original data frame that will store the topic for the reviews.
Step 7: Save the new dataset as: reviews_topic(lda).csv
Non-Negative Matrix Factorization (NMF)
Step 1: Import the positive.csv dataset you have created in Part 1
Step 2: Apply Non-Negative Matrix Factorization (NMF) on the dataset
Step 3: Define number of topics as 5
Step 4: Create topics along with the probability distribution for each word in our vocabulary for each topic.
Step 5: Print the 10 words with the highest probabilities for each of the five topics
Step 6: Add a column to the original data frame that will store the topic for the reviews.
Step 7: Save the new dataset as: reviews_topic(nmf).csv
Attachment: Reviews Assignment.rar