Data warehousing and big data assignment

Reference no: EM133164790

BUS5WB Data Warehousing and Big Data Assignment

The third assignment focuses on Big Data analytics on unstructured text data using Microsoft Azure. You are required to derive insights by applying big data distributed processing and machine learning techniques.

Dataset 1 - Amazon Reviews

The dataset contains ~10,000 reviews of Amazon products. The fields are:

What you are required to do

1 HDInsight to Analyse Reviews
Develop an aggregate of these reviews using your knowledge of Hadoop and MapReduce in Microsoft HDInsight.

a) Follow the same approach as the Big Data Analytics Workshop (using the wordcount method in HDInsight) to determine the contributory words for each level of rating.
b) Present the workflow of using HDInsight (you may use screen captures) along with a summary of findings and any insights for each level of rating. MapReduce documentation for HDInsight is available here.
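
The wordcount idea in (a) can be sketched in plain Python to show what the map and reduce steps compute for each rating level. This is only an illustration of the logic, not the workshop's HDInsight job: the (rating, text) record shape, the small stop-word list, and the sample reviews are all assumptions made for the example.

```python
import re
from collections import Counter, defaultdict

# Hypothetical stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "is", "it", "this", "to", "of", "i", "was", "for"}


def map_review(rating, text):
    """Map step: emit ((rating, word), 1) for each non-stop word in one review."""
    for word in re.findall(r"[a-z']+", text.lower()):
        if word not in STOP_WORDS:
            yield (rating, word), 1


def reduce_counts(pairs):
    """Reduce step: sum the emitted counts for each (rating, word) key."""
    totals = defaultdict(Counter)
    for (rating, word), count in pairs:
        totals[rating][word] += count
    return totals


def top_words(reviews, n=3):
    """Return the n most frequent words for each rating level."""
    pairs = (pair for rating, text in reviews for pair in map_review(rating, text))
    totals = reduce_counts(pairs)
    return {rating: counter.most_common(n) for rating, counter in totals.items()}


# Made-up sample rows, for illustration only (not taken from the dataset).
sample_reviews = [
    (5, "Great battery, great screen"),
    (5, "Great value"),
    (1, "Battery died. Terrible battery."),
]
result = top_words(sample_reviews, n=1)
print(result)
```

Keying the counts on (rating, word) rather than on the word alone is what separates the contributory words by rating level; removing stop words keeps common filler words from dominating every level.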

You may either create your own Hadoop Cluster or make use of the one provided to run your analysis. The details of the cluster will be provided on the LMS under the section for Assignment 3.

2 Azure Databricks for Big Data Processing
Use the period of data allocated to you (a single year) from the New York City Taxi & Limousine Commission dataset on Azure Databricks to answer the questions below:

a) Plot a visual showing, by month, the total fare amount generated by taxi trips with 4 or fewer passengers that were paid for by credit card. (You will have 12 records)
b) Plot a visual showing the average cost per mile, in each month of the year assigned to you, of taxi rides that travelled more than 5 miles but less than 20 miles, grouped by whether the trip was to the airport. (You will have 24 records)
c) Plot a visual showing the average number of single-passenger taxi trips for each day of the week. (You will have 7 records)
d) What are the top 10 most profitable routes (in terms of source and destination) for a taxi? (You will have 10 records)

For each of the questions above, provide:
• A screenshot of the visual
• A table of the values
• The code that you used to generate it
You will make use of the Azure Databricks cluster which is allocated to you. The details of the cluster will be provided on the LMS under the section Assignment 3. The year allocated to you for analysis will also be shared with you on the LMS.
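
The filter-then-group logic behind questions (a) and (b) can be sketched in plain Python as follows. On Databricks the same steps would normally be written as Spark DataFrame or SQL operations; this sketch only shows what is being computed. The field names (`pickup`, `passenger_count`, `payment_type`, `fare_amount`, `trip_distance`, `to_airport`), the payment code 1 for credit card, the `to_airport` flag, and the sample rows (including their year) are all assumptions and should be checked against the dataset provided on the cluster.

```python
from collections import defaultdict
from datetime import datetime

CREDIT_CARD = 1  # assumed payment_type code for credit card; verify in the data dictionary


def monthly_card_fares(trips):
    """Question (a): total fare per month for card-paid trips with 4 or fewer passengers."""
    totals = defaultdict(float)
    for t in trips:
        if t["passenger_count"] <= 4 and t["payment_type"] == CREDIT_CARD:
            totals[t["pickup"].month] += t["fare_amount"]
    return dict(sorted(totals.items()))


def monthly_cost_per_mile(trips):
    """Question (b): mean per-trip fare per mile by (month, to_airport), 5 < miles < 20."""
    sums = defaultdict(lambda: [0.0, 0])  # key -> [sum of per-trip cost per mile, trip count]
    for t in trips:
        if 5 < t["trip_distance"] < 20:
            key = (t["pickup"].month, t["to_airport"])
            sums[key][0] += t["fare_amount"] / t["trip_distance"]
            sums[key][1] += 1
    return {key: total / n for key, (total, n) in sorted(sums.items())}


# Made-up sample rows, for illustration only (hypothetical year and values).
trips = [
    {"pickup": datetime(2015, 1, 3, 8, 0), "passenger_count": 1, "payment_type": 1,
     "fare_amount": 30.0, "trip_distance": 10.0, "to_airport": True},
    {"pickup": datetime(2015, 1, 9, 18, 30), "passenger_count": 2, "payment_type": 2,
     "fare_amount": 12.0, "trip_distance": 6.0, "to_airport": False},
    {"pickup": datetime(2015, 2, 1, 12, 0), "passenger_count": 5, "payment_type": 1,
     "fare_amount": 8.0, "trip_distance": 2.0, "to_airport": False},
    {"pickup": datetime(2015, 2, 14, 21, 15), "passenger_count": 3, "payment_type": 1,
     "fare_amount": 20.0, "trip_distance": 8.0, "to_airport": False},
]

fares_by_month = monthly_card_fares(trips)
cost_by_month = monthly_cost_per_mile(trips)
print(fares_by_month)
print(cost_by_month)
```

Note one design choice: the sketch averages the per-trip cost per mile. Dividing the total fare by the total distance in each group is another reasonable reading of "average cost per mile" and can give different values, so whichever definition is used should be stated alongside the visual.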

3 Azure Machine Learning for Prediction

Based on the year assigned to you in the New York City Taxi Dataset (as given in question 2 above), use Azure ML Studio to build a model that predicts the total ride duration of taxi trips in New York City.

Provide the following:
a) A screen capture of the completed model diagram and any decisions you made in training the model. For example, the rationale for some of the components used, and how many records were used for training and how many for testing.
b) A set of metrics which presents how effective your model is.
c) Which features were most influential in driving your model?
d) Using your model predict the total trip duration for trips given below.

You will make use of the Azure Machine Learning Studio that has been allocated to you. Information regarding accessing the application can be found in the LMS under the section Assignment 3.

The datasets required for training and testing are available in Azure Machine Learning Studio; further information has been provided in the LMS under the section Assignment 3.
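
For item (b), the effectiveness of a duration-prediction model is usually summarised with standard regression metrics. The sketch below computes three common ones (mean absolute error, root mean squared error, and the coefficient of determination) in plain Python, so the numbers reported by an evaluation step can be interpreted or cross-checked. The function name and the sample durations (in seconds) are hypothetical and not taken from the assignment data.

```python
import math


def regression_metrics(actual, predicted):
    """Return MAE, RMSE and R^2 for paired actual/predicted values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and the same length")
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n                 # mean absolute error
    ss_res = sum(e * e for e in errors)                   # residual sum of squares
    rmse = math.sqrt(ss_res / n)                          # root mean squared error
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total sum of squares
    r2 = 1 - ss_res / ss_tot if ss_tot else float("nan")  # undefined for constant targets
    return {"mae": mae, "rmse": rmse, "r2": r2}


# Made-up trip durations in seconds, for illustration only.
actual_durations = [600, 900, 1200, 1500]
predicted_durations = [660, 840, 1260, 1440]

metrics = regression_metrics(actual_durations, predicted_durations)
print(metrics)
```

MAE and RMSE are in the same unit as the target (here, seconds), which makes them easy to explain in a report; R^2 is unitless and indicates how much of the variation in trip duration the model accounts for.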


All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd