Implement one executable Hadoop MapReduce job

Reference no: EM132576905

CST4070 Applied Data Analytics - Tools, Practical Big Data Handling, Cloud Distribution - Middlesex University

Assignment - Big Data

You are required to submit your work via the dedicated Unihub assignment link by the specified deadline. This link will time out at the submission deadline, and your work may not be accepted as an email attachment if you miss it. You are therefore strongly advised to allow plenty of time to upload your work before the deadline.

You are required to solve the tasks described below. Each task should be accompanied by:

A short introduction describing the problem and your high-level solution.
Your step-by-step process, supported by screenshots. Each screenshot needs to be accompanied by a short explanatory text.

Finally, if necessary, conclude each task with a brief summary of what you have done.

Your submission needs to be unique

When solving your tasks, you are required to name your files using your first name (e.g., if your name is Alice, you may name your Task 1 file as ) so as to make your submission unique. Your explanatory text must, of course, also be unique.

Tasks

Follow the lab instructions to install Apache Hadoop on a virtual server running Linux Ubuntu Server. Once Apache Hadoop is installed and running, carry out the following tasks.

Task 1

Implement one executable Hadoop MapReduce job that counts the total number of words with an even number of characters and the total number of words with an odd number of characters. For example, if the input text is "Hello world", the output should be even:0, odd:2, because both "Hello" and "world" contain an odd number of characters. If instead the input is "My name is Alice", the output should be even:3, odd:1.

The job needs to be executed by a mapper and a reducer. Both the mapper and the reducer need to be written in Python and tested in Linux Ubuntu before running them on Hadoop MapReduce.
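A minimal sketch of the mapper and reducer for Task 1, assuming Hadoop Streaming and a single Python file (hypothetical name `evenodd.py`) that acts as mapper or reducer depending on its first argument. Words are assumed to be whitespace-separated tokens, with any punctuation counted as characters. The reducer assumes a single reduce task, so both keys (including a zero count) are printed exactly once; output is tab-separated (`even<TAB>0`), the Hadoop Streaming default, rather than the `even:0` notation used above.

```python
#!/usr/bin/env python3
"""Even/odd word-length counter for Hadoop Streaming (sketch)."""
import sys


def map_lines(lines):
    """Mapper: emit 'even\t1' or 'odd\t1' for every whitespace-separated word."""
    for line in lines:
        for word in line.split():
            parity = "even" if len(word) % 2 == 0 else "odd"
            yield f"{parity}\t1"


def reduce_lines(lines):
    """Reducer: total the 1s per key and always report both keys.

    ASSUMPTION: a single reduce task, so this process sees every key.
    With only two keys, a dict is enough; it does not rely on sort order.
    """
    totals = {"even": 0, "odd": 0}
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        key, count = line.rstrip("\n").split("\t", 1)
        totals[key] = totals.get(key, 0) + int(count)
    return [f"{key}\t{totals[key]}" for key in ("even", "odd")]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: evenodd.py map   or   evenodd.py reduce  (stdin -> stdout)
    step = map_lines if sys.argv[1] == "map" else reduce_lines
    for out_line in step(sys.stdin):
        print(out_line)
```

To test locally on Ubuntu, the shuffle can be imitated with `sort`: `cat input.txt | python3 evenodd.py map | sort | python3 evenodd.py reduce` (where `input.txt` is a hypothetical test file). On the cluster, the same file would typically be passed to the Hadoop Streaming jar with `-files evenodd.py -mapper "python3 evenodd.py map" -reducer "python3 evenodd.py reduce" -numReduceTasks 1 -input ... -output ...`; the jar's exact path depends on the installation.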

Task 2

Implement one executable Hadoop MapReduce job that receives as input a .csv table with the structure 'StudentId, Module, Grade' and returns as output the minimum and maximum grade of each student, along with the total number of modules she has passed.

For example, if your input is:

StudentId  Module            Grade
S001       Statistic         75
S002       Statistic         72
S001       Big Data          78
S003       Big Data          66
S001       Programming       70
S002       Programming       55
S001       Machine Learning  65
S002       Machine Learning  61

Your output needs to be:

StudentId  MinGrade  MaxGrade  Modules
S001       65        78        4
S002       55        72        3
S003       66        66        1

The job needs to be executed by a mapper and a reducer. Both the mapper and the reducer need to be written in Python and tested in Linux Ubuntu before running them on Hadoop MapReduce.
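A minimal sketch for Task 2 in the same Hadoop Streaming style (hypothetical file name `grades.py`). The mapper parses each row with Python's `csv` module, skips the header, and emits `StudentId<TAB>Grade`. Because Hadoop sorts mapper output by key before the reduce phase, the reducer can group consecutive lines per student with `itertools.groupby`. The brief does not state a pass mark, so `PASS_MARK = 40` is an assumption; minimum and maximum are taken over all grades, and with the sample data every grade is at or above 40, so the counts match the expected table.

```python
#!/usr/bin/env python3
"""Per-student min/max grade and passed-module count for Hadoop Streaming (sketch)."""
import csv
import sys
from itertools import groupby

PASS_MARK = 40  # ASSUMPTION: the pass threshold is not stated in the brief


def map_lines(lines):
    """Mapper: turn 'StudentId,Module,Grade' rows into 'StudentId\tGrade' pairs."""
    for row in csv.reader(lines):
        if len(row) != 3:
            continue  # skip blank or malformed rows
        student, _module, grade = (field.strip() for field in row)
        if student == "StudentId":
            continue  # skip the header row
        yield f"{student}\t{int(grade)}"


def reduce_lines(lines):
    """Reducer: input is sorted by StudentId, so each student's grades are contiguous."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    for student, group in groupby(pairs, key=lambda kv: kv[0]):
        grades = [int(grade) for _, grade in group]
        passed = sum(1 for g in grades if g >= PASS_MARK)
        yield f"{student}\t{min(grades)}\t{max(grades)}\t{passed}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: grades.py map   or   grades.py reduce  (stdin -> stdout)
    step = map_lines if sys.argv[1] == "map" else reduce_lines
    for out_line in step(sys.stdin):
        print(out_line)
```

It can be tested locally with `cat grades.csv | python3 grades.py map | sort | python3 grades.py reduce`, where `grades.csv` is a hypothetical copy of the input table.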

Task 3

Implement one executable Hadoop MapReduce job that receives as input two .csv tables with the following structure:

User: UserId, Name, DOB
Follows: UserIdFollower, UserIdFollowing

The MapReduce job needs to perform the following SQL query:

select U.UserId, U.Name as NameFollower, V.Name as NameFollowing from User as U
join Follows as F on U.UserId = F.UserIdFollower
join User as V on V.UserId = F.UserIdFollowing where V.DOB <= '2002-03-01'

For example, if the two original tables are:

User:

UserId  Name   DOB
U001    Alice  2005-01-05
U002    Tom    2001-02-07
U003    John   1998-06-02
U004    Alex   2006-02-01

Follows:

UserIdFollower  UserIdFollowing
U001            U002
U001            U003
U002            U001
U002            U004
U003            U001
U004            U001

The final table needs to be:

UserId  NameFollower  NameFollowing
U001    Alice         Tom
U001    Alice         John

The job needs to be executed by a mapper and a reducer. Both the mapper and the reducer need to be written in Python and tested in Linux Ubuntu before running them on Hadoop MapReduce.
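The query joins User twice (once for the follower and once for the followed user), which a single reduce-side join keyed on one UserId cannot easily resolve in one pass. The sketch below therefore uses a different technique, a replicated (broadcast) join. It assumes the small User table is shipped to every task as a side file (hypothetical name `users.csv`, e.g. via the `-files` generic option) and that the Follows table is the job's `-input`. The mapper emits `UserIdFollower<TAB>UserIdFollowing`; the reducer looks up both users, keeps pairs whose followed user was born on or before 2002-03-01, and emits `UserId<TAB>NameFollower<TAB>NameFollowing`. Dates in ISO format (YYYY-MM-DD) compare correctly as plain strings.

```python
#!/usr/bin/env python3
"""Follower/following join with a DOB filter for Hadoop Streaming (sketch).

ASSUMPTION: the User table is small and available to each task as a side
file named users.csv (hypothetical); Follows is the job's HDFS input.
"""
import csv
import sys
from itertools import groupby

CUTOFF = "2002-03-01"  # ISO dates compare correctly as strings


def load_users(lines):
    """Build {UserId: (Name, DOB)} from 'UserId,Name,DOB' rows, skipping the header."""
    users = {}
    for row in csv.reader(lines):
        if len(row) == 3 and row[0].strip() != "UserId":
            user_id, name, dob = (field.strip() for field in row)
            users[user_id] = (name, dob)
    return users


def map_lines(lines):
    """Mapper: emit 'UserIdFollower\tUserIdFollowing' for every Follows row."""
    for row in csv.reader(lines):
        if len(row) == 2 and row[0].strip() != "UserIdFollower":
            follower, following = (field.strip() for field in row)
            yield f"{follower}\t{following}"


def reduce_lines(lines, users):
    """Reducer: join both ids against `users` and apply the DOB filter (inner join)."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    for follower, group in groupby(pairs, key=lambda kv: kv[0]):
        if follower not in users:
            continue  # no matching User row for the follower
        follower_name = users[follower][0]
        for _, following in group:
            if following in users and users[following][1] <= CUTOFF:
                yield f"{follower}\t{follower_name}\t{users[following][0]}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: follow_join.py map   or   follow_join.py reduce  (stdin -> stdout)
    if sys.argv[1] == "map":
        output = map_lines(sys.stdin)
    else:
        with open("users.csv", newline="") as side_file:
            output = reduce_lines(sys.stdin, load_users(side_file))
            output = list(output)  # consume before the side file closes
    for out_line in output:
        print(out_line)
```

Keying the mapper output on the follower id means the reducer's output comes out grouped by UserId, matching the expected table. If the User table were too large to replicate to every task, two chained jobs with reduce-side joins would be the usual alternative.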

Attachment: Applied Data Analytics.rar


All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd