Implement a MapReduce application to perform

Reference no: EM133265450

Big Data Analytics

Each group will prepare a 10-15 minute presentation and give it to the class during the last couple of weeks of classes. Each group must pick a topic from one of the three categories below.

Topics - Categories:

Category 1: Implement a MapReduce application to perform one of the following:
a) Matrix Multiplication.
b) Relational algebra Selection and Projection (set-based 'no duplicates' version, and bag-based 'with duplicates' version)
c) Relational algebra Union, Intersection, and Difference (set- & bag-based if applicable)
d) Relational algebra Natural Join operation
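
As a rough illustration of the set-based versus bag-based distinction in items (b) and (c), the sketch below simulates the map, shuffle, and reduce phases in plain Java, in memory, without Hadoop. The class name, the constants, the relation names R and S, and the tuple strings are all hypothetical. The only thing that changes between the six variants is how many copies of a tuple the reducer emits, given how often the tuple occurs in each relation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.IntBinaryOperator;

class SetBagOpsSketch {
    // Each rule maps (occurrences in R, occurrences in S) to the number of copies to emit.
    static final IntBinaryOperator SET_UNION = (inR, inS) -> 1;
    static final IntBinaryOperator SET_INTERSECTION = (inR, inS) -> (inR > 0 && inS > 0) ? 1 : 0;
    static final IntBinaryOperator SET_DIFFERENCE = (inR, inS) -> (inR > 0 && inS == 0) ? 1 : 0;
    static final IntBinaryOperator BAG_UNION = (inR, inS) -> inR + inS;
    static final IntBinaryOperator BAG_INTERSECTION = (inR, inS) -> Math.min(inR, inS);
    static final IntBinaryOperator BAG_DIFFERENCE = (inR, inS) -> Math.max(0, inR - inS);

    // Map phase: tuple t of relation X becomes the pair (t, X).
    // Shuffle: relation names are grouped by tuple; TreeMap keeps the output order stable.
    static Map<String, List<String>> mapAndShuffle(List<String> r, List<String> s) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (String t : r) groups.computeIfAbsent(t, k -> new ArrayList<>()).add("R");
        for (String t : s) groups.computeIfAbsent(t, k -> new ArrayList<>()).add("S");
        return groups;
    }

    // Reduce phase: one call per distinct tuple, emitting as many copies as the rule says.
    static List<String> run(List<String> r, List<String> s, IntBinaryOperator copies) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : mapAndShuffle(r, s).entrySet()) {
            int inR = Collections.frequency(e.getValue(), "R");
            int inS = e.getValue().size() - inR;
            int n = copies.applyAsInt(inR, inS);
            for (int i = 0; i < n; i++) out.add(e.getKey());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> r = Arrays.asList("(1,2)", "(1,2)", "(3,4)");
        List<String> s = Arrays.asList("(1,2)", "(5,6)");
        System.out.println(run(r, s, SET_DIFFERENCE)); // prints [(3,4)]
        System.out.println(run(r, s, BAG_DIFFERENCE)); // prints [(1,2), (3,4)]
    }
}
```

In a real Hadoop job the rule would live in the reducer, and the mapper would need to know which relation each input line belongs to, which is why the notes below ask for the relation name to be stored in the data files.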

Notes:
1. Please refer to Chapter 2 of Mining of Massive Datasets, which is freely available on the book's website. It outlines the processing that the mappers and reducers must do for each of the operations above.

2. Don't make any assumptions about the number of input files or their filenames. The entries from both matrices could appear in any order in the file(s). This means the data files must store additional information: the matrix name and the indices of each entry, in addition to the value of the entry. For example:
Let A and B be the two matrices given below, and suppose we would like to find their product C = AB. A is a 2x2 matrix while B is a 2x1 matrix (a vector).

     Matrix A              Matrix B

        0     1               0
   0   25     9          0   44
   1   31    17          1   13

The entries from both matrices can be stored in one or more files. The data files can list entries from either matrix in any order. Each line represents one entry from either matrix, in the form: matrix name, row index, column index, value. For example, one possible content of the input data files:
A, 0, 0, 25
B, 1, 0, 13
A, 1, 1, 17
A, 0, 1, 9
A, 1, 0, 31
B, 0, 0, 44
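
To make the data format concrete, here is a minimal sketch of the one-pass matrix multiplication, written in plain Java that simulates the map, shuffle, and reduce phases in memory rather than using Hadoop. The class and method names are hypothetical, the values are assumed to be integers, and the two shape parameters (rows of A, columns of B) are assumed to be known up front. Each mapper call replicates an entry to every output cell that needs it, and each reducer call matches A and B entries on the shared index j and sums the products.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class MatrixMultiplySketch {
    // Map phase. Input line format, as in the example: "name, row, col, value".
    // The key is the output cell "i,k"; the value is {matrix name, shared index j, entry value}.
    static void map(String line, int rowsA, int colsB, Map<String, List<String[]>> shuffle) {
        String[] f = line.trim().split("\\s*,\\s*");
        int row = Integer.parseInt(f[1]);
        int col = Integer.parseInt(f[2]);
        if (f[0].equals("A")) {
            // a[i][j] contributes to C[i][k] for every column k of B
            for (int k = 0; k < colsB; k++)
                shuffle.computeIfAbsent(row + "," + k, unused -> new ArrayList<>())
                       .add(new String[] {"A", f[2], f[3]});
        } else {
            // b[j][k] contributes to C[i][k] for every row i of A
            for (int i = 0; i < rowsA; i++)
                shuffle.computeIfAbsent(i + "," + col, unused -> new ArrayList<>())
                       .add(new String[] {"B", f[1], f[3]});
        }
    }

    // Reduce phase for one output cell: sum a[i][j] * b[j][k] over the shared index j.
    static long reduce(List<String[]> values) {
        Map<Integer, Long> a = new HashMap<>();
        Map<Integer, Long> b = new HashMap<>();
        for (String[] v : values)
            (v[0].equals("A") ? a : b).put(Integer.parseInt(v[1]), Long.parseLong(v[2]));
        long sum = 0;
        for (Map.Entry<Integer, Long> e : a.entrySet())
            sum += e.getValue() * b.getOrDefault(e.getKey(), 0L); // missing entries count as 0
        return sum;
    }

    // Driver stand-in: run map over every line, then reduce once per output cell.
    static Map<String, Long> multiply(List<String> lines, int rowsA, int colsB) {
        Map<String, List<String[]>> shuffle = new TreeMap<>();
        for (String line : lines) map(line, rowsA, colsB, shuffle);
        Map<String, Long> c = new TreeMap<>();
        for (Map.Entry<String, List<String[]>> e : shuffle.entrySet())
            c.put(e.getKey(), reduce(e.getValue()));
        return c;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "A, 0, 0, 25", "B, 1, 0, 13", "A, 1, 1, 17",
            "A, 0, 1, 9", "A, 1, 0, 31", "B, 0, 0, 44");
        for (Map.Entry<String, Long> e : multiply(lines, 2, 1).entrySet())
            System.out.println("C, " + e.getKey().replace(",", ", ") + ", " + e.getValue());
        // prints:
        // C, 0, 0, 1217
        // C, 1, 0, 1585
    }
}
```

The result can be checked by hand: 25*44 + 9*13 = 1217 and 31*44 + 17*13 = 1585. Because the order of the input lines never matters to the shuffle, the same code works however the entries are spread across files.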

3. The note above also holds for relations in relational algebra (tables in SQL). For example, for natural join, rows/tuples from operand tables can appear in the same file or in different files, in one, two, or more files. Of course this requires the table name to be stored in the data files.
4. The shape of the input matrices and the schema of the input relations (tables) to the mappers and reducers must be passed as additional input upon job submission. Please consider using the job object to pass this additional input to the mappers and reducers, using:
job.getConfiguration().set() // in the driver code
context.getConfiguration().get() // in the setup() method of the mapper and reducer
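
Notes 3 and 4 can be sketched together for the natural join. The example below is plain Java that simulates the phases in memory, not Hadoop code. A `Map<String, String>` stands in for the Hadoop configuration, so where the sketch calls `conf.get(...)` a real job would use the `set()` and `get()` calls shown above. The key names `schema.R` and `schema.S`, the table names R and S, the class name, and the `|` key separator are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class NaturalJoinSketch {
    // lines: "table, v1, v2, ..." in any order; conf: stand-in for the job configuration.
    static List<String> join(List<String> lines, Map<String, String> conf) {
        // What setup() would do: read the schemas that the driver stored at submission time.
        List<String> schemaR = Arrays.asList(conf.get("schema.R").split(","));
        List<String> schemaS = Arrays.asList(conf.get("schema.S").split(","));
        List<String> common = new ArrayList<>(schemaR);
        common.retainAll(schemaS); // attributes that appear in both schemas

        // Map + shuffle: key = values of the common attributes, value = the tagged tuple.
        Map<String, List<String[]>> groups = new TreeMap<>();
        for (String line : lines) {
            String[] f = line.trim().split("\\s*,\\s*"); // f[0] is the table name
            List<String> schema = f[0].equals("R") ? schemaR : schemaS;
            StringBuilder key = new StringBuilder();
            for (String attr : common) key.append(f[1 + schema.indexOf(attr)]).append('|');
            groups.computeIfAbsent(key.toString(), k -> new ArrayList<>()).add(f);
        }

        // Reduce: pair every R tuple with every S tuple that shares the key.
        List<String> out = new ArrayList<>();
        for (List<String[]> group : groups.values()) {
            for (String[] r : group) {
                if (!r[0].equals("R")) continue;
                for (String[] s : group) {
                    if (!s[0].equals("S")) continue;
                    // Output row: all of R's values, then S's values for non-common attributes.
                    List<String> row = new ArrayList<>(Arrays.asList(r).subList(1, r.length));
                    for (int i = 0; i < schemaS.size(); i++)
                        if (!common.contains(schemaS.get(i))) row.add(s[1 + i]);
                    out.add(String.join(", ", row));
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("schema.R", "a,b"); // hypothetical keys and schemas
        conf.put("schema.S", "b,c");
        List<String> lines = Arrays.asList("R, 1, 2", "S, 2, 3", "R, 4, 2", "S, 5, 6", "S, 2, 7");
        for (String row : join(lines, conf)) System.out.println(row);
        // prints:
        // 1, 2, 3
        // 1, 2, 7
        // 4, 2, 3
        // 4, 2, 7
    }
}
```

Passing the schema as a string keeps the mapper generic: it finds the join attributes by name at run time instead of hard-coding column positions.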

Category 2: Hadoop ecosystem: Kafka, Flume, HBase, Storm, etc.
Category 3: Other big data solutions: Snowflake, Elasticsearch, Amazon Redshift, etc.

Instructions:

1. Each group must pick a project from one of the three categories above: a MapReduce application, a tool from the Hadoop ecosystem, or a big data platform.

2. If you choose MapReduce:
(a) You have to submit the source code files as well.
(b) In your talk you have to cover the code and how it works, and do a sample run in front of the class. Please prepare the necessary input data files to test your code and confirm that it generates the correct output.

3. If you choose a tool/framework from the Hadoop ecosystem, you have to cover:
(a) The main components/daemons of the tool, what exactly needs to be running to use it.
(b) What is it used for? What kind of processing? Alternative tools that serve the same purpose, if any.
(c) A practical simple example on how we use the tool: code, scripting language, commands, etc.
(d) Your presentation should be enough for anyone to know the basics of the tool and start using it for simple processing.

4. If you choose a big data platform:
(a) Same as above: a, b, c, and d.

5. Scoring:
(a) 50 points: overall quality of slides, presentation, and talk.
(b) 50 points: code and demo, system components (if applicable), daemons, MapReduce code and how it works, etc.

6. At least two group members should give the presentation, using the same laptop/machine. Hopefully we can fit each talk into 10 to 15 minutes, to leave time for all of the groups.

7. Expected time: each group should have the code (if any) and slides ready within 2 to 3 weeks.

8. Final note: please don't worry much about your score and focus on exploring and learning something new. It should be an exciting experience for the whole class including myself.

Course: DSA 5620 Big Data Analytics, University of Central Missouri