Explain the big data fundamentals

Reference no: EM133004338

COSC 2637 Big Data Processing

Overview
Write Spark programs that give you the chance to apply the essential components you learned in lectures and to understand the complexity of Spark programming.

Learning Outcome 1 - Model and implement efficient big data solutions for various application areas using appropriately selected algorithms and data structures.
Learning Outcome 2 - Analyse methods and algorithms, to compare and evaluate them with respect to time and space requirements and make appropriate design choices when solving real-world problems.
Learning Outcome 3 - Explain the Big Data Fundamentals, including the evolution of Big Data, the characteristics of Big Data and the challenges introduced.
Learning Outcome 4 - Apply non-relational databases, the techniques for storing and processing large volumes of structured and unstructured data, as well as streaming data.
Learning Outcome 5 - Apply the novel architectures and platforms introduced for Big Data, i.e. Hadoop, MapReduce and Spark.

Task - Spark Streaming

Develop a Spark Streaming program as a Scala Maven project that monitors a folder in HDFS in real time, so that any new file in the folder is processed. All three tasks below must be implemented in the same Scala object:

A. For each RDD of the DStream, count the word frequency and save the output in HDFS. Use a regular expression to make sure that each word consists of characters only (tip: findAllIn()).

B. For each RDD of the DStream, filter out the short words (i.e., fewer than 5 characters) and then count the co-occurrence frequency of words (words are considered to co-occur if they are in the same line); save the output in HDFS.

C. For the DStream, filter out the short words (i.e., fewer than 5 characters) and then count the co-occurrence frequency of words (words are considered to co-occur if they are in the same line); save the output in HDFS. Note that you are required to use the updateStateByKey operation to continuously update the co-occurrence frequency of words with new information.
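The per-line logic behind tasks A, B and C can be sketched in plain Scala, independent of Spark, as below. This is only an illustration under stated assumptions, not the expected solution: the object and function names are hypothetical, "characters only" is assumed to mean ASCII letters, and a co-occurrence is assumed to be an ordered pair of words at different positions in the same line (the brief does not say whether pairs are ordered).

```scala
import scala.util.matching.Regex

// Hypothetical helper object; none of these names come from the brief.
object StreamingTaskSketch {
  // Assumption: "characters only" means runs of ASCII letters.
  val WordPattern: Regex = "[A-Za-z]+".r

  // Task A helper: extract letter-only words from one line with findAllIn.
  def words(line: String): Vector[String] =
    WordPattern.findAllIn(line).toVector

  // Task A: word frequency over one batch of lines.
  def wordCounts(lines: Seq[String]): Map[String, Int] =
    lines.flatMap(words).groupBy(identity).map { case (w, ws) => (w, ws.size) }

  // Tasks B and C helper: drop words shorter than 5 characters, then emit
  // ((word1, word2), 1) for every ordered pair at different positions
  // in the same line (assumed definition of co-occurrence).
  def coOccurrences(line: String): Seq[((String, String), Int)] = {
    val kept = words(line).filter(_.length >= 5)
    for {
      i <- kept.indices
      j <- kept.indices
      if i != j
    } yield ((kept(i), kept(j)), 1)
  }

  // Task C: a function with the shape updateStateByKey expects,
  // (new values for a key in this batch, previous state) => new state.
  def updateCount(newValues: Seq[Int], running: Option[Int]): Option[Int] =
    Some(running.getOrElse(0) + newValues.sum)
}
```

In a Spark Streaming job, `words` and `coOccurrences` would typically be passed to `flatMap` on the DStream or on each RDD, and `updateCount` to `updateStateByKey`. Note that `updateStateByKey` requires a checkpoint directory to be set on the streaming context.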

Format Requirements:

Failure to follow these requirements will incur a penalty.
(a) The source code must be included in the Scala Maven project.
(b) You need to create a single Scala Maven project for all three tasks.
(c) Submit the developed Scala Maven project in a single .zip file with a standalone jar file.
(d) The zip file should be named sxxxxx_BDP_A3_2021.zip (replace sxxxxx with your student ID).
(e) You need to include a "README" file in the zip file.
(f) In the README, you must specify exactly how to run the standalone jar on the AWS EMR platform to perform the tasks.
(g) Paths of input and output should not be hard-coded.
(h) Each task has its own output path.
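
One way to satisfy requirements (g) and (h) is to read the paths from the command line. The sketch below assumes four positional arguments (one input folder and one output path per task); the argument order, the `Paths` case class and the error messages are hypothetical choices, not part of the brief.

```scala
// Hypothetical argument parser; the argument order is an assumption.
object PathArgsSketch {
  // Container for the input folder and the three per-task output paths.
  final case class Paths(input: String, outputA: String, outputB: String, outputC: String)

  // Returns Right(paths) on success, or Left(message) describing the problem.
  def parse(args: Array[String]): Either[String, Paths] = args match {
    case Array(in, a, b, c) =>
      // Requirement (h): each task has its own output path.
      if (Set(a, b, c).size == 3) Right(Paths(in, a, b, c))
      else Left("each task needs its own output path")
    case _ =>
      Left("usage: <input-dir> <output-A> <output-B> <output-C>")
  }
}
```

The `main` method of the single Scala object could call `PathArgsSketch.parse(args)` and exit with the message on `Left`, so that no path appears in the source code.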

Functional Requirements:
Failure to follow these requirements will incur a penalty.
(i) For each task, the output should be saved in HDFS under a name that includes the current time or a unique sequence number.
(j) All three tasks are implemented in the same Scala object.
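
Requirement (i) could be met by deriving a fresh output name for every batch. The sketch below shows both variants, a timestamp suffix and a sequence-number suffix; the object name, function names and the six-digit padding are assumptions made for illustration.

```scala
import java.util.concurrent.atomic.AtomicLong

// Hypothetical helpers for building a unique output path per batch.
object OutputNameSketch {
  // Counter for the sequence-number variant; starts at 0, first value used is 1.
  private val sequence = new AtomicLong(0)

  // Timestamp variant: append the batch time in milliseconds to the base path.
  def withTime(basePath: String, batchTimeMs: Long): String = {
    val base = basePath.stripSuffix("/")
    base + "/" + batchTimeMs
  }

  // Sequence variant: append a zero-padded, strictly increasing number.
  def withSequence(basePath: String): String = {
    val base = basePath.stripSuffix("/")
    base + "/" + "%06d".format(sequence.incrementAndGet())
  }
}
```

In Spark Streaming, the two-argument form of `foreachRDD` passes the batch `Time`, whose `milliseconds` value could be given to `withTime` before saving the RDD.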

Attachment:- Big Data Processing.rar






All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd