Implement an efficient data layout and retrieval strategy

Assignment Help JAVA Programming
Reference no: EM133099787

SYSTEMS - ASSIGNMENT

Title: Implement an efficient data layout and retrieval strategy for a Hadoop Cluster

Overview & background:

A database may have some common data repeated across records. For example, in the attached CSV file (exported from a database), some column values are the same across multiple rows. These common values are stored repeatedly in the database records, which increases the storage cost but reduces retrieval time for analytical queries.

However, we need to lay out this kind of dataset on a Hadoop cluster at a reduced storage cost. To do so, we need to understand which values are common across records and create a data layout that avoids storing duplicate values. At the same time, the layout must allow a complete data record to be retrieved from storage, given a record identifier.

Input: CSV data with a flat schema, containing multiple records and features.

Description:

1. STORAGE:

Each storage node stores data based on the conditions below.
a. Mutually exclusive feature data (a column value) that is not common across records (rows): private node.
b. Feature data common in two records: 2-way shared node.
c. Feature data common in four records: 4-way shared node.
d. Feature data common in eight records: 8-way shared node.
Note: The private node and the 2-, 4-, and 8-way shared nodes are storage nodes that store feature values occurring in 1, 2, 4, and 8 records respectively.
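The classification above can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not the expected solution: the function name `assign_nodes` and the `id_column` parameter are hypothetical, and because the statement does not say what to do with a value shared by, say, three or seven records, the sketch assumes such a group is split greedily into chunks of 8, 4, 2, and 1 records.

```python
from collections import defaultdict

# Node sizes from the problem statement; 1 stands for the private node.
NODE_SIZES = (8, 4, 2, 1)


def assign_nodes(rows, id_column):
    """Return a list of (node_size, column, value, record_ids) placements.

    rows is a list of dicts (for example from csv.DictReader).
    Assumption: a value shared by n records is split greedily into
    chunks of 8, 4, 2 and 1 records.
    """
    # Collect the IDs of all records that share each (column, value) pair.
    groups = defaultdict(list)
    for row in rows:
        for column, value in row.items():
            if column != id_column:
                groups[(column, value)].append(row[id_column])

    placements = []
    for (column, value), record_ids in groups.items():
        start = 0
        for size in NODE_SIZES:
            # Carve off as many full chunks of this size as still fit.
            while len(record_ids) - start >= size:
                chunk = record_ids[start:start + size]
                placements.append((size, column, value, chunk))
                start += size
    return placements


rows = [
    {"id": "r1", "city": "Pune", "age": "30"},
    {"id": "r2", "city": "Pune", "age": "41"},
    {"id": "r3", "city": "Pune", "age": "52"},
]
placements = assign_nodes(rows, "id")
```

In this small example, "Pune" occurs in three records, so under the greedy assumption it is placed once on the 2-way shared node (for r1 and r2) and once on the private node (for r3); each age value goes to the private node.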

2. METADATA
Maintain metadata, keyed by record ID, about the storage deployment above; it describes how each record's feature values are stored across the storage nodes. The metadata can be stored on a specific node.
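One possible shape for this metadata is a mapping from record ID to, per column, a pointer of the form (node, key). The sketch below is an assumption for illustration only: the name `build_layout`, the in-memory dictionaries standing in for storage nodes, and the integer slot keys are all hypothetical, and a real deployment would write to HDFS files or another store instead.

```python
def build_layout(placements):
    """Build in-memory stand-ins for the storage nodes and the metadata.

    placements is an iterable of (node_size, column, value, record_ids),
    where node_size is 1 (private), 2, 4 or 8.
    """
    nodes = {1: {}, 2: {}, 4: {}, 8: {}}  # node_size -> {key: value}
    metadata = {}                         # record_id -> {column: (node_size, key)}
    for node_size, column, value, record_ids in placements:
        key = len(nodes[node_size])       # next free slot on that node
        nodes[node_size][key] = value     # the value itself is stored once
        for record_id in record_ids:
            # Every record sharing the value points at the same slot.
            metadata.setdefault(record_id, {})[column] = (node_size, key)
    return nodes, metadata


placements = [
    (2, "city", "Pune", ["r1", "r2"]),
    (1, "age", "30", ["r1"]),
    (1, "age", "41", ["r2"]),
]
nodes, metadata = build_layout(placements)
```

Here both r1 and r2 point at slot 0 of the 2-way shared node for `city`, so "Pune" is stored once, while each `age` value gets its own slot on the private node.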

3. RETRIEVAL:

Given a record ID, retrieval consults the metadata from step 2 to fetch all the required features (column values) from their respective storage nodes and reassemble the original record.
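Retrieval then amounts to following the pointers in the metadata. The sketch below assumes the hypothetical in-memory structures used for illustration here, a `metadata` dict of (node, key) pointers and a `nodes` dict standing in for the storage nodes; neither is prescribed by the assignment.

```python
def retrieve_record(record_id, metadata, nodes, columns):
    """Reassemble one record by following its metadata pointers.

    columns gives the original column order, so the record comes back
    in the same layout as the source CSV row.
    """
    pointers = metadata[record_id]  # raises KeyError for an unknown ID
    record = {}
    for column in columns:
        node_size, key = pointers[column]
        record[column] = nodes[node_size][key]
    return record


# Hypothetical sample layout: "Pune" is stored once on the 2-way node.
nodes = {1: {0: "30", 1: "41"}, 2: {0: "Pune"}, 4: {}, 8: {}}
metadata = {
    "r1": {"city": (2, 0), "age": (1, 0)},
    "r2": {"city": (2, 0), "age": (1, 1)},
}
record = retrieve_record("r2", metadata, nodes, ["city", "age"])
# record is {"city": "Pune", "age": "41"}
```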

NOTE: You can apply different techniques, such as normalization, standardization, or vectorization, to understand the similarity of feature values.

1. Python / Java / Spark code that enables
a. the given CSV data to be written, using the distributed storage layout strategy described, to reduce duplicate data, and
b. retrieval of any record given the record ID from the distributed storage.

2. Report the compression ratio achieved using the above approach, i.e. how much storage reduction results from the de-duplicated data layout on the cluster.

3. You can use a Hadoop cluster, a plain cluster of nodes, or any Big Data storage framework to demonstrate your data storage and retrieval code. Describe your setup in detail.

4. You should provide clear instructions to reproduce the submission on the Evaluator's setup.

5. Your code and results should be reproducible.

6. The implementation should be general purpose for any other CSV input file.
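For the compression ratio asked for in item 2, one common convention is original size divided by stored size, with the storage reduction expressed as a percentage. The sketch below assumes that convention and assumes that the stored size includes the metadata as well as the de-duplicated values; the statement does not fix either choice, so the report should state which one is used. The function names are hypothetical.

```python
def compression_ratio(original_bytes, stored_bytes):
    """Original size divided by de-duplicated size (higher is better)."""
    if original_bytes <= 0 or stored_bytes <= 0:
        raise ValueError("sizes must be positive")
    return original_bytes / stored_bytes


def storage_reduction_percent(original_bytes, stored_bytes):
    """Percentage of the original storage saved by the new layout."""
    if original_bytes <= 0 or stored_bytes < 0:
        raise ValueError("sizes must be positive")
    return 100.0 * (1 - stored_bytes / original_bytes)


# Illustrative numbers only, not measured results.
ratio = compression_ratio(1000, 250)            # 4.0
saving = storage_reduction_percent(1000, 250)   # 75.0
```

If the metadata turns out to be larger than the duplicates it removes, the ratio falls below 1, which is worth reporting as well.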

Attachment:- Assignment-Problem-Statement.rar

Attachment:- Data.rar




All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd