Design and implementation of a big data solution

Reference no: EM132302772


Learning Outcomes

1. Critically evaluate modern big data processing paradigms.

2. Develop and implement a big data solution for a provided dataset.

3. Analyse use cases, visualise and report the results of a big data solution.

4. Assess how ethics govern the design choices in devising a big data enabled solution.

Rationale:

Hadoop can be thought of as a set of open source programs and procedures which anyone can use as the backbone of their big data operations. Today, it is the most widely used system for providing data storage and processing across commodity hardware (off-the-shelf systems linked together), as opposed to expensive, bespoke systems made for the job in hand. In fact, it is claimed that more than half of the companies in the Fortune 500 make use of it.

Description:

To gain insight into the state of the art in big data management, each student is individually tasked to:

• Design, install and configure a 3-node Apache Hadoop cluster on top of Lubuntu OS. Each student will be provided with access to 3 virtual machines to deploy the cluster.

• Install and configure MongoDB to work as an interface to the aforementioned cluster.

• Install and configure Apache Spark to work as an analytics engine on top of the aforementioned cluster.

• Download the latest Wikimedia dump dataset and PUT it on the cluster.

• Identify a unique and acceptably challenging data analysis problem that can yield a factual insight from the Wikimedia dataset. The student will choose an insight to look for in the dataset and identify an appropriate method for the analysis process.

• Utilise MongoDB for better-performing data operations.

• Visualise and explain the resulting insights.

• Write a detailed 3000-word report about all previous steps and show evidence (such as screenshots or lab work) for each one. The report must be written in an excellent style of academic writing.
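
The brief leaves the choice of analysis to the student, so the following is only a minimal sketch of one possible approach. It assumes the downloaded data is in the line-oriented layout of Wikimedia pageview files (domain code, page title, view count, response size, separated by spaces); the brief does not specify which dump is meant. The sample lines are made up, and the names map_line, reduce_counts and top_pages are hypothetical. The code runs on a single machine and only mimics the map, shuffle and reduce phases that Hadoop or Spark would distribute across the three nodes.

```python
from collections import defaultdict

# Made-up sample records in the ASSUMED pageview layout:
# domain_code page_title view_count response_bytes
SAMPLE_LINES = [
    "en Main_Page 120 0",
    "en Apache_Hadoop 30 0",
    "de Apache_Hadoop 12 0",
    "en Apache_Hadoop 15 0",
    "en MongoDB 8 0",
    "malformed line",
]


def map_line(line):
    """Map phase: emit (page_title, views) pairs; skip malformed lines."""
    parts = line.split()
    if len(parts) != 4 or not parts[2].isdigit():
        return []
    _domain, title, views, _size = parts
    return [(title, int(views))]


def reduce_counts(pairs):
    """Shuffle and reduce phases: group pairs by title and sum the views."""
    totals = defaultdict(int)
    for title, views in pairs:
        totals[title] += views
    return dict(totals)


def top_pages(lines, n=2):
    """Return the n most viewed titles, ties broken alphabetically."""
    pairs = [pair for line in lines for pair in map_line(line)]
    totals = reduce_counts(pairs)
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


if __name__ == "__main__":
    print(top_pages(SAMPLE_LINES))
```

Keeping the map and reduce steps as separate functions makes it easier to move the same logic to a distributed engine later, since each function works on its own input without shared state.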

Attachment:- Assignment Brief.rar


CMP7203 - Big Data Management - Birmingham City University

Reviews

len2302772

5/10/2019 1:39:56 AM

• Read the entire assignment brief, the required task and the marking criteria?

• Worked through the checklist to know what you need to do and submit?

• Clarified any points you are unsure of with the module coordinator?

• A 3-node Hadoop cluster is up and running?

• MongoDB working on top of the Hadoop cluster?

• Spark is configured and running on top of the Hadoop cluster?

• Dataset downloaded and put on the Hadoop cluster?

• Included all the sections in 'Assessment Details'?

len2302772

5/10/2019 1:39:43 AM

80 – 100%: Excellent critical evaluation of modern big data processing paradigms, where more than 3 paradigms are critically evaluated, including distributed data processing engines and distributed streaming platforms. Excellent development of a big data solution: the Hadoop cluster is up and running with Spark, MongoDB and a web interface to show the results, and the dataset is uploaded to Hadoop and MongoDB.

len2302772

5/10/2019 1:39:30 AM

50 – 59%: Good evaluation of modern big data processing paradigms, where more than 3 paradigms are critically evaluated.

60 – 69%: Good evaluation of modern big data processing paradigms, where more than 3 paradigms are critically evaluated, including distributed data processing engines.

70 – 79%: Very good evaluation of modern big data processing paradigms, where more than 3 paradigms are critically evaluated, including distributed data processing engines and distributed streaming platforms.

len2302772

5/10/2019 1:39:22 AM

Assessment Criteria

1. Critically evaluate modern big data processing paradigms. Weighting: 30%

Grading Criteria

0 – 29%: Insufficient evaluation of modern big data processing paradigms. Fewer than 3 different paradigms are mentioned but not evaluated.

30 – 39%: Poor but sufficient evaluation of modern big data processing paradigms, where more than 3 paradigms are mentioned but not evaluated.

40 – 49%: Basic evaluation of modern big data processing paradigms, where more than 3 paradigms are evaluated.

len2302772

5/10/2019 1:39:05 AM

Assessment Summary

The student will write a report that:

• Evaluates the 5 Vs for a provided big dataset.

• Delineates it for structured and unstructured information elements.

• Reviews the big data management and distributed data storage and analysis literature to prepare a big data solution for the data source provided.

• Shows evidence of data transfer on an open source/proprietary Hadoop system; uses NoSQL, MapReduce, and Spark on the stored data.

• Discusses and visualises the resulting data insights.

• Finally, evaluates the role of ethics in data storage and processing.

The report will be submitted as one deliverable in the form of a written report. The standard of academic writing should be excellent. (Maximum words: 3000 words, excluding tables, figures and references).
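
As a hedged illustration of what delineating structured and unstructured information elements might look like, the sketch below splits one wiki page record into typed fields and free text. It assumes a heavily simplified page element without the XML namespace that real MediaWiki exports carry; the sample page, the page_to_document function and the output field names are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified, made-up page record (ASSUMPTION: no XML namespace,
# only a handful of the elements a real export would contain).
SAMPLE_PAGE = """
<page>
  <title>Apache Hadoop</title>
  <id>42</id>
  <revision>
    <timestamp>2019-05-01T00:00:00Z</timestamp>
    <text>Apache Hadoop is a collection of open-source software utilities.</text>
  </revision>
</page>
"""


def page_to_document(xml_text):
    """Split one page into structured fields and unstructured text."""
    page = ET.fromstring(xml_text.strip())
    text = page.findtext("revision/text", default="")
    return {
        # Structured: fixed, typed fields that can be indexed and queried.
        "structured": {
            "title": page.findtext("title"),
            "page_id": int(page.findtext("id")),
            "timestamp": page.findtext("revision/timestamp"),
        },
        # Unstructured: free text that needs further processing.
        "unstructured": {"text": text},
        # Derived: a simple measure computed from the free text.
        "derived": {"word_count": len(text.split())},
    }


if __name__ == "__main__":
    print(page_to_document(SAMPLE_PAGE))
```

A dictionary of this shape could then be stored as a document in a NoSQL database such as MongoDB, for example through the insert_one method of a PyMongo collection, although the brief does not prescribe a particular driver or schema.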

len2302772

5/10/2019 1:38:59 AM

Support available for students required to submit a re-assessment: Timetabled revision sessions will be arranged for the period immediately preceding the hand-in date. NOTE: At the first assessment attempt, the full range of marks is available. At the re-assessment attempt the mark is capped and the maximum mark that can be achieved is 50%.

len2302772

5/10/2019 1:38:49 AM

The assignment is about Big Data Management and requires the following:

1. Make virtual machines

2. Configure them into a Hadoop cluster

3. Install MongoDB and some other stuff

4. Run analysis using Pig/Hadoop

5. Write a report and make visualisations
