What are the two core components of Hadoop?

Reference no: EM131880035

Part 1: 180 words, critical response to the following discussion forum topic. APA formatting with references.

Initial posting: What are the two core components of Hadoop?

Although the question asks for two (HDFS for storage and MapReduce for processing), Hadoop basically has three important core components:

1. MapReduce - A software programming model for processing large sets of data in parallel.

2. HDFS - The Java-based distributed file system that can store all kinds of data without prior organization.

3. YARN - A resource management framework for scheduling and handling resource requests from distributed applications.

For computational processing, i.e. MapReduce: MapReduce is the data processing layer of Hadoop. It is a software framework for easily writing applications that process the vast amounts of structured and unstructured data stored in the Hadoop Distributed File System (HDFS).

It processes huge amounts of data in parallel by dividing a submitted job into a set of independent tasks (sub-jobs).

In Hadoop, MapReduce works by breaking the processing into two phases: Map and Reduce. Map is the first phase of processing, where we specify all the complex logic, business rules, and costly code. Reduce is the second phase of processing, where we specify light-weight processing such as aggregation or summation.
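The two phases can be sketched in plain Python. This is only a single-machine illustration of the programming model, not Hadoop's actual API; the function names (map_phase, shuffle, reduce_phase, word_count) are made up for this example, and the word-count job is an assumed workload.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, as the framework does between Map and Reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: light-weight aggregation, here a simple sum."""
    return key, sum(values)


def word_count(lines):
    # Each line is an independent map task; on a cluster these would run in parallel.
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


counts = word_count(["Hadoop stores data", "Hadoop processes data"])
```

Because every call to map_phase depends only on its own line, the map tasks could be spread across many machines, which is the property the job-splitting described above relies on.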

For storage, i.e. HDFS: HDFS is the acronym for Hadoop Distributed File System, and its basic purpose is storage. It also follows the master-slave pattern. The NameNode acts as the master and stores the metadata about the data held on the DataNodes, while each DataNode acts as a slave and stores the actual data on its local disk, in parallel with the other DataNodes.
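The split between metadata and actual data can be sketched as a toy model in Python. Everything here is an assumption made for illustration: the class names only mirror the roles described above, the block size and replication factor are tiny made-up values, and the round-robin placement is a simplification rather than HDFS's real placement policy.

```python
class DataNode:
    """Slave: stores the actual block contents (a dict stands in for local disk)."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}


class NameNode:
    """Master: stores only metadata, i.e. which blocks form a file and where they live."""

    def __init__(self, datanodes, block_size=4, replication=2):
        self.datanodes = datanodes
        self.block_size = block_size    # toy value, in characters
        self.replication = replication  # toy value
        self.metadata = {}              # filename -> [(block_id, [datanode names])]

    def write(self, filename, data):
        entries = []
        for start in range(0, len(data), self.block_size):
            index = start // self.block_size
            block_id = f"{filename}#{index}"
            # Round-robin placement: a simplification of a real placement policy.
            targets = [self.datanodes[(index + r) % len(self.datanodes)]
                       for r in range(self.replication)]
            for node in targets:
                node.blocks[block_id] = data[start:start + self.block_size]
            entries.append((block_id, [node.name for node in targets]))
        self.metadata[filename] = entries

    def read(self, filename):
        by_name = {node.name: node for node in self.datanodes}
        # Look up block locations in the metadata, then fetch each block
        # from its first replica.
        return "".join(by_name[locations[0]].blocks[block_id]
                       for block_id, locations in self.metadata[filename])


nodes = [DataNode("dn1"), DataNode("dn2"), DataNode("dn3")]
namenode = NameNode(nodes)
namenode.write("log.txt", "hello hadoop")
```

Note that the NameNode object never holds file contents, only the mapping from file names to block locations.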

For resource allocation, i.e. YARN: YARN is the resource management framework in Hadoop. It allows multiple data processing engines, such as real-time streaming, data science, and batch processing, to handle data stored on a single platform.
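As a rough sketch of what resource allocation means here, the toy Python scheduler below hands out fixed-size memory "containers" to competing applications. It is not YARN's API: the ResourceManager class, the request method, the node names, and the "most free memory first" policy are all assumptions chosen for illustration.

```python
class ResourceManager:
    """Toy stand-in for a cluster scheduler: grants containers while memory lasts."""

    def __init__(self, node_memory_mb):
        self.free = dict(node_memory_mb)  # node name -> free memory in MB

    def request(self, app, containers, memory_mb):
        """Return the nodes granted to `app`; may be fewer than requested."""
        granted = []
        for _ in range(containers):
            # Pick the node with the most free memory (a made-up policy).
            node = max(self.free, key=self.free.get)
            if self.free[node] < memory_mb:
                break
            self.free[node] -= memory_mb
            granted.append(node)
        return granted


rm = ResourceManager({"node1": 4096, "node2": 2048})
batch = rm.request("batch-job", containers=2, memory_mb=2048)
stream = rm.request("stream-job", containers=2, memory_mb=1024)
late = rm.request("late-job", containers=1, memory_mb=512)
```

The batch and streaming jobs share one cluster, and the last request receives nothing because the memory is already committed, which is the kind of arbitration a resource manager exists to perform.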

Part 2: 180 words, critical response to the following discussion forum topic. APA formatting with references.

What are the Hadoop ecosystems and what kinds of ecosystems exist?

The Hadoop ecosystem is a vast set of software bundles, categorized as belonging to a distributed file system ecosystem or a distributed programming ecosystem, that can interact with each other and with non-Hadoop software ecosystems as well (Roman, n.d.).

I will not list all of the software bundles on that website, just enough to give you an idea of the types of software bundles that make up the Hadoop ecosystem.

Distributed Filesystems:

• Apache HDFS (Hadoop Distributed File System) stores large, complex files across clusters and is often run with other programs such as ZooKeeper, YARN, Weave, etc.

• Red Hat GlusterFS is described as a Red Hat Hadoop alternative for network servers.

• Quantcast File System (QFS) works with large-scale batch processing and MapReduce workloads. It is considered an alternative to Apache Hadoop HDFS and uses striping instead of full multiple replication to save storage capacity.

• Ceph file system works well with large amounts of object, block, or file storage, much like Hadoop.

• Lustre File System is a distributed file system for environments that need high performance and availability over large networks, using the SCSI protocol. Hadoop 2.5 supports Lustre.
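The storage saving mentioned for QFS in the list above can be illustrated with a little arithmetic. The numbers are assumptions, not figures from the post: three full replicas for the replication case, and a layout of 6 data stripes plus 3 parity stripes for the striped case.

```python
def raw_storage_replication(data_tb, replicas=3):
    """Raw capacity needed when every block is stored `replicas` times."""
    return data_tb * replicas


def raw_storage_striped(data_tb, data_stripes=6, parity_stripes=3):
    """Raw capacity needed when data is striped and protected by parity stripes."""
    return data_tb * (data_stripes + parity_stripes) / data_stripes


# Assumed workload: 100 TB of user data.
replicated_tb = raw_storage_replication(100)
striped_tb = raw_storage_striped(100)
```

Under these assumed parameters, full replication needs three times the size of the data in raw capacity, while the striped layout needs one and a half times.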

Distributed Programming:

• Apache Ignite provides distributed computing over large-scale data for a wide variety of data types, including key-value, some SQL, MapReduce, etc.

• Apache MapReduce processes large data sets in parallel on distributed clusters, with YARN as the resource manager.

• Apache Pig executes data flows in parallel on Hadoop, using HDFS and MapReduce. Its main concern is data flow, and it uses its own language, Pig Latin.

• JAQL supports JSON documents, XML, CSV data, and SQL data.

NoSQL Databases:

• Apache HBase is derived from Google Bigtable and is used as the database for Hadoop. It is column-oriented and works well with MapReduce.

• Apache Cassandra also implements the Google Bigtable data model and can run with or without HDFS. It was initially developed at Facebook and also has some of the features of Amazon's Dynamo.

SQL-on-Hadoop:

• Apache Hive provides a SQL-like language, HiveQL, for data summarization, query, and analysis, but it is not SQL-92 compliant.

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd