Write a hive query to find the total number of records

Reference no: EM133627537

Dataset:

We are going to use the up-to-date Olympic event historical dataset (based on the 1908 to 2022 Olympic Games) from olympedia.org.

The "Assignment 3 DataSet Part 2.zip" dataset is available in the course shell (under Assignment 3). The zip contains four files with information on athletes: gold, silver, bronze, and athlete_event_results. Download it and use Hadoop and Hive to answer the questions below.

athlete_event_results (athlete_id, athlete, edition, sport, event, medal)

gold (athlete_id, athlete, edition, sport, event)

silver (athlete_id, athlete, edition, sport, event)

bronze (athlete_id, athlete, edition, sport, event)

Preparation

Store the complete information of all athlete event results in a Hive table.

Store the medal data (gold, silver, and bronze files) in a Hive table that is partitioned on medal.

Show the database and table structures.
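
The preparation steps could be sketched in HiveQL roughly as follows. This is a minimal sketch, not a definitive solution, and it rests on assumptions that are not in the assignment: the database name olympics, the partitioned table name medals, the local paths under /tmp/olympics/, the .csv file extensions, an integer athlete_id, and comma-delimited files with a single header row. If fields contain quoted commas, a CSV SerDe such as org.apache.hadoop.hive.serde2.OpenCSVSerde would likely be needed instead of ROW FORMAT DELIMITED.

```sql
-- Hypothetical database name; adjust to the course environment.
CREATE DATABASE IF NOT EXISTS olympics;
USE olympics;

-- Complete data: columns as listed in the assignment.
-- Assumes comma-delimited text with one header line.
CREATE TABLE IF NOT EXISTS athlete_event_results (
  athlete_id INT,
  athlete    STRING,
  edition    STRING,
  sport      STRING,
  event      STRING,
  medal      STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
TBLPROPERTIES ('skip.header.line.count'='1');

-- Assumed local path and file name.
LOAD DATA LOCAL INPATH '/tmp/olympics/athlete_event_results.csv'
INTO TABLE athlete_event_results;

-- Partitioned data: medal is the partition column, so it is not
-- repeated in the regular column list. "medals" is a hypothetical name.
CREATE TABLE IF NOT EXISTS medals (
  athlete_id INT,
  athlete    STRING,
  edition    STRING,
  sport      STRING,
  event      STRING
)
PARTITIONED BY (medal STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
TBLPROPERTIES ('skip.header.line.count'='1');

-- One static partition per source file (assumed paths).
LOAD DATA LOCAL INPATH '/tmp/olympics/gold.csv'
INTO TABLE medals PARTITION (medal = 'Gold');
LOAD DATA LOCAL INPATH '/tmp/olympics/silver.csv'
INTO TABLE medals PARTITION (medal = 'Silver');
LOAD DATA LOCAL INPATH '/tmp/olympics/bronze.csv'
INTO TABLE medals PARTITION (medal = 'Bronze');

-- Show database and table structures.
SHOW DATABASES;
SHOW TABLES;
DESCRIBE FORMATTED athlete_event_results;
DESCRIBE FORMATTED medals;
SHOW PARTITIONS medals;
```

The partition values 'Gold' and 'Bronze' match the literals used in the queries of parts e) and f); 'Silver' is assumed to follow the same capitalization.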

Questions

Question 5: Write the following queries, report results and execution time on both partitioned and complete data:

-- Substitute *table* with actual table names (you will answer the questions for all 4 tables)

a) Write a Hive query to find the total number of records in *table*

b) Write a Hive query to find the total number of records by medal in *table*

c) Write a Hive query to find the number of athletes who won medals, by sport, in *table*

d) Write a Hive query to find all athletes in *table* who won medals between the years 1996 and 2016

e) Provide the execution time of the query below on two tables (the partitioned table and athlete_event_results):
Select t.year, count(t.year) as count from (Select regexp_extract(edition, '(\\d{4})',1) as year from *table* where medal='Bronze') t group by year order by count desc limit 10;

f) Provide the execution time of the query below on two tables (the partitioned table and athlete_event_results):
Select t.year, count(t.year) as count from (Select regexp_extract(edition, '(\\d{4})',1) as year from *table* where medal='Gold') t group by year order by count desc limit 20;
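
Queries a) through d) might look like the following on the partitioned table. This is a sketch under stated assumptions: the table name medals is hypothetical, "number of athletes" is read as distinct athlete_id values, and edition is assumed to contain a four-digit year, as the regexp_extract calls in e) and f) imply. On athlete_event_results, rows without a medal presumably need to be filtered out first; how "no medal" is encoded (NULL, empty string, or something else) is not stated in the assignment, so the filter in the last query is an assumption.

```sql
-- a) Total number of records.
SELECT COUNT(*) AS total_records
FROM medals;

-- b) Total number of records by medal.
SELECT medal, COUNT(*) AS records
FROM medals
GROUP BY medal;

-- c) Number of (distinct) athletes who won medals, by sport.
SELECT sport, COUNT(DISTINCT athlete_id) AS athletes
FROM medals
GROUP BY sport;

-- d) Athletes who won medals between 1996 and 2016.
--    The year is extracted from edition the same way as in e) and f).
SELECT DISTINCT athlete_id, athlete
FROM medals
WHERE CAST(regexp_extract(edition, '(\\d{4})', 1) AS INT)
      BETWEEN 1996 AND 2016;

-- Variant of c) for the complete table, assuming medal holds
-- 'Gold', 'Silver', or 'Bronze' for medal winners.
SELECT sport, COUNT(DISTINCT athlete_id) AS athletes
FROM athlete_event_results
WHERE medal IN ('Gold', 'Silver', 'Bronze')
GROUP BY sport;
```

For the timing comparison, the Hive CLI prints a "Time taken" line after each query, which can be recorded as the execution time. On the partitioned table, a predicate such as medal='Bronze' should let Hive read only the matching partition, whereas on athlete_event_results the same predicate is evaluated against every row; prefixing a query with EXPLAIN is one way to check which plan is actually used.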

