Big Data Analytics using Hadoop and Spark

Reference no: EM133593973

Big Data Analytics: Coursework

Marking Scheme for Report

1. Log Data Conversion using PySpark to DataFrame:
2. Advanced Spark SQL queries for data analysis
3. Spark RDD queries for data analysis (3 queries per student, at least two operations/queries should be advanced): 50 marks (10 (not advanced) + 20 (advanced) + 20 (advanced))
4. Discussion on LSEP considerations:
5. Overall clarity, organization, and quality of the report


(1) Big Data Analysis using Spark DF
1. We have a large web log file with the data description below. Each line is structured as shown below, and the table provides a description of each field.

2. Download the data from here. You will need your UEL ID to obtain access.
3. Using the Spark DataFrame API, convert the unstructured web.log file into a DataFrame. The converted data should follow the format of the table above. Performing this conversion is part of the self-study you need to complete.
4. Individually, each student should write two advanced SQL queries on the DataFrame to extract specific insights. The queries should showcase your understanding of Spark SQL functionalities and demonstrate your ability to handle real-world data analysis tasks. Ensure that the queries provide meaningful insights and go beyond basic operations.
5. Each student should provide the working solution for each query in the HTML report.
6. You may use Python libraries such as matplotlib or seaborn for data visualization; doing so gives you the opportunity to achieve the maximum mark.
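One way to visualize a query result with matplotlib is to collect a small, already-aggregated result to the driver and plot it. This sketch assumes the input is a list of (status, count) pairs, such as the Row objects returned by df.groupBy("status").count().collect(); plot_status_counts is a hypothetical helper name.

```python
import matplotlib

matplotlib.use("Agg")  # render straight to a file; no display is required
import matplotlib.pyplot as plt


def plot_status_counts(rows, out_path="status_counts.png"):
    """Save a bar chart of (status, count) pairs and return the file path."""
    rows = sorted(rows)  # order bars by status code
    labels = [str(status) for status, _ in rows]
    counts = [count for _, count in rows]
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.bar(labels, counts)
    ax.set_xlabel("HTTP status code")
    ax.set_ylabel("Number of requests")
    ax.set_title("Requests per status code")
    fig.tight_layout()
    fig.savefig(out_path)
    plt.close(fig)  # free the figure when generating many plots
    return out_path
```

Only aggregated results should be collected this way; calling collect() on the raw log DataFrame would pull the whole dataset into driver memory.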

Basic Queries: Basic queries demonstrate a basic understanding of SQL syntax and perform simple operations on the data. These queries typically involve basic SELECT, WHERE, and GROUP BY clauses without complex joins or subqueries. Basic queries will be awarded the minimum mark for this section.

Advanced Queries: Advanced queries demonstrate a deeper understanding of SQL and involve more complex operations and techniques. These queries may include the use of advanced SQL features such as window functions, subqueries, joins, and aggregations. Advanced queries demonstrate creativity and the ability to extract meaningful insights from the data. These queries will be awarded higher marks based on the complexity, efficiency, and effectiveness of the analysis.

(2) Big Data Analysis using Spark RDD

1. Use Spark RDDs to read the same unstructured web.log data.
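With RDDs, the parsing is usually a plain Python function applied with map. This sketch assumes the Apache Common Log Format again; LOG_RE and parse_line are hypothetical names, and the regular expression must be adapted to the actual field table.

```python
import re
from typing import Optional, Tuple

# ASSUMPTION: Apache Common Log Format; adapt to the coursework's field table.
LOG_RE = re.compile(
    r'^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) (\S+)" (\d{3}) (\d+|-)'
)

# (host, timestamp, method, endpoint, protocol, status, size)
Record = Tuple[str, str, str, str, str, int, int]


def parse_line(line: str) -> Optional[Record]:
    """Parse one log line into a tuple, or return None if it does not match."""
    match = LOG_RE.match(line)
    if match is None:
        return None  # malformed line; filter it out downstream
    host, timestamp, method, endpoint, protocol, status, size = match.groups()
    # A "-" size means no response body; record it as 0 bytes.
    return (host, timestamp, method, endpoint, protocol,
            int(status), 0 if size == "-" else int(size))
```

In a Spark job this would be used as sc.textFile("web.log").map(parse_line).filter(lambda r: r is not None), giving an RDD of tuples that later transformations can index by position. Plain tuples are returned because they serialize cheaply between the driver and the executors.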

2. Each student should write 3 RDD queries using Spark RDD transformation and action operators. At least two queries should be advanced for each student.

3. The queries should differ from those written in Task 1 and showcase Spark RDD capabilities, using RDD-specific operations rather than SQL queries.

4. Each student should provide the working solution for each RDD query in the HTML report.

5. You may use Python libraries such as matplotlib or seaborn for data visualization.

Marking Scheme for Spark RDD Queries:
Basic Spark RDD Queries: Basic queries demonstrate a basic understanding of Spark RDD transformations and actions. These queries typically involve simple operations such as filtering, mapping, and basic aggregations using RDD functions. Basic queries may not fully leverage the power and capabilities of Spark RDDs and may not showcase advanced techniques. Basic queries will be awarded the minimum mark for this section.

Advanced Spark RDD Queries: Advanced queries showcase a deeper understanding of Spark RDD transformations and actions. These queries involve more complex operations and techniques, leveraging the full capabilities of Spark RDDs. Advanced queries may include operations like joins, aggregations, sorting, and complex data manipulations using RDD functions. Advanced queries demonstrate creativity and the ability to extract meaningful insights from the data using Spark RDDs. These queries will be awarded higher marks based on the complexity, efficiency, and effectiveness of the analysis.

(3) LSEP considerations
For all analyses performed, critically analyze the legal, social, ethical, and professional implications associated with the data and the analysis. Consider factors such as data privacy, data protection, bias, fairness, transparency, and the potential impact of the analysis on individuals or society as a whole.

Each student should choose one of these factors and contribute a discussion of it.
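For the data privacy factor, one concrete measure is to pseudonymize client addresses before analysis. The sketch below, with the hypothetical helper pseudonymize_host, replaces a host or IP with a keyed HMAC-SHA256 token: the same address always maps to the same token, so per-visitor analyses still work, while the secret key makes it impractical to reverse the mapping by hashing every possible IPv4 address. Note that under the GDPR, pseudonymized data is still personal data.

```python
import hashlib
import hmac


def pseudonymize_host(host: str, secret_key: bytes, length: int = 16) -> str:
    """Replace a host/IP with a keyed SHA-256 token (hypothetical helper)."""
    # A keyed HMAC, rather than a plain hash, resists brute-force reversal
    # as long as secret_key is kept out of the analysis environment.
    digest = hmac.new(secret_key, host.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]  # truncated hex token for readability
```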

Attachment:- Big Data Analytics.rar
