Demonstrate a deeper understanding of SQL

Reference no: EM133593971

Big Data Analytics: Coursework

Marking Scheme for Report

1. Log Data Conversion using PySpark to DataFrame:
2. Advanced Spark SQL queries for data analysis
3. Spark RDD queries for data analysis (3 queries per student, at least two of which must be advanced): 50 marks (10 (not advanced) + 20 (advanced) + 20 (advanced))
4. Discussion on LSEP considerations:
5. Overall clarity, organization, and quality of the report

Big Data Analytics using Hadoop and Spark

(1) Big Data Analysis using Spark DF
1. We have a large web log file (web.log). Each line is structured as shown below, and the table describes each field.

2. Download the data from here. You need your UEL ID to get access.
3. Using Spark DataFrames, convert the unstructured web.log file into a DataFrame whose columns follow the format of the table above. The conversion is part of the self-study you need to complete.
4. Individually, each student should write two advanced SQL queries on the DataFrame to extract specific insights. The queries should showcase your understanding of Spark SQL functionalities and demonstrate your ability to handle real-world data analysis tasks. Ensure that the queries provide meaningful insights and go beyond basic operations.
5. Each student should provide the working solution for each query in the HTML report.
6. You can use Python visualization libraries such as matplotlib or seaborn. Visualizing your results gives you the opportunity to achieve the maximum mark.
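The conversion step depends on the field table supplied with the brief, which is not reproduced here. The sketch below is for illustration only. It assumes each line follows the Apache Common Log Format, for example 192.168.1.10 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326. The field names (ip, timestamp, method, url, protocol, status, size) and the helper parse_log_line are hypothetical and should be adapted to the real table.

```python
import re
from datetime import datetime
from typing import Optional

# Hypothetical pattern for an Apache Common Log Format line (an assumption;
# the real web.log layout is defined by the table in the brief).
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ '                                # client IP, identity, user
    r'\[(?P<timestamp>[^\]]+)\] '                           # [dd/Mon/yyyy:HH:MM:SS zone]
    r'"(?P<method>\S+) (?P<url>\S+) (?P<protocol>[^"]+)" '  # request line
    r'(?P<status>\d{3}) (?P<size>\d+|-)$'                   # status code and response size
)


def parse_log_line(line: str) -> Optional[dict]:
    """Return a dict of typed fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    record["timestamp"] = datetime.strptime(record["timestamp"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    # A "-" size means no response body was sent; store it as 0.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record
```

The following PySpark route has not been tested here, but one way to use the helper would be:

1. Read the file with spark.sparkContext.textFile("web.log").
2. Map each line through parse_log_line and filter out the None results.
3. Pass Row(**record) objects to spark.createDataFrame.

A DataFrame-only alternative is spark.read.text combined with regexp_extract from pyspark.sql.functions, using the same regular expression.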

Basic Queries: Basic queries demonstrate a basic understanding of SQL syntax and perform simple operations on the data. These queries typically involve basic SELECT, WHERE, and GROUP BY clauses without complex joins or subqueries. Basic queries will be awarded the minimum mark for this section.
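For contrast, here is a query that would likely count as basic under this scheme. It uses only SELECT, WHERE and GROUP BY. Python's built-in sqlite3 module is used only as a lightweight stand-in, so the SQL can be tried without a Spark cluster. The table name logs, its columns and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical sample rows standing in for the parsed web.log DataFrame.
rows = [
    ("10.0.0.1", "GET", "/index.html", 200),
    ("10.0.0.2", "GET", "/index.html", 200),
    ("10.0.0.2", "GET", "/missing", 404),
    ("10.0.0.1", "POST", "/login", 200),
    ("10.0.0.3", "GET", "/index.html", 500),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (ip TEXT, method TEXT, url TEXT, status INTEGER)")
conn.executemany("INSERT INTO logs VALUES (?, ?, ?, ?)", rows)

# Basic: filter, group and count; no joins, subqueries or window functions.
BASIC_QUERY = """
SELECT status, COUNT(*) AS hits
FROM logs
WHERE method = 'GET'
GROUP BY status
ORDER BY status
"""

result = conn.execute(BASIC_QUERY).fetchall()
print(result)
```

In Spark, the same query text would normally be run with spark.sql(BASIC_QUERY) after registering the DataFrame with df.createOrReplaceTempView("logs").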

Advanced Queries: Advanced queries demonstrate a deeper understanding of SQL and involve more complex operations and techniques. These queries may include the use of advanced SQL features such as window functions, subqueries, joins, and aggregations. Advanced queries demonstrate creativity and the ability to extract meaningful insights from the data. These queries will be awarded higher marks based on the complexity, efficiency, and effectiveness of the analysis.
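The advanced-query criteria above can be illustrated by combining a common table expression with window functions. This sketch asks, for every URL, which client IP requested it most often and what share of that URL's traffic the client accounts for.

As in the basic example, sqlite3 is only a local stand-in, and the column names and rows are hypothetical. The SQL sticks to features that Spark SQL also supports (WITH, RANK() OVER, SUM() OVER, ROUND). Dialects still differ in places, so re-test the query with spark.sql on your own DataFrame.

```python
import sqlite3

# Hypothetical (ip, url) pairs standing in for the parsed web.log DataFrame.
rows = (
    [("10.0.0.1", "/index.html")] * 3
    + [("10.0.0.2", "/index.html")]
    + [("10.0.0.2", "/login")] * 2
    + [("10.0.0.3", "/login")] * 2
    + [("10.0.0.3", "/about")]
)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (ip TEXT, url TEXT)")
conn.executemany("INSERT INTO logs VALUES (?, ?)", rows)

# A CTE aggregates hits per (url, ip). RANK() orders clients within each URL
# (ties share rank 1), and SUM() OVER gives the per-URL total without a join.
ADVANCED_QUERY = """
WITH per_ip AS (
    SELECT url, ip, COUNT(*) AS hits
    FROM logs
    GROUP BY url, ip
),
ranked AS (
    SELECT url, ip, hits,
           RANK() OVER (PARTITION BY url ORDER BY hits DESC) AS rnk,
           SUM(hits) OVER (PARTITION BY url) AS url_total
    FROM per_ip
)
SELECT url, ip, hits, url_total,
       ROUND(1.0 * hits / url_total, 2) AS share
FROM ranked
WHERE rnk = 1
ORDER BY url_total DESC, url, ip
"""

result = conn.execute(ADVANCED_QUERY).fetchall()
for row in result:
    print(row)
```

Because RANK() keeps ties, /login returns two top clients. ROW_NUMBER() would instead keep exactly one client per URL, which is a design choice worth justifying in the report.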

(2) Big Data Analysis using Spark RDD

1. Use Spark RDDs to read the same unstructured web.log data.

2. Each student should write 3 RDD queries using Spark RDD transformation and action operators. At least two queries should be advanced for each student.

3. The queries should differ from those written in Task 1 and showcase Spark RDD capabilities. Emphasize RDD-specific operations rather than SQL queries.

4. Each student should provide the working solution for each RDD query in the HTML report.

5. You can use Python libraries such as matplotlib or seaborn for data visualization.
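Reading the raw file with RDDs usually starts with a per-line parsing function. The sketch below is a plain-Python function meant to be passed to flatMap. It returns a one-element list for a good line and an empty list for a malformed one, so bad lines are dropped in the same step.

It again assumes the Common Log Format, splitting on the double quotes around the request instead of using a regular expression. The name to_records and the tuple layout are hypothetical.

```python
from typing import List, Tuple

# Hypothetical record layout: (ip, timestamp, method, url, status, size).
Record = Tuple[str, str, str, str, int, int]


def to_records(line: str) -> List[Record]:
    """Parse one raw web.log line; return [] for malformed lines.

    Assumes Common Log Format, where the request line is the only
    double-quoted part, so splitting on '"' yields exactly three pieces.
    """
    parts = line.split('"')
    if len(parts) != 3:
        return []
    head, request, tail = parts
    head_fields = head.split()
    request_fields = request.split()
    tail_fields = tail.split()
    if len(head_fields) != 5 or len(request_fields) != 3 or len(tail_fields) != 2:
        return []
    ip = head_fields[0]
    # The bracketed timestamp spans two whitespace-separated tokens.
    timestamp = " ".join(head_fields[3:5]).strip("[]")
    method, url, _protocol = request_fields
    status_text, size_text = tail_fields
    if not status_text.isdigit():
        return []
    size = int(size_text) if size_text.isdigit() else 0  # "-" means no body
    return [(ip, timestamp, method, url, int(status_text), size)]


# Local check; the nested comprehension plays the role of flatMap.
lines = [
    '10.0.0.1 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326',
    "garbage line",
    '10.0.0.2 - - [10/Oct/2023:13:56:01 +0000] "GET /missing HTTP/1.1" 404 -',
]
records = [rec for line in lines for rec in to_records(line)]
print(records)
```

On Spark this would presumably become sc.textFile("web.log").flatMap(to_records). Caching the result with .cache() is reasonable if several queries reuse it.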

Marking Scheme for Spark RDD Queries:
Basic Spark RDD Queries: Basic queries demonstrate a basic understanding of Spark RDD transformations and actions. These queries typically involve simple operations such as filtering, mapping, and basic aggregations using RDD functions. Basic queries may not fully leverage the power and capabilities of Spark RDDs and may not showcase advanced techniques. Basic queries will be awarded the minimum mark for this section.
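As a reference point, the sketch below shows what basic RDD queries in this sense might look like. It is a plain-Python prototype run on hypothetical pre-parsed tuples (ip, url, status, size). The comments give the PySpark calls it is meant to mirror (filter, map, count, reduce).

```python
from functools import reduce
from operator import add

# Hypothetical pre-parsed records: (ip, url, status, size).
records = [
    ("10.0.0.1", "/index.html", 200, 2326),
    ("10.0.0.2", "/missing", 404, 0),
    ("10.0.0.1", "/login", 200, 512),
    ("10.0.0.3", "/old-page", 404, 0),
]

# Basic query 1: how many requests returned 404?
# Intended Spark equivalent: rdd.filter(lambda r: r[2] == 404).count()
not_found = len([r for r in records if r[2] == 404])

# Basic query 2: total bytes served by successful (200) requests.
# Intended Spark equivalent:
#   rdd.filter(lambda r: r[2] == 200).map(lambda r: r[3]).reduce(add)
bytes_ok = reduce(add, (r[3] for r in records if r[2] == 200))

print(not_found, bytes_ok)
```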

Advanced Spark RDD Queries: Advanced queries showcase a deeper understanding of Spark RDD transformations and actions. These queries involve more complex operations and techniques, leveraging the full capabilities of Spark RDDs. Advanced queries may include operations like joins, aggregations, sorting, and complex data manipulations using RDD functions. Advanced queries demonstrate creativity and the ability to extract meaningful insights from the data using Spark RDDs. These queries will be awarded higher marks based on the complexity, efficiency, and effectiveness of the analysis.
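A typical advanced pattern combines pair-RDD aggregation, a join and sorting. The sketch below computes an error rate per client IP. It counts all requests per IP, counts 4xx/5xx responses per IP, joins the two results and sorts by rate.

It is a local prototype only: reduce_by_key and inner_join are hypothetical helpers that imitate reduceByKey and a pair-RDD join on Python dicts, and the records are made up. The dict-based join is only equivalent to Spark's join here because keys are unique after the reduce step.

```python
from operator import add
from typing import Callable, Dict, Iterable, Tuple, TypeVar

K = TypeVar("K")
V = TypeVar("V")
W = TypeVar("W")


def reduce_by_key(pairs: Iterable[Tuple[K, V]], func: Callable[[V, V], V]) -> Dict[K, V]:
    """Local stand-in for RDD.reduceByKey: merge values that share a key."""
    merged: Dict[K, V] = {}
    for key, value in pairs:
        merged[key] = func(merged[key], value) if key in merged else value
    return merged


def inner_join(left: Dict[K, V], right: Dict[K, W]) -> Dict[K, Tuple[V, W]]:
    """Local stand-in for a pair-RDD inner join when keys are unique."""
    return {k: (left[k], right[k]) for k in left if k in right}


# Hypothetical pre-parsed (ip, status) pairs.
records = [
    ("10.0.0.1", 200), ("10.0.0.1", 404), ("10.0.0.1", 200), ("10.0.0.1", 500),
    ("10.0.0.2", 200), ("10.0.0.2", 200),
    ("10.0.0.3", 404), ("10.0.0.3", 404),
]

# Spark: rdd.map(lambda r: (r[0], 1)).reduceByKey(add)
total_per_ip = reduce_by_key(((ip, 1) for ip, _ in records), add)

# Spark: rdd.filter(lambda r: r[1] >= 400).map(lambda r: (r[0], 1)).reduceByKey(add)
errors_per_ip = reduce_by_key(((ip, 1) for ip, status in records if status >= 400), add)

# Spark: total.join(errors).mapValues(lambda t: t[1] / t[0]).sortBy(lambda kv: -kv[1])
error_rate = sorted(
    ((ip, errors / total) for ip, (total, errors) in inner_join(total_per_ip, errors_per_ip).items()),
    key=lambda kv: (-kv[1], kv[0]),  # highest rate first, IP as tie-breaker
)
print(error_rate)
```

An inner join drops IPs with no errors (10.0.0.2 here). If those clients should appear with a rate of 0, leftOuterJoin is the pair-RDD operation to look at.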

(3) LSEP considerations
For all analyses performed, critically analyze the legal, social, ethical, and professional implications associated with the data and the analysis. Consider factors such as data privacy, data protection, bias, fairness, transparency, and the potential impact of the analysis on individuals or society as a whole.

Each student should choose one of these factors and contribute a discussion of it.
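For the data privacy and data protection factors, one concrete mitigation is to pseudonymise client IP addresses before analysis. The sketch below replaces each IP with a keyed hash (HMAC-SHA256). The same IP always maps to the same token, so per-visitor counts still work, but the token cannot be linked back to the IP without the key. A plain, unkeyed hash would be weak here because the IPv4 address space is small enough to enumerate.

The key and the function name pseudonymise_ip are hypothetical. Note that pseudonymised data generally still counts as personal data under the UK GDPR, so this reduces risk rather than removing obligations.

```python
import hashlib
import hmac

# Hypothetical secret: in practice load it from a secrets store, never hard-code it.
SECRET_KEY = b"replace-with-a-secret-key"


def pseudonymise_ip(ip: str, key: bytes = SECRET_KEY) -> str:
    """Replace an IP address with a truncated HMAC-SHA256 token."""
    digest = hmac.new(key, ip.encode("utf-8"), hashlib.sha256).hexdigest()
    # Truncated to 16 hex characters (64 bits) for readability;
    # keep the full digest if collisions matter for your data volume.
    return digest[:16]


print(pseudonymise_ip("192.168.1.10"))
```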

Attachment: Big Data Analytics.rar

Module: CN7031 Big Data Analytics, University of East London