CN7031 Big Data Analytics Assignment

Reference no: EM132718615

CN7031 Big Data Analytics - University of East London

Big Data Analytics: Coursework

(1) Understanding Dataset: CSE-CIC-IDS2018¹
This dataset was originally created by the University of New Brunswick for analyzing DDoS data. You can find the full dataset and its description here. The dataset is based on logs of the university's servers, which recorded various DoS attacks during the publicly available period; it has 80 attributes in total and is 6.40GB in size. Because the lab PCs are restricted to 4GB of RAM, we will process only about 2.6GB of the data. Download it from here. When writing machine learning or statistical analysis code for this data, note that the Label column is arguably the most important part of the data, as it determines whether the packets sent are malicious or not.
a) The features are described in the "IDS2018_Features.xlsx" file on the Moodle page.
b) The labels are as follows:

• "Benign": normal traffic
• Any other value in the Label column: malicious traffic (a DoS attack)
c) In this coursework, we use more than 8.2 million records with a total size of 2.6GB. As a big data specialist, you should first read and understand the features, then apply modeling techniques. To see a few records of this dataset, you can use [1] Hadoop HDFS and Hive, [2] Spark SQL, or [3] RDDs to print a few records for your understanding.
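
As an illustration of option [2], the following is a minimal sketch of previewing a few records with Spark SQL. It assumes `spark` is an already created SparkSession and that `csv_path` points to the downloaded CSV file; the view name "flows" and both function names are hypothetical and chosen only for this example.

```python
# Minimal sketch: preview a few records of the dataset with Spark SQL.
# Assumptions: `spark` is an existing SparkSession; `csv_path` is wherever
# the CSV was saved; the view name "flows" is illustrative only.


def preview_query(view_name: str, n: int = 5) -> str:
    """Build a query that returns the first n rows of a registered view."""
    if n <= 0:
        raise ValueError("n must be positive")
    return f"SELECT * FROM {view_name} LIMIT {n}"


def preview_records(spark, csv_path: str, n: int = 5):
    """Load the CSV, register it as a temp view, and return n rows."""
    # header=True takes column names from the first line of the file;
    # inferSchema=True costs one extra pass over the data to guess types.
    df = spark.read.csv(csv_path, header=True, inferSchema=True)
    df.createOrReplaceTempView("flows")
    return spark.sql(preview_query("flows", n))
```

Calling `preview_records(spark, csv_path).show(truncate=False)` would then print the rows. On a PC with 4GB of RAM, skipping `inferSchema` and casting only the columns you need may be the cheaper choice.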

1 Source:

(2) Big Data Query & Analysis using Spark SQL
This task uses Spark SQL to convert large raw data into useful information. Each member of a group should implement 2 complex SQL queries (refer to the marking scheme). Apply appropriate visualization tools to present your findings numerically and graphically. Briefly interpret your findings.
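
To give a sense of what such a query might look like, here is a hedged sketch that combines a common table expression, an aggregation, and a window function to report how many records each Label value has and its share of the total. It assumes the data is already registered as a temp view (the name "flows" is hypothetical) and that `spark` is an existing SparkSession; only the Label column name comes from the dataset description.

```python
# Minimal sketch: record count and percentage share per Label value.
# Assumptions: the view name "flows" is illustrative; `spark` is an
# existing SparkSession with that view registered.


def label_share_query(view_name: str = "flows") -> str:
    """Build a query giving the record count and share of each label."""
    return f"""
        WITH per_label AS (
            SELECT Label, COUNT(*) AS n_records
            FROM {view_name}
            GROUP BY Label
        )
        SELECT Label,
               n_records,
               ROUND(100.0 * n_records / SUM(n_records) OVER (), 2) AS pct_of_total
        FROM per_label
        ORDER BY n_records DESC
    """


def label_share(spark, view_name: str = "flows"):
    """Run the query and return the (small) result as a Spark DataFrame."""
    return spark.sql(label_share_query(view_name))
```

Because the result has only one row per label, `label_share(spark).toPandas()` should fit comfortably in memory and can be passed to a plotting library for a bar chart.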

(3) Advanced Analytics using PySpark
In this section, you will conduct advanced analytics using PySpark.

3.1. Analyze and Interpret Big Data using PySpark
Every member of a group should analyze data through 3 analytical methods (e.g., advanced descriptive statistics, correlation, hypothesis testing, density estimation, etc.). You need to present your work numerically and graphically. Apply tooltip text, legend, title, X-Y labels etc. accordingly.
Note: a working solution without system or logical errors is required for a good/full mark.
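
Two of the methods mentioned above, descriptive statistics and correlation, can be sketched as follows. This is an illustrative outline only: it assumes `df` is a Spark DataFrame loaded from the dataset and that the columns you pass in are numeric; the function names are hypothetical.

```python
# Minimal sketch: descriptive statistics and pairwise Pearson correlation.
# Assumptions: `df` is a Spark DataFrame; `columns` names numeric columns
# chosen by you from the features file.
from itertools import combinations


def describe_columns(df, columns):
    """Return count, mean, stddev, min, median, and max for the columns."""
    return df.select(*columns).summary(
        "count", "mean", "stddev", "min", "50%", "max"
    )


def pairwise_correlations(df, columns):
    """Return {(col_a, col_b): Pearson correlation} for each column pair."""
    # Each df.stat.corr call triggers its own Spark job, so keep the
    # column list short on a memory-constrained machine.
    return {(a, b): df.stat.corr(a, b) for a, b in combinations(columns, 2)}
```

The dictionary returned by `pairwise_correlations` is small enough to turn into a heatmap with a title, axis labels, and a legend, which would cover the graphical part of the requirement.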

3.2. Design and Build a Machine Learning (ML) technique
Every member of a group should study and apply one ML technique. You can apply one of the following approaches: Classification, Regression, Clustering, Dimensionality Reduction, Feature Extraction, Frequent Pattern Mining, or Optimization. Explain and evaluate your model and its results using numerical and/or graphical representations.
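
For the classification approach, one possible outline is shown below. It is a sketch under stated assumptions, not a prescribed solution: it assumes `df` is a Spark DataFrame whose numeric columns hold finite values, uses a random forest purely as an example model, and the function names, the 80/20 split, and the choice of the F1 metric are all illustrative.

```python
# Minimal sketch: train and evaluate a classifier on the Label column.
# Assumptions: `df` is a Spark DataFrame with finite numeric features;
# the model, split ratio, and metric are illustrative choices.

NUMERIC_TYPES = {"tinyint", "smallint", "int", "bigint", "float", "double"}


def numeric_feature_columns(dtypes, label_col="Label"):
    """Pick numeric columns (excluding the label) from DataFrame.dtypes."""
    return [
        name for name, dtype in dtypes
        if name != label_col and dtype in NUMERIC_TYPES
    ]


def train_classifier(df, label_col="Label", seed=42):
    """Fit a random forest pipeline and return (model, F1 on held-out data)."""
    # Imported here so the column helper above can be used without Spark.
    from pyspark.ml import Pipeline
    from pyspark.ml.classification import RandomForestClassifier
    from pyspark.ml.evaluation import MulticlassClassificationEvaluator
    from pyspark.ml.feature import StringIndexer, VectorAssembler

    features = numeric_feature_columns(df.dtypes, label_col)
    # Drop rows with null or NaN values in any column the model uses.
    clean = df.na.drop(subset=features + [label_col])
    train, test = clean.randomSplit([0.8, 0.2], seed=seed)

    pipeline = Pipeline(stages=[
        # Map the string labels to numeric indices.
        StringIndexer(inputCol=label_col, outputCol="label_idx",
                      handleInvalid="keep"),
        # Combine the numeric columns into a single feature vector.
        VectorAssembler(inputCols=features, outputCol="features"),
        RandomForestClassifier(labelCol="label_idx", featuresCol="features",
                               numTrees=20, seed=seed),
    ])
    model = pipeline.fit(train)

    evaluator = MulticlassClassificationEvaluator(
        labelCol="label_idx", predictionCol="prediction", metricName="f1"
    )
    return model, evaluator.evaluate(model.transform(test))
```

Evaluating on a held-out split rather than the training data gives a more honest estimate of how the model would behave on unseen traffic; a confusion matrix would be a natural graphical complement to the single F1 figure.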


All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd