Big Data Analytics - Project Presentation

Reference no: EM133144904 , Length: 6000 Words

UEL-CN-7031 Big Data Analytics - University of East London

This coursework (CRWK) must be completed individually. It is divided into two sections: (1) Big Data analytics on a real case study and (2) a presentation.

The overall mark for the CRWK comes from two main activities:

1- Big Data Analytics report (around 5,000 words, with a tolerance of ± 10%) (60%)
2- Presentation

Tasks:

(1) Understanding Dataset: UNSW-NB15
The raw network packets of the UNSW-NB15 dataset were created by the IXIA PerfectStorm tool in the Cyber Range Lab of the Australian Centre for Cyber Security (ACCS) to generate a hybrid of real modern normal activities and synthetic contemporary attack behaviours.
The tcpdump tool was used to capture 100 GB of raw traffic (e.g., pcap files). The dataset has nine types of attacks, namely Fuzzers, Analysis, Backdoors, DoS, Exploits, Generic, Reconnaissance, Shellcode and Worms. The Argus and Bro-IDS tools were used, and twelve algorithms were developed, to generate a total of 49 features with the class label.
a) The features are described here.
b) The number of attacks and their sub-categories is described here.
c) In this coursework, we use a total of 10 million records stored in a CSV file (download). The total size is about 600 MB, which is large enough to justify big data methodologies for analytics. As a big data specialist, you should first read and understand the features, then apply modeling techniques. To see a few records of this dataset, you can import it into Hadoop HDFS and then run a Hive query that prints the first 5-10 records.
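
Before the file goes into HDFS, a quick local look at the first rows can help with understanding the features. The following is a minimal sketch using only Python's standard library. The column names and values in the sample are invented for illustration, and whether the real CSV has a header row is an assumption to check.

```python
import csv
import io
from itertools import islice


def preview_rows(lines, n=5):
    """Return the first n parsed rows from an iterable of CSV lines.

    Only n rows are consumed, so this stays cheap on a very large file.
    """
    return list(islice(csv.reader(lines), n))


# Tiny in-memory stand-in for the real file; these values are invented.
sample = io.StringIO(
    "srcip,sport,proto,attack_cat,label\n"
    "10.0.0.1,1390,udp,,0\n"
    "10.0.0.2,80,tcp,Exploits,1\n"
)
rows = preview_rows(sample, n=2)
print(rows)
```

For the real data, the same function can be given an open file handle, for example `open(path, newline="")`, where `path` is wherever the downloaded CSV was saved.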

(2) Big Data Query & Analysis by Apache Hive
This task uses Apache Hive to convert big raw data into useful information for end users. First, study the dataset carefully. Then write at least 4 Hive queries (refer to the marking scheme). Apply appropriate visualization tools to present your findings numerically and graphically, and briefly interpret your findings.

Finally, include screenshots of your outcomes (e.g., tables and plots), together with the scripts/queries, in the report.
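
As an illustration of the kind of aggregate query this task calls for, here is a sketch that prototypes one locally. It uses SQLite from Python as a stand-in for Hive, which is a substitution and not what the brief requires. The table name `flows`, the columns `proto`, `attack_cat` and `label`, and the rows are all assumptions made up for the example. A plain SELECT with GROUP BY like this one should carry over to HiveQL with little or no change, but that needs checking against your own table definition.

```python
import sqlite3

# In-memory SQLite database as a local stand-in for a Hive table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flows (proto TEXT, attack_cat TEXT, label INTEGER)")

# Invented toy rows, not real UNSW-NB15 records.
conn.executemany(
    "INSERT INTO flows VALUES (?, ?, ?)",
    [
        ("tcp", "Exploits", 1),
        ("tcp", "", 0),
        ("udp", "Generic", 1),
        ("udp", "Generic", 1),
        ("tcp", "", 0),
    ],
)

# Count attack records per category, most frequent first.
QUERY = """
SELECT attack_cat, COUNT(*) AS n
FROM flows
WHERE label = 1
GROUP BY attack_cat
ORDER BY n DESC, attack_cat
"""
counts = conn.execute(QUERY).fetchall()
print(counts)
```

A result table of this shape (category, count) is also easy to turn into a bar chart for the report.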

(3) Advanced Analytics using PySpark
In this section, you will conduct advanced analytics using PySpark.

Analyze and Interpret Big Data
You need to learn and understand the data through at least 4 analytical methods (descriptive statistics, correlation, hypothesis testing, density estimation, etc.). Present your work numerically and graphically. Add tooltip text, a legend, a title, X-Y labels, etc. as appropriate to help end users gain insights.
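
Two of the methods named above, descriptive statistics and correlation, can be sketched in plain Python to show what is being computed. This is only a small-scale illustration with invented values; the feature names `duration` and `src_bytes` are hypothetical. In PySpark, `DataFrame.describe()` and `DataFrame.stat.corr()` are the likely counterparts at scale.

```python
import math
import statistics


def describe(values):
    """Return count, mean, sample standard deviation, min and max."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "stddev": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
    }


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Invented toy values standing in for two numeric features.
duration = [1.0, 2.0, 3.0, 4.0]
src_bytes = [10.0, 20.0, 30.0, 40.0]

summary = describe(duration)
r = pearson(duration, src_bytes)
print(summary, r)
```

A coefficient near 1 or -1 suggests that two features carry largely the same information, which is worth knowing before building a classifier.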

Design and Build a Classifier

a) Design and build a binary classifier over the dataset. Explain your algorithm and its configuration. Present your findings in both numerical and graphical form. Evaluate the performance of the model and verify its accuracy and effectiveness.
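
Evaluating a binary classifier usually goes beyond accuracy alone. The sketch below derives accuracy, precision, recall and F1 from true and predicted labels in plain Python. It assumes that 1 means attack and 0 means normal, and the label lists are invented. It shows the arithmetic only and is not a substitute for the evaluators in PySpark.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for labels 0 (normal) and 1 (attack)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Invented toy labels, not model output from the real dataset.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

In an intrusion detection setting, recall on the attack class is often the figure to watch, since a missed attack tends to cost more than a false alarm.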

b) Apply a multi-class classifier to classify the data into ten classes (categories): one normal class and nine attack classes (Fuzzers, Analysis, Backdoors, DoS, Exploits, Generic, Reconnaissance, Shellcode and Worms). Briefly explain your model, with supporting statements on its parameters, accuracy and effectiveness.
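
With ten classes, overall accuracy can hide poor results on the less frequent categories, so a per-class view is useful. The sketch below builds confusion counts and per-class recall in plain Python. The class names come from the brief, while the label lists are invented and far too small to mean anything.

```python
from collections import Counter

CLASSES = [
    "Normal", "Fuzzers", "Analysis", "Backdoors", "DoS",
    "Exploits", "Generic", "Reconnaissance", "Shellcode", "Worms",
]


def confusion_counts(y_true, y_pred):
    """Map each (true class, predicted class) pair to how often it occurs."""
    return Counter(zip(y_true, y_pred))


def per_class_recall(y_true, y_pred, classes):
    """Recall for every class that appears at least once in y_true."""
    counts = confusion_counts(y_true, y_pred)
    support = Counter(y_true)
    return {c: counts[(c, c)] / support[c] for c in classes if support[c]}


# Invented toy labels, not model output from the real dataset.
y_true = ["Normal", "Normal", "Normal", "DoS", "DoS", "Worms"]
y_pred = ["Normal", "Normal", "DoS", "DoS", "Exploits", "Normal"]

recall = per_class_recall(y_true, y_pred, CLASSES)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
macro_recall = sum(recall.values()) / len(recall)
print(recall, accuracy, macro_recall)
```

Here accuracy is 0.5 while the recall for Worms is 0, so macro-averaged recall gives a less flattering and arguably more honest summary than accuracy.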

(4) Individual Assessment
Discuss (1) what alternative technologies are available for tasks 2 and 3 and how they differ (use academic references), and (2) what surprising new thinking was evoked and/or neglected on your part.
Tip: include the individual assessment of each member in the same report.

(5) Documentation
Document all your work. Your final report must follow the 5 sections detailed in the "format of final submission" section (refer to the next page). Your work must demonstrate an appropriate understanding of academic writing and integrity.

Attachment:- Project Presentation.rar
