UEL-CN-7031 Big Data Analytics Assignment

Reference no: EM133063106

UEL-CN-7031 Big Data Analytics - University of East London

This coursework (CRWK) must be completed individually. It is divided into two parts: (1) Big Data analytics on a real case study and (2) a presentation.

The overall CRWK mark comes from two main activities:
1- Big Data Analytics report (around 5,000 words, with a tolerance of ± 10%): 60%
2- Presentation: 40%

Task:

(1) Understanding Dataset: UNSW-NB15
The raw network packets of the UNSW-NB15 dataset were created by the IXIA PerfectStorm tool in the Cyber Range Lab of the Australian Centre for Cyber Security (ACCS) to generate a hybrid of real modern normal activities and synthetic contemporary attack behaviours.
The tcpdump tool was used to capture 100 GB of raw traffic (i.e., pcap files). The dataset contains nine types of attacks, namely Fuzzers, Analysis, Backdoors, DoS, Exploits, Generic, Reconnaissance, Shellcode and Worms. The Argus and Bro-IDS tools were used, and twelve algorithms were developed, to generate a total of 49 features with the class label.
a) The features are described here.
b) The number of attacks and their sub-categories is described here.
c) In this coursework, we use the full set of 10 million records stored in the CSV file (download). Its total size is about 600 MB, which is big enough to justify big data methodologies for analytics. As a big data specialist, you should first read and understand its features, then apply modeling techniques. To see a few records of this dataset, you can import it into Hadoop HDFS and then run a Hive query that prints the first 5-10 records.
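Before (or alongside) loading the file into HDFS, a quick local preview serves the same purpose as the suggested Hive query (for example `SELECT * FROM unsw LIMIT 10;` on a hypothetical table named `unsw`). The sketch below is a minimal Python illustration only, assuming the CSV has no header row; the sample rows are made up and merely mimic the shape of network-flow records.

```python
import csv
import io
import itertools
from typing import Iterable, List


def first_records(lines: Iterable[str], n: int = 5) -> List[List[str]]:
    """Parse and return only the first n CSV records (a local 'LIMIT n' preview)."""
    reader = csv.reader(lines)
    # islice stops after n rows, so a large file is never read in full
    return list(itertools.islice(reader, n))


# Made-up rows standing in for the real file (no header row assumed);
# columns here: source IP, source port, destination IP, destination port, protocol, category, label
sample = io.StringIO(
    "59.166.0.0,1390,149.171.126.6,53,udp,Normal,0\n"
    "175.45.176.3,21223,149.171.126.18,32780,udp,Exploits,1\n"
    "175.45.176.2,23357,149.171.126.16,80,tcp,DoS,1\n"
)
preview = first_records(sample, n=2)
for row in preview:
    print(row)
```

With the real file, the same function can be called on an open file handle, e.g. `first_records(open(path, newline=""), 10)`, where `path` is wherever the CSV was downloaded.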

(2) Big Data Query & Analysis by Apache Hive
This task uses Apache Hive to convert big raw data into useful information for end users. To do so, first understand the dataset carefully. Then write at least 4 Hive queries (refer to the marking scheme). Use appropriate visualization tools to present your findings numerically and graphically, and briefly interpret them.
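As an example of the kind of aggregate query this task asks for, the sketch below counts attack records per category. It runs the query against an in-memory SQLite table purely so the example is self-contained; the SQL uses only constructs that HiveQL also supports, so it should carry over to a Hive table with little or no change. The table name `unsw` and the columns `attack_cat` and `label` are assumptions about how the CSV would be loaded, and the rows are made up.

```python
import sqlite3

# HiveQL-compatible aggregate: number of attack records per category.
# Table and column names (unsw, attack_cat, label) are assumptions.
QUERY = """
SELECT attack_cat, COUNT(*) AS n_records
FROM unsw
WHERE label = 1
GROUP BY attack_cat
ORDER BY n_records DESC, attack_cat
"""


def attacks_per_category(rows):
    """Run QUERY over (attack_cat, label) rows loaded into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE unsw (attack_cat TEXT, label INTEGER)")
        conn.executemany("INSERT INTO unsw VALUES (?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


# Made-up rows for illustration only
sample = [("Exploits", 1), ("DoS", 1), ("Exploits", 1), ("Normal", 0), ("Generic", 1)]
print(attacks_per_category(sample))
```

The resulting (category, count) pairs are a natural input for a bar chart, which covers the "numerically and graphically" part of the task.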

Finally, include screenshots of your outcomes (e.g., tables and plots), together with the scripts/queries, in the report.

(3) Advanced Analytics using PySpark
In this section, you will conduct advanced analytics using PySpark.

Analyze and Interpret Big Data
We need to learn and understand the data through at least 4 analytical methods (descriptive statistics, correlation, hypothesis testing, density estimation, etc.). Present your work numerically and graphically, and apply tooltip text, legends, titles, X-Y labels, etc. as appropriate to help end users gain insights.
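Two of the suggested methods, descriptive statistics and correlation, can be sketched in plain Python to show exactly what is being computed. The column names `sbytes` and `dbytes` and their values are illustrative assumptions. On the full dataset the same quantities would normally come from PySpark, for example `DataFrame.describe()` and `DataFrame.stat.corr(col1, col2)`, rather than from Python lists.

```python
import math
import statistics
from typing import Dict, Sequence


def describe(xs: Sequence[float]) -> Dict[str, float]:
    """Descriptive statistics for one numeric feature (needs at least two values)."""
    return {
        "count": len(xs),
        "mean": statistics.fmean(xs),
        "stdev": statistics.stdev(xs),  # sample standard deviation (n - 1 denominator)
        "min": min(xs),
        "median": statistics.median(xs),
        "max": max(xs),
    }


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length numeric features."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant feature")
    return cov / (sx * sy)


# Made-up values for two hypothetical numeric columns
sbytes = [132, 528, 146, 1684, 4238]
dbytes = [164, 304, 178, 10168, 60788]
print(describe(sbytes))
print(round(pearson(sbytes, dbytes), 3))
```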

Design and Build a Classifier

a) Design and build a binary classifier over the dataset. Explain your algorithm and its configuration. Present your findings in both numerical and graphical form. Evaluate the model's performance and verify its accuracy and effectiveness.
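Whatever algorithm is chosen (for instance `LogisticRegression` from `pyspark.ml.classification`), its evaluation comes down to the confusion matrix. The minimal sketch below derives accuracy, precision, recall and F1 from true and predicted labels, assuming 1 marks an attack and 0 marks normal traffic; the label vectors are made up. In PySpark, `BinaryClassificationEvaluator` (area under ROC) and `MulticlassClassificationEvaluator` can report comparable metrics on a predictions DataFrame.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Confusion-matrix counts and derived metrics, assuming 1 = attack, 0 = normal."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # attacks caught
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # normal traffic passed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed attacks
    total = tp + tn + fp + fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "tp": tp, "tn": tn, "fp": fp, "fn": fn,
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up true and predicted labels for eight records
y_true = [1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
print(binary_metrics(y_true, y_pred))
```

Reporting precision and recall alongside accuracy matters for intrusion detection, because accuracy alone can look high when one class dominates the data.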

b) Apply a multi-class classifier to classify data into ten classes (categories): one normal and nine attacks (i.e., Fuzzers, Analysis, Backdoors, DoS, Exploits, Generic, Reconnaissance, Shellcode and Worms). Briefly explain your model, with supporting statements on its parameters, accuracy and effectiveness.
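For the ten-class case, a confusion matrix over the categories listed above, plus a macro-averaged F1 that weights every category equally, gives a fuller picture than overall accuracy, since small categories otherwise barely affect the score. The sketch below is a plain-Python illustration; the category order is an arbitrary choice and the labels are made up. In PySpark, string categories would typically be converted to indices with `StringIndexer` before training a multi-class model such as `RandomForestClassifier`.

```python
from typing import List, Sequence

# The ten categories named in the brief; this index order is an arbitrary choice
CLASSES = ["Normal", "Fuzzers", "Analysis", "Backdoors", "DoS", "Exploits",
           "Generic", "Reconnaissance", "Shellcode", "Worms"]
INDEX = {name: i for i, name in enumerate(CLASSES)}


def confusion_matrix(y_true: Sequence[str], y_pred: Sequence[str]) -> List[List[int]]:
    """Count records per (true, predicted) category: rows = true, columns = predicted."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    k = len(CLASSES)
    matrix = [[0] * k for _ in range(k)]
    for t, p in zip(y_true, y_pred):
        matrix[INDEX[t]][INDEX[p]] += 1  # raises KeyError for an unknown category name
    return matrix


def accuracy(matrix: List[List[int]]) -> float:
    """Share of records on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


def macro_f1(matrix: List[List[int]]) -> float:
    """Unweighted mean of per-class F1, skipping classes absent from both truth and predictions."""
    k = len(matrix)
    scores = []
    for c in range(k):
        tp = matrix[c][c]
        fp = sum(matrix[r][c] for r in range(k)) - tp  # predicted c, truly another class
        fn = sum(matrix[c]) - tp                        # truly c, predicted another class
        if tp + fp + fn == 0:
            continue
        scores.append(2 * tp / (2 * tp + fp + fn))  # F1 = 2TP / (2TP + FP + FN)
    return sum(scores) / len(scores) if scores else 0.0


# Made-up labels for illustration only
y_true = ["Normal", "DoS", "Exploits", "Exploits", "Worms", "Normal"]
y_pred = ["Normal", "Exploits", "Exploits", "Exploits", "Worms", "Generic"]
cm = confusion_matrix(y_true, y_pred)
print(accuracy(cm), macro_f1(cm))
```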

(4) Individual Assessment
Discuss (1) what alternative technologies are available for tasks 2 and 3 and how they differ (use academic references), and (2) what surprisingly new thinking was evoked and/or neglected on your part.
Tip: add the individual assessment of each member in the same report.

(5) Documentation
Document all your work. Your final report must follow the 5 sections detailed in the "format of final submission" section (refer to the next page). Your work must demonstrate an appropriate understanding of academic writing and integrity.

Attachment: Big Data Analytics.rar

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd