CN7022 Big Data Analytics Assignment Problem

Reference no: EM132418656

CN7022 Big Data Analytics Assignment - University of East London, UK

This coursework is divided into two sections: (1) Big Data analytics on a real case study and (2) group presentation.

Big Data Analytics using Hadoop and Spark

Tasks:

(1) Understanding Dataset: UNSW-NB15

The raw network packets of the UNSW-NB15 dataset were created by the IXIA PerfectStorm tool in the Cyber Range Lab of the Australian Centre for Cyber Security (ACCS) to generate a hybrid of real modern normal activities and synthetic contemporary attack behaviours. The tcpdump tool was used to capture 100 GB of raw traffic (e.g., pcap files). The dataset contains nine types of attack: Fuzzers, Analysis, Backdoors, DoS, Exploits, Generic, Reconnaissance, Shellcode and Worms. The Argus and Bro-IDS tools were used, and twelve algorithms were developed, to generate a total of 49 features with the class label.

a) The features are described here.

b) The number of attacks and their sub-categories is described here.

c) In this coursework, we use a total of 700K records stored in a CSV file (download). The total size is about 600MB, which is large enough to justify big data methodologies for analytics. As a big data specialist, you should first read and understand the features, then apply modeling techniques. To see a few records of this dataset, you can import it into Hadoop HDFS and then run a Hive query that prints the first 5-10 records.
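
Before the file is imported into HDFS, the same "first few records" check can be sketched in plain Python against a local copy. This is only an illustrative sketch, not part of the required Hive workflow: the function name `head_records` is hypothetical, and the file is read as raw rows without assuming whether it has a header line.

```python
import csv
from itertools import islice


def head_records(path, n=5):
    """Return the first n rows of a CSV file as lists of strings."""
    # Stream the file rather than loading it whole, since it may be hundreds of MB.
    with open(path, newline="", encoding="utf-8") as handle:
        return list(islice(csv.reader(handle), n))
```

Calling `head_records(path_to_csv, n=10)` with the path of your downloaded file returns up to ten rows, which is usually enough to check the column count and spot the categorical fields.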

(2) Big Data Query & Analysis by Apache Hive

This task uses Apache Hive to convert big raw data into useful information for end users. To do so, first study the dataset carefully. Then write at least 4 Hive queries (refer to the marking scheme). Apply appropriate visualization tools to present your findings numerically and graphically, and briefly interpret your findings.

Finally, include screenshots of your outcomes (e.g., tables and plots), together with the scripts/queries, in the report.
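
To make concrete the kind of result a typical aggregation query returns, the sketch below reproduces the logic of a `SELECT col, COUNT(*) ... GROUP BY col ORDER BY COUNT(*) DESC` query in plain Python. It is not HiveQL and is not a substitute for the required Hive queries; the function name `count_by_column` and the sample rows are made up for illustration and do not come from the dataset.

```python
from collections import Counter


def count_by_column(rows, column_index):
    """Count rows per distinct value in one column (GROUP BY + COUNT)."""
    counts = Counter(row[column_index] for row in rows)
    # most_common() sorts by descending count, like ORDER BY COUNT(*) DESC.
    return counts.most_common()


# Hypothetical sample rows: (protocol, attack category). Not real dataset values.
sample_rows = [
    ("tcp", "Normal"),
    ("udp", "Generic"),
    ("tcp", "Exploits"),
    ("tcp", "Normal"),
    ("udp", "Generic"),
    ("tcp", "Normal"),
]

category_counts = count_by_column(sample_rows, column_index=1)
print(category_counts)
```

A table of this shape (category, count) maps directly onto a bar chart, which is one way to present a query result both numerically and graphically.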

(3) Advanced Analytics using PySpark

In this section, you will conduct advanced analytics using PySpark.

3.1. Analyze and Interpret Big Data

We need to learn and understand the data through at least 4 analytical methods (descriptive statistics, correlation, hypothesis testing, density estimation, etc.). Present your work numerically and graphically. Apply tooltip text, legends, titles, X-Y labels, etc. as appropriate to help end users gain insights.
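
Two of the methods named above, descriptive statistics and correlation, can be sketched in plain Python to show what the numbers mean before computing them at scale in PySpark. This is a minimal sketch under stated assumptions: the helper names `describe` and `pearson` are hypothetical, and the two feature lists are invented values, not taken from UNSW-NB15.

```python
import math
import statistics


def describe(values):
    """Return count, mean, sample standard deviation, min and max of a numeric list."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),  # sample (n - 1) standard deviation
        "min": min(values),
        "max": max(values),
    }


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant feature")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values for two numeric features (e.g. a duration and a byte count).
durations = [1.0, 2.0, 3.0, 4.0, 5.0]
bytes_sent = [10.0, 20.0, 30.0, 40.0, 50.0]

summary = describe(durations)
r = pearson(durations, bytes_sent)
```

A coefficient near +1 or -1 indicates a strong linear relationship between two features; a value near 0 indicates none, although a non-linear relationship may still exist.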

3.2. Design and Build a Classifier

a) Design and build a binary classifier over the dataset. Explain your algorithm and its configuration. Present your findings in both numerical and graphical form. Evaluate the performance of the model and verify its accuracy and effectiveness.

b) Apply a multi-class classifier to classify the data into ten classes (categories): one normal and nine attacks (Fuzzers, Analysis, Backdoors, DoS, Exploits, Generic, Reconnaissance, Shellcode and Worms). Briefly explain your model, with supporting statements on its parameters, accuracy and effectiveness.
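
Whichever algorithm is chosen, both classifiers have to be evaluated on predicted versus true labels. The sketch below shows, in plain Python, how accuracy and per-class F1 can be derived from those two lists; it applies equally to the binary and the ten-class case. It is an illustration only: the function names are hypothetical, the label values are invented, and in practice the same quantities would be obtained from the evaluation tools of whatever library the classifier is built with.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    if not y_true:
        return 0.0
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def per_class_f1(y_true, y_pred):
    """F1 score for every label, using F1 = 2TP / (2TP + FP + FN)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores) if scores else 0.0


# Hypothetical binary labels (assumed encoding: 1 = attack, 0 = normal).
bin_true = [1, 1, 1, 0, 0, 0, 1, 0]
bin_pred = [1, 0, 1, 0, 1, 0, 1, 0]

# Hypothetical multi-class labels using a few of the category names.
multi_true = ["Normal", "Normal", "DoS", "DoS", "Worms", "Exploits"]
multi_pred = ["Normal", "Normal", "DoS", "Exploits", "Normal", "Exploits"]

print(accuracy(bin_true, bin_pred), per_class_f1(bin_true, bin_pred))
print(accuracy(multi_true, multi_pred), macro_f1(multi_true, multi_pred))
```

In the multi-class sample, the macro-averaged F1 is lower than the accuracy because one class is never predicted correctly. That is the reason to report per-class figures alongside overall accuracy when some categories have few records.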

(4) Individual Assessment

Discuss (1) what alternative technologies are available for tasks 2 and 3 and how they differ (use academic references), and (2) what surprising new thinking was evoked and/or neglected on your part.

(5) Documentation

Document all your work. Your final report must follow the 5 sections detailed in the "format of final submission" section (refer to the next page). Your work must demonstrate an appropriate understanding of academic writing and integrity.

FORMAT OF FINAL SUBMISSION - Prepare one single PDF file as your group coursework, with the following sections:

1. Cover Page

2. Table of Contents

3. Report of the tasks (with sub-sections for individual tasks as needed)

4. Teamwork minutes (including minutes of meetings, task allocation, etc.)

5. References (if any).
