Design and build a binary classifier over the dataset

Reference no: EM133106797

UEL-CN-7031 Big Data Analytics - University of East London

Tasks:

(1) Understanding Dataset: UNSW-NB15

The raw network packets of the UNSW-NB15 dataset were created by the IXIA PerfectStorm tool in the Cyber Range Lab of the Australian Centre for Cyber Security (ACCS) to generate a hybrid of real modern normal activities and synthetic contemporary attack behaviours.

The tcpdump tool was used to capture 100 GB of raw traffic (e.g., pcap files). The dataset has nine types of attacks, namely Fuzzers, Analysis, Backdoors, DoS, Exploits, Generic, Reconnaissance, Shellcode and Worms. The Argus and Bro-IDS tools were used, and twelve algorithms were developed, to generate a total of 49 features with the class label.

a) The features are described here.
b) The number of attacks and their sub-categories is described here.
c) In this coursework, we use all 10 million records, which are stored in the CSV file (download). The total size is about 600 MB, which is large enough to justify big data methodologies for analytics. As a big data specialist, first read and understand the features, then apply modeling techniques. To see a few records of this dataset, you can import it into Hadoop HDFS and then run a Hive query that prints the first 5-10 records.
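For a quick look at the file before HDFS and Hive are set up, a plain-Python peek can stand in for the Hive query described above. This is only a minimal sketch: the helper name peek_records and the sample rows are made up for illustration and do not reflect the real column layout.

```python
import csv
import io
from itertools import islice


def peek_records(lines, n=5):
    """Return the first n parsed CSV rows from an iterable of text lines."""
    reader = csv.reader(lines)
    return list(islice(reader, n))


# Tiny made-up sample standing in for the real CSV (values are illustrative only).
sample = io.StringIO(
    "10.0.0.1,1390,10.0.0.2,53,udp,0\n"
    "10.0.0.3,33661,10.0.0.4,1024,udp,0\n"
    "10.0.0.5,1464,10.0.0.6,53,tcp,1\n"
)

rows = peek_records(sample, n=2)
for row in rows:
    print(row)
```

With the real file, the same helper can be given an open file handle, for example open(path, newline=""), where path is wherever the CSV was saved. Because islice stops after n rows, the 600 MB file is never read in full.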

(2) Big Data Query & Analysis by Apache Hive

This task uses Apache Hive to convert big raw data into useful information for end users. To do so, first understand the dataset carefully. Then write at least 4 Hive queries (refer to the marking scheme). Apply appropriate visualization tools to present your findings numerically and graphically. Briefly interpret your findings.

Finally, include screenshots of your outcomes (e.g., tables and plots), together with the scripts/queries, in the report.

(3) Advanced Analytics using PySpark
In this section, you will conduct advanced analytics using PySpark.

Analyze and Interpret Big Data

We need to learn and understand the data through at least 4 analytical methods (descriptive statistics, correlation, hypothesis testing, density estimation, etc.). Present your work numerically and graphically. Apply tooltip text, legends, titles, X-Y labels, etc. as appropriate to help end users gain insights.
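Two of the methods listed above, descriptive statistics and correlation, can be sketched in plain Python to show what the numbers mean before scaling up. The column names and values below are made up for illustration. In PySpark the corresponding built-ins are DataFrame.describe() and DataFrame.stat.corr(col1, col2).

```python
import math
import statistics


def describe(values):
    """Basic descriptive statistics for a list of numbers."""
    return {
        "count": len(values),
        "mean": statistics.fmean(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / (sx * sy)


# Made-up values for two hypothetical numeric columns.
duration = [0.1, 0.4, 0.2, 0.9, 0.5]
src_bytes = [120, 410, 230, 880, 520]

summary = describe(duration)
r = pearson(duration, src_bytes)
print(summary)
print(f"Pearson r = {r:.4f}")
```

A coefficient near +1 or -1 indicates a strong linear relationship between two features. Highly correlated feature pairs are candidates for removal before modeling.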

Design and Build a Classifier

a) Design and build a binary classifier over the dataset. Explain your algorithm and its configuration. Present your findings in both numerical and graphical form. Evaluate the performance of the model and verify its accuracy and effectiveness.

b) Apply a multi-class classifier to classify data into ten classes (categories): one normal and nine attacks (Fuzzers, Analysis, Backdoors, DoS, Exploits, Generic, Reconnaissance, Shellcode and Worms). Briefly explain your model with supporting statements on its parameters, accuracy and effectiveness.
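Both classifiers above need an evaluation step. One common way to do it, shown here as a minimal plain-Python sketch rather than a prescribed method, is to build a confusion matrix and derive accuracy plus per-class precision, recall and F1 from it. The labels and predictions below are made up, and the encoding 0 = normal, 1 = attack is an assumption for the example.

```python
def confusion_matrix(y_true, y_pred, n_classes):
    """Count predictions; rows are true classes, columns are predicted classes."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def accuracy(matrix):
    """Fraction of all samples that lie on the diagonal."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[k][k] for k in range(len(matrix)))
    return correct / total if total else 0.0


def per_class_metrics(matrix):
    """Precision, recall and F1 for each class of a square confusion matrix."""
    n = len(matrix)
    metrics = []
    for k in range(n):
        tp = matrix[k][k]
        fp = sum(matrix[i][k] for i in range(n)) - tp  # predicted k, truly other
        fn = sum(matrix[k]) - tp                       # truly k, predicted other
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics.append({"precision": precision, "recall": recall, "f1": f1})
    return metrics


# Made-up binary example (assumed encoding: 0 = normal, 1 = attack).
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred = [0, 0, 1, 1, 1, 0, 1, 0]

cm = confusion_matrix(y_true, y_pred, n_classes=2)
acc = accuracy(cm)
attack = per_class_metrics(cm)[1]
print(cm, acc, attack)
```

The same functions work for the ten-class task by passing n_classes=10. When class sizes are very uneven, overall accuracy can look high while rare classes are classified poorly, so reporting the per-class values (or their unweighted average, the macro F1) alongside accuracy gives a fuller picture.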

Assessment
Discuss (1) what alternative technologies are available for tasks 2 and 3 and how they differ (use academic references), and (2) what surprising new thinking was evoked and/or neglected at your end.

Attachment:- Big Data Analytics.rar

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd