Constructing and Evading network traffic based model of IDS

Reference no: EM132160358

Project - ML for Security

Constructing & Evading network traffic based model of IDS

Introduction: The goal of this project is to introduce students to machine learning techniques and methodologies that help differentiate between malicious and legitimate network traffic. In summary, students are introduced to:

  • Using a machine learning based approach to create a model that learns normal network traffic.
  • Learning how to blend attack traffic so that it resembles normal network traffic and bypasses the learned model.

Task A -

Preliminary reading: Please refer to the above readings to learn how the PAYL model works: a) how to extract byte frequency from the data, b) how to train the model, and c) the definitions of the two parameters, threshold and smoothing factor.

Code and data provided: Please look at the PAYL directory, where we provide the PAYL code and data to train the model.

Install packages needed: Please read the file SETUP.txt under PAYL directory to install packages that are needed for the code to run.

PAYL Code workflow: Here is the workflow of the provided PAYL code:

- It operates in two modes: a) training mode: it reads the pcap files provided in the 'data' directory, tests a range of parameters, and reports the True Positive rates; and b) testing mode: it trains a model on the data in that directory using specific parameters, then tests a specific packet and decides whether the packet fits the model.

- Training mode: It reads in the normal data and splits it into training and testing sets: 75% of the provided normal data is used for training and 25% for testing. (NOTE: You will NOT change these proportions in the code.) It sorts the payload strings by length and generates a model for each length. Each per-length model consists of [mean frequency of each ASCII character, standard deviation of the frequencies of each ASCII character].

  • To run PAYL in training mode: $ python wrapper.py
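
The per-length model described above can be sketched as follows. This is only an illustration under stated assumptions, not the provided PAYL code: the function names `byte_frequency` and `train_per_length` are hypothetical, all 256 byte values are counted rather than ASCII only, and the population standard deviation is used.

```python
from collections import defaultdict
import math


def byte_frequency(payload: bytes) -> list[float]:
    """Relative frequency of each of the 256 byte values in one payload."""
    counts = [0] * 256
    for b in payload:
        counts[b] += 1
    total = len(payload) or 1  # avoid division by zero on an empty payload
    return [c / total for c in counts]


def train_per_length(payloads: list[bytes]) -> dict[int, tuple[list[float], list[float]]]:
    """Group payloads by length and return {length: (mean, stddev)} per byte value.

    Hypothetical helper: the real PAYL code may organize its model differently.
    """
    groups = defaultdict(list)
    for p in payloads:
        groups[len(p)].append(byte_frequency(p))

    model = {}
    for length, freqs in groups.items():
        n = len(freqs)
        mean = [sum(f[i] for f in freqs) / n for i in range(256)]
        # Population standard deviation of each byte's frequency (an assumption).
        std = [
            math.sqrt(sum((f[i] - mean[i]) ** 2 for f in freqs) / n)
            for i in range(256)
        ]
        model[length] = (mean, std)
    return model
```

For example, `train_per_length([b"aab", b"abb", b"zz"])` produces one model for length 3 and one for length 2.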

- Testing mode: It reads in normal data from the directory, trains a model using specific parameters, and tests the specific packet (supplied on the command line) against the trained model.

  • It computes the Mahalanobis distance between each test payload and the model (of the same length).
  • It labels the payload: if the Mahalanobis distance is below the threshold, it accepts the payload as normal traffic. Otherwise, it rejects the packet as attack traffic.
  • To run PAYL in testing mode: $ python wrapper.py [FILE.pcap], where FILE.pcap is the data you will test.
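
The distance test above can be illustrated with the simplified Mahalanobis distance from the PAYL paper, which sums |x_i - mean_i| / (std_i + smoothing factor) over all byte values. The sketch below assumes that form; the names `simplified_mahalanobis` and `fits_model` are hypothetical and the provided code may compute the distance differently.

```python
def simplified_mahalanobis(freq, mean, std, smoothing):
    """Sum of per-byte deviations, each scaled by (stddev + smoothing factor).

    The smoothing factor keeps the denominator non-zero for bytes whose
    frequency never varied in the training data.
    """
    return sum(abs(x - m) / (s + smoothing) for x, m, s in zip(freq, mean, std))


def fits_model(freq, mean, std, smoothing, threshold):
    """Accept the payload as normal when its distance is below the threshold."""
    return simplified_mahalanobis(freq, mean, std, smoothing) < threshold
```

A larger smoothing factor shrinks every term, so the same payload produces a smaller distance; this is why the two parameters have to be tuned together.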

Tasks: Perform experiments to select proper parameters.

  • You are provided a single traffic trace (artificial-payload) to train a PAYL model.
  • After reading the reference papers above, it should make sense that you cannot train the PAYL model on the entire traffic, because it contains several protocols. Select one protocol to train PAYL on: a) HTTP or b) DNS. You select it by changing the hard-coded option in the wrapper.py file.
  • Use the artificial traffic corresponding to the protocol you have chosen and proceed to train PAYL. Use the provided code in training mode, and make sure that the normal traffic (artificial payload) is what is fed to your code while training. Provide a range for each of the two parameters (threshold and smoothing factor). For each pair of parameters you will observe a True Positive rate. Select a pair that gives a True Positive rate of 96% or more; a rate above 99% is possible. You may find multiple pairs of parameters that achieve this.
  • You will find mSF and mTMD values which make mTP>96% for both HTTP and DNS protocols.
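
The parameter search in the tasks above amounts to a grid sweep over the two parameters. The following is a minimal sketch, assuming the simplified Mahalanobis distance and assuming that the True Positive rate means the percentage of held-out normal payloads that the model accepts. The names `sweep` and `qualifying` are hypothetical and are not part of wrapper.py.

```python
from itertools import product


def distance(freq, mean, std, smoothing):
    """Simplified Mahalanobis distance (assumed form, see the PAYL paper)."""
    return sum(abs(x - m) / (s + smoothing) for x, m, s in zip(freq, mean, std))


def sweep(test_freqs, mean, std, smoothing_values, thresholds):
    """Return (smoothing, threshold, true-positive %) for every parameter pair."""
    results = []
    for sf, thr in product(smoothing_values, thresholds):
        accepted = sum(distance(f, mean, std, sf) < thr for f in test_freqs)
        results.append((sf, thr, 100.0 * accepted / len(test_freqs)))
    return results


def qualifying(results, min_tp=96.0):
    """Keep only the parameter pairs whose rate reaches the target."""
    return [(sf, thr) for sf, thr, tp in results if tp >= min_tp]
```

A very large threshold accepts every normal payload but also accepts attacks, so a pair that just clears the target rate is generally a better candidate than the loosest one.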

Task B -

Download your unique attack payload: To download your unique attack payload, replace "YOUR_GTID" with your GTID (e.g., gcetin3). NOTE: Do NOT forget to put ".pcap" after YOUR_GTID.

Use PAYL in testing mode. You will first test your unique attack payload for both HTTP and DNS protocols. (NOTE: Do NOT forget to change the Smoothing Factor and the Threshold for Mahalanobis Distance when you change the protocol.)

Verify that your attack traces get rejected for both protocols. By rejected, we mean that you will get the "It doesn't fit the model" message on your test screen, as presented in the following figure.

Finally, try the artificial payloads. We provide two artificial payloads: one for HTTP (http_artificial_profile.pcap) and one for DNS (dns_artificial_profile.pcap). Both are in the PAYL folder. Test each artificial payload against your model. That is, use testing mode as explained above, giving each artificial payload as the parameter. (NOTE: Do NOT forget to change the parameters according to each protocol while testing the relevant payload, e.g., use the DNS parameters to test dns_artificial_profile.pcap.) These packets should be accepted by the respective model. That is, you should get an output message that says "It fits the model", as presented in the following figure.

Task C -

Preliminary reading. Please refer to the "Polymorphic Blending Attacks" paper, in particular section 4.2, which describes how to evade 1-gram and the model implementation. More specifically, we focus on the case where m <= n and the substitution is ONE-TO-MANY.
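
As a rough illustration of a one-to-many substitution for the m <= n case, the sketch below builds a table with a greedy heuristic in the spirit of section 4.2: the m most frequent normal bytes are paired one-to-one with the attack bytes in frequency order, and each remaining normal byte goes to the attack byte whose frequency is least covered so far. The function name `build_substitution_table`, the tie-breaking by byte value, and the table layout are assumptions, not the format expected in substitution_table.txt.

```python
from collections import Counter


def build_substitution_table(attack: bytes, normal: bytes) -> dict[int, list[tuple[int, float]]]:
    """Map each attack byte to a list of (normal byte, normal frequency) pairs."""
    a_freq = {b: c / len(attack) for b, c in Counter(attack).items()}
    n_freq = {b: c / len(normal) for b, c in Counter(normal).items()}
    if len(a_freq) > len(n_freq):
        raise ValueError("one-to-many substitution requires m <= n")

    # Sort by descending frequency; ties broken by byte value for determinism.
    a_sorted = sorted(a_freq, key=lambda b: (-a_freq[b], b))
    n_sorted = sorted(n_freq, key=lambda b: (-n_freq[b], b))

    # Pair the m most frequent normal bytes one-to-one with the attack bytes.
    table = {a: [(n, n_freq[n])] for a, n in zip(a_sorted, n_sorted)}

    # Give each leftover normal byte to the attack byte with the highest ratio
    # of its own frequency to the normal frequency already mapped to it.
    for n in n_sorted[len(a_sorted):]:
        target = max(
            a_sorted,
            key=lambda a: a_freq[a] / sum(f for _, f in table[a]),
        )
        table[target].append((n, n_freq[n]))
    return table
```

Because every normal byte appears under exactly one attack byte, the mapping can be inverted on the receiving side to recover the original attack payload.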

We assume that the attacker has a specific payload (attack payload) that she would like to blend in with the normal traffic. Also, we assume that the attacker has access to one packet (artificial profile payload) that is normal and is accepted as normal by the PAYL model.

The attacker's goal is to transform the byte frequency of the attack traffic so that it matches the byte frequency of the normal traffic, and thus bypass the PAYL model.
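
One way to picture this transformation is in two steps: substitute each attack byte with one of the normal bytes assigned to it, then pad the result with whichever bytes are still under-represented. The sketch below is an assumption-laden illustration, not the expected substitution.py or padding.py: the table layout ({attack byte: [(normal byte, weight), ...]}), the weighted random choice, and padding up to the length of the normal payload are all choices made here for demonstration.

```python
import random
from collections import Counter


def substitute(attack: bytes, table: dict[int, list[tuple[int, float]]], rng: random.Random) -> bytes:
    """Replace each attack byte with one of its candidates, weighted by frequency."""
    out = bytearray()
    for b in attack:
        candidates, weights = zip(*table[b])
        out.append(rng.choices(candidates, weights=weights, k=1)[0])
    return bytes(out)


def pad(blended: bytes, normal: bytes) -> bytes:
    """Append under-represented bytes until the payload is as long as `normal`."""
    room = len(normal) - len(blended)
    if room <= 0:
        return blended  # nothing to add; this sketch never truncates
    have, want = Counter(blended), Counter(normal)
    deficits = {b: want[b] - have[b] for b in want if want[b] > have[b]}
    padding = bytearray()
    # Fill the largest deficits first; ties broken by byte value.
    for b in sorted(deficits, key=lambda b: (-deficits[b], b)):
        take = min(deficits[b], room - len(padding))
        padding.extend([b] * take)
    return blended + bytes(padding)
```

A seeded generator such as `random.Random(0)` makes the substitution repeatable, which helps when comparing the distance of the blended payload across runs.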

Deliverables -

Task A: Please report each protocol that you used and the parameters that you found for it in a file named parameters.txt. Report each parameter as a decimal with 2-digit accuracy.

Task B: After completing Task B, please report in parameters.txt the distance you calculated for the attack payload (mDISTANCE in the above figures) for each protocol.

Task C: Code: 40 points. Please submit your code files substitution.py and padding.py, and your substitution_table.txt.

Please submit your output of Task C generated as a new file after running task1.py.

Note - All required figures are in the attached file.

Attachment:- Assignment Files.rar


Reviews

len2160358

11/5/2018 3:16:41 AM

NOTE: To work on this project, we recommend that you use a Linux OS. However, in the past, students have had no difficulty working on this project on Windows or Macintosh. NOTE: To avoid multiple iterations during testing mode, you can set the lower and upper bounds of both parameters in wrapper.py to the values you found in training mode.

len2160358

11/5/2018 3:16:35 AM

Deliverables & Rubric - Task A: 35 points. Please report each protocol that you used and the parameters that you found for it in a file named parameters.txt. Report each parameter as a decimal with 2-digit accuracy. Task B: 5 points. After completing Task B, please report in parameters.txt the distance you calculated for the attack payload (mDISTANCE in the above figures) for each protocol. NOTE: You are given a sample parameters.txt with dummy values under the PAYL directory. Please update each value with your own answer.

len2160358

11/5/2018 3:16:29 AM

Task C: 60 points - Code: 40 points. Please submit your code files substitution.py (20 points) and padding.py (10 points), and your substitution_table.txt (10 points). Output: 20 points. Please submit your output of Task C, generated as a new file after running task1.py. NOTE!!!: Every file with a wrong name and/or extension will be penalized 5 points. NOTE!!!: Do NOT zip your deliverable files. You will also lose 5 points for zipped files. Please don’t procrastinate in completing this project. Good luck with your finals!
