COMP8325 Applications of Artificial Intelligence for Cyber Security - Macquarie University
Assignment Description
Learning Outcome 1: Explain the basic concepts and the limitations of Artificial Intelligence;
Learning Outcome 2: Detect intrusion in networks and systems by applying tools and techniques revealing abnormal patterns in datasets; and
Learning Outcome 3: Analyse the trends of applications of Artificial Intelligence in cyber security.
TASK 1: Merits of Entropy in Attack Detection/Diagnostics
Consider a server-log dataset hosted at Google Drive. Two attacks happened on the same day, at around 8 am and around noon. Please answer the following questions:
• Identify the exact date and time [1]. What approach did the attackers use? (marks 2)
• There is a significant body of literature [2, 3] discussing how entropy can be used to detect these attacks. To do it effectively, approximation schemes are usually used. You do not have to implement these approximation techniques, but do present an analysis of whether entropy is useful and which combinations of fields you tried, e.g. src ip, dest ip, src-port, dst-port, etc. Do any of them reveal anomalies when the two attacks happen? (marks 2)
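As a rough illustration of the entropy analysis asked for above, the following sketch computes the Shannon entropy of one log field per fixed time window. It is not tied to the actual dataset: the record layout (`ts`, `src_ip`, `dst_port`), the 300-second window, and the toy records are all assumptions made for this example, so the field names must be adapted to the real log schema.

```python
import math
from collections import Counter, defaultdict


def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


def windowed_entropy(records, field, window_seconds=300):
    """Bucket records into fixed time windows; return {window_start: entropy}."""
    buckets = defaultdict(list)
    for rec in records:
        start = int(rec["ts"] // window_seconds) * window_seconds
        buckets[start].append(rec[field])
    return {start: shannon_entropy(vals) for start, vals in sorted(buckets.items())}


# Toy records (hypothetical schema, NOT the assignment data).
# Window 0: eight sources all contacting port 80.
# Window 300: a single source contacting 300 different ports.
records = (
    [{"ts": t, "src_ip": f"10.0.0.{t % 8}", "dst_port": 80} for t in range(300)]
    + [{"ts": 300 + t, "src_ip": "10.0.0.99", "dst_port": 1000 + t} for t in range(300)]
)

src_entropy = windowed_entropy(records, "src_ip")
port_entropy = windowed_entropy(records, "dst_port")

for start in src_entropy:
    print(start, round(src_entropy[start], 3), round(port_entropy[start], 3))
```

In the toy data, the source-address entropy collapses while the destination-port entropy jumps in the second window. Whether any field, or combination of fields, behaves this way around the two attacks in the real logs is what the analysis has to establish.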
TASK 2: Web Tracking
A typical webpage consists of several web-components, e.g., JavaScript codes, Flash-content, images, CSS, etc. When a user opens a website in a web browser, the fetched webpage typically generates several other HTTP(S) connections for downloading additional components of the webpage. These components can be downloaded from the website visited by the user (referred to as the first-party domain) or from other third-party domains. Here, we focus on one type of web-component, namely JavaScript codes, which are loaded from both first- and third-party domains. JavaScript programs are widely used by ad networks, content distribution networks (CDNs), tracking services, analytics platforms, and online social networks (e.g., Facebook uses them to implement plugins).
Figure 1 illustrates a typical scenario of web tracking via JavaScript codes. Upon fetching a webpage from first-party domains (steps 1 & 2), the user's web browser interprets the HTML tags and executes JavaScript programs within the HTML script tags. JavaScript code execution enables the web browser to send requests to retrieve additional content from third-party domains (step 3). Depending on the implemented functionalities, the JavaScript programs can be considered as useful (functional), e.g., fetching content from a CDN, or as tracking. In the latter case, when the webpage is completely rendered (step 4), the JavaScript codes track the user's activities on the webpage, write to or read from the cookie database (steps 5 & 6), or reconstruct user identifiers. Tracking JavaScript programs may also be used to fingerprint the user's browser (as well as system) and to transfer private and sensitive information to third-party domains (step 7).
Now, imagine you are given the task of developing a machine-learning technique based on only one class (i.e., One-Class SVM or Positive Unlabelled (PU) Learning, see ref. [4]) to differentiate tracking JavaScript codes from functional ones. To this end, you are provided with a labelled dataset (see COMP8325's iLearn page) containing labelled functional and tracking JavaScript codes. You may use the code provided on iLearn to do the following tasks.
Use Term Frequency - Inverse Document Frequency (TF-IDF) to extract features from functional and tracking JavaScript codes.
Develop either One-Class SVM or PU Learning, and a baseline SVM for comparison, to classify the JavaScript codes.
Design and conduct experiments to validate and test the efficacy of your developed model:
- To report any over- or under-fitting of the models, you may use 60% of the data for training, 20% for validation, and 20% for testing.
- Report and discuss the parameters of the OCSVM or PU Learning model that give you improved results.
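The steps above could be sketched with scikit-learn roughly as follows. This is a minimal illustration under stated assumptions, not the code provided on iLearn: the toy snippets stand in for the real dataset, and the token pattern, the small `nu`/`gamma` grid, the use of F1 as the selection metric, and the choice of tracking scripts as the single training class are all assumptions made for this example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC, OneClassSVM

# Toy stand-in corpus (hypothetical snippets, NOT the iLearn dataset).
tracking_parts = ["document.cookie", "navigator.userAgent", "screen.width",
                  "new Image().src", "localStorage.setItem", "canvas.toDataURL"]
functional_parts = ["el.classList.add", "el.style.height", "document.getElementById",
                    "addEventListener", "slider.next", "menu.toggle"]


def make_doc(parts, i):
    # Join 4 of the 6 snippets, rotating with i so that documents differ.
    return "; ".join(parts[(i + k) % len(parts)] for k in range(4))


scripts = ([make_doc(tracking_parts, i) for i in range(30)]
           + [make_doc(functional_parts, i) for i in range(30)])
labels = [1] * 30 + [0] * 30  # 1 = tracking, 0 = functional

# 60/20/20 split: hold out 40%, then halve it into validation and test sets.
X_train, X_rest, y_train, y_rest = train_test_split(
    scripts, labels, test_size=0.4, stratify=labels, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, stratify=y_rest, random_state=0)

# TF-IDF features; the identifier-style token pattern is an assumption.
vectorizer = TfidfVectorizer(token_pattern=r"[A-Za-z_]\w*")
F_train = vectorizer.fit_transform(X_train)  # fit on training data only
F_val = vectorizer.transform(X_val)
F_test = vectorizer.transform(X_test)

# One-class setting: the model only ever sees tracking scripts during training.
train_pos = F_train[[i for i, y in enumerate(y_train) if y == 1]]

# Small illustrative grid, selected on the validation set.
best = None
for nu in (0.05, 0.1, 0.2):
    for gamma in ("scale", 0.1, 1.0):
        model = OneClassSVM(kernel="rbf", nu=nu, gamma=gamma).fit(train_pos)
        pred = (model.predict(F_val) == 1).astype(int)  # +1 = inlier = tracking
        score = f1_score(y_val, pred, zero_division=0)
        if best is None or score > best[0]:
            best = (score, nu, gamma, model)

val_f1, best_nu, best_gamma, ocsvm = best
ocsvm_test_f1 = f1_score(
    y_test, (ocsvm.predict(F_test) == 1).astype(int), zero_division=0)

# Baseline: an ordinary two-class SVM trained on both labels.
baseline = SVC(kernel="linear").fit(F_train, y_train)
baseline_test_f1 = f1_score(y_test, baseline.predict(F_test), zero_division=0)

print("OCSVM: nu =", best_nu, "gamma =", best_gamma,
      "val F1 =", round(val_f1, 3), "test F1 =", round(ocsvm_test_f1, 3))
print("Baseline SVM test F1 =", round(baseline_test_f1, 3))
```

Comparing training, validation, and test scores for each parameter setting is one way to spot over- or under-fitting. The test set is used only once, after the parameters have been chosen on the validation set.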