Reference no: EM132868515
Task
This assessment builds on the work you have completed and the skills you have developed in the tutorials during the course. You have a variety of honeypot data sources to consider for your assessment, coming from both a network of Cowrie sensors and a selection of zero-interaction Network Telescopes. You need to consider both datasets and make a choice fairly quickly. Your options are to complete a detailed analysis over a smaller set of data, or to use a larger dataset and perform a more high-level analysis.
Datasets
Two datasets are provided for this task. You need to choose one. Clearly state in your report which one is being used and why you chose it. Details of the data provided are contained in the appendix at the end of this document.
1. Cowrie honeypot datasets
Two months of data are provided (April and June 2018). These are taken from a distributed network of Cowrie sensors run globally across AWS zones. Data files are provided in JSON format.
2. Network Telescope datasets (CHOOSE THIS ONE)
One week of data (17-23 April 2021) is provided across five separate sensors. Combined, these
datasets contain around 35 million recorded events.
Analysis (Describe what commands you used to find these and what you found. Use graphs where appropriate.)
You need to perform appropriate analysis on your chosen data. As a minimum (at best, this can score 50%), this analysis needs to include:
• 7 days' worth of traffic (7 × 24-hour periods)
• Basic descriptive statistics as appropriate
These would include: counts of various data types; average observed values over a period of time;
and anything else that you feel helps describe the data to the reader.
• An analysis of the top 20 sources.
• An analysis of the top 20 destination ports (in the case of the network telescope data).
• An analysis of the variation of IP TTL values (in the case of the network telescope data).
• An analysis of the top 30 passwords and usernames (Cowrie data).
• Some basic enrichment of the data (Geolocation is likely the easiest).
• Use of at least one Threat Intelligence (TI) source for specific enrichment, other than the GreyNoise data provided in the tutorials.
For additional threat feeds for enrichment, you are encouraged to look at the multitude
of sources provided by threatfeeds.io and other resources discussed in lectures.
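As an illustration only, the top-20 source, top-20 destination port and TTL items above could be started with a short Python sketch like the one below. The record layout (dicts with `src_ip`, `dst_port` and `ttl` keys) is a hypothetical assumption; the real field names are described in the appendix. The initial-TTL inference is a common passive-fingerprinting heuristic, not something this brief requires: it assumes the sender used the smallest usual default (32, 64, 128 or 255) that is at least the observed TTL, and treats the difference as an approximate hop count.

```python
from collections import Counter
from statistics import mean, pstdev

# Hypothetical record layout: each event is a dict with "src_ip",
# "dst_port" and "ttl" keys. The real telescope export may differ.


def top_n(events, field, n=20):
    """Return the n most frequent values of `field` with their counts."""
    return Counter(e[field] for e in events).most_common(n)


def estimate_initial_ttl(ttl):
    """Heuristic: assume the sender used the smallest common default
    initial TTL (32, 64, 128 or 255) that is >= the observed value."""
    for initial in (32, 64, 128, 255):
        if ttl <= initial:
            return initial
    raise ValueError(f"TTL out of range: {ttl}")


def ttl_summary(events):
    """Summarise TTL variation: spread, distribution of inferred initial
    TTLs, and the implied mean hop distance from source to sensor."""
    ttls = [int(e["ttl"]) for e in events]  # int() handles CSV strings
    initials = [estimate_initial_ttl(t) for t in ttls]
    hops = [i - t for i, t in zip(initials, ttls)]
    return {
        "min": min(ttls),
        "max": max(ttls),
        "mean": mean(ttls),
        "stdev": pstdev(ttls),
        "initial_ttl_counts": dict(Counter(initials)),
        "mean_hops": mean(hops),
    }


if __name__ == "__main__":
    # Made-up sample events using documentation address ranges.
    sample = [
        {"src_ip": "192.0.2.10", "dst_port": 23, "ttl": 47},
        {"src_ip": "192.0.2.10", "dst_port": 23, "ttl": 47},
        {"src_ip": "198.51.100.7", "dst_port": 445, "ttl": 113},
        {"src_ip": "203.0.113.5", "dst_port": 22, "ttl": 240},
    ]
    print(top_n(sample, "src_ip"))
    print(top_n(sample, "dst_port"))
    print(ttl_summary(sample))
```

If the events are read with `csv.DictReader`, every field arrives as a string; the TTL is converted with `int()` above, but ports would stay as strings unless converted as well.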
Extend the analysis above to include items such as the following:
• A comparison of multiple time periods, possibly across months (Cowrie Data).
• A comparison of the same period across multiple sensors (Telescope Data).
• Time-series analysis (and plot) of traffic - by sources/ports/volume/usernames (as appropriate).
• Additional Enrichment (and analysis) of Source addresses.
• Use of more than two TI sources.
• Explore your data.
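For the sensor comparison and time-series items, one possible starting point is sketched below. It assumes hypothetical `sensor`, `timestamp` (ISO 8601 string) and `src_ip` keys on each record. It builds a per-sensor daily event count, and uses Jaccard similarity (shared sources divided by all sources seen by either sensor) as one simple, assumed way of measuring how much two sensors observe the same scanners.

```python
from collections import Counter, defaultdict
from datetime import datetime
from itertools import combinations

# Hypothetical record layout: "sensor" (sensor name), "timestamp"
# (ISO 8601 string) and "src_ip". Adjust keys to match the real export.
# Note: datetime.fromisoformat() rejects a trailing "Z" before Python 3.11.


def daily_counts_by_sensor(events):
    """Build a time series: {sensor: Counter({date: event_count})}."""
    series = defaultdict(Counter)
    for e in events:
        day = datetime.fromisoformat(e["timestamp"]).date()
        series[e["sensor"]][day] += 1
    return series


def source_overlap(events):
    """Jaccard similarity of the source-IP sets seen by each sensor pair."""
    sources = defaultdict(set)
    for e in events:
        sources[e["sensor"]].add(e["src_ip"])
    overlap = {}
    for a, b in combinations(sorted(sources), 2):
        shared = sources[a] & sources[b]
        union = sources[a] | sources[b]  # never empty: each set has >= 1 IP
        overlap[(a, b)] = len(shared) / len(union)
    return overlap


if __name__ == "__main__":
    # Made-up events for two hypothetical sensors.
    sample = [
        {"sensor": "sensor-a", "timestamp": "2021-04-17T01:00:00", "src_ip": "192.0.2.1"},
        {"sensor": "sensor-a", "timestamp": "2021-04-18T09:30:00", "src_ip": "192.0.2.2"},
        {"sensor": "sensor-b", "timestamp": "2021-04-17T12:00:00", "src_ip": "192.0.2.1"},
    ]
    for sensor, days in daily_counts_by_sensor(sample).items():
        for day in sorted(days):
            print(sensor, day, days[day])
    print(source_overlap(sample))
```

Each per-sensor `Counter` can be plotted as a daily series (for example with matplotlib); a low Jaccard value would suggest that the sensors are largely seeing different sources.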
Setup of the report and questions to answer (List each of sections 1-5 below by name and fill in the answers for each.)
1. Define your dataset
How was it extracted, what data ranges were used, and why was it selected?
Clarification of data source/systems. Statement of any assumptions you have made around the data.
2. Description of dataset
This would include a discussion of how many incidents/events/unique sources were observed, the volume of traffic, overall trends and other content relating to the data.
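The headline numbers for this section could be gathered in a single pass, as in the hedged sketch below. It assumes the same hypothetical `timestamp`, `src_ip` and `dst_port` keys as above. Because it only iterates once, it accepts any iterable (for example a `csv.DictReader` over each file), so the roughly 35 million events need not all be held in memory at once, although the sets of unique sources and ports still are.

```python
from collections import Counter
from datetime import datetime

# Hypothetical record layout: "timestamp" (ISO 8601), "src_ip", "dst_port".


def describe(events):
    """Single-pass summary: totals, unique sources/ports, daily averages."""
    total = 0
    sources, ports = set(), set()
    per_day = Counter()
    for e in events:
        total += 1
        sources.add(e["src_ip"])
        ports.add(e["dst_port"])
        per_day[datetime.fromisoformat(e["timestamp"]).date()] += 1
    if total == 0:
        raise ValueError("no events to describe")
    busiest_day, busiest_count = per_day.most_common(1)[0]
    return {
        "total_events": total,
        "unique_sources": len(sources),
        "unique_dst_ports": len(ports),
        "days_observed": len(per_day),
        "mean_events_per_day": total / len(per_day),
        "busiest_day": busiest_day.isoformat(),
        "busiest_day_events": busiest_count,
    }


if __name__ == "__main__":
    # Made-up sample; a generator is used to mimic streaming from a file.
    sample = (
        {"timestamp": t, "src_ip": ip, "dst_port": port}
        for t, ip, port in [
            ("2021-04-17T00:00:00", "192.0.2.1", 23),
            ("2021-04-17T05:00:00", "192.0.2.2", 23),
            ("2021-04-18T05:00:00", "192.0.2.1", 445),
        ]
    )
    print(describe(sample))
```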
3. An overview of your analysis process
This explains your analysis process and how you distilled and explored your data. One approach could be a description of scripts, data files, command lines, etc. This should provide sufficient detail to allow someone to replicate your analysis. An important element here is the logical process you followed.
Note: There is a separate submission link for uploading code/scripts/notes.
4. Findings
What are the basic results obtained, how do they relate to the data, and possibly to other research? What did you find that was interesting or unexpected? How do you interpret the results you have obtained? This is intended to be more of a discussion than just a description of raw results. In your findings, clearly indicate how the processed data could be used to improve an organization's security.
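One concrete way to link findings to an organization's security is to cross-reference observed sources against a TI feed and flag the matches as candidates for blocking or closer monitoring. The sketch below is illustrative only: it assumes a plain-text feed with one IP address or CIDR block per line, where `#` or `;` starts a comment (many feeds use a layout like this, but check each feed's own format), and the function names are hypothetical. It uses Python's standard `ipaddress` module; the linear scan is fine for small feeds but would need a smarter lookup for very large ones.

```python
import ipaddress

# Assumed feed format: one IP or CIDR per line; "#" or ";" starts a comment.


def load_feed(lines):
    """Parse feed lines into a list of ip_network objects."""
    networks = []
    for line in lines:
        entry = line
        for marker in ("#", ";"):
            entry = entry.split(marker, 1)[0]  # drop trailing comments
        entry = entry.strip()
        if entry:
            # strict=False tolerates host bits set, e.g. "192.0.2.5/24"
            networks.append(ipaddress.ip_network(entry, strict=False))
    return networks


def match_sources(sources, networks):
    """Return {source_ip: [matching feed entries]} for listed sources."""
    hits = {}
    for src in sources:
        addr = ipaddress.ip_address(src)
        # IPv4 addresses never match IPv6 networks (membership is False).
        matched = [str(n) for n in networks if addr in n]
        if matched:
            hits[src] = matched
    return hits


if __name__ == "__main__":
    # Made-up feed and sources using documentation address ranges.
    feed = ["# example feed", "192.0.2.0/24 ; test-net", "198.51.100.7"]
    observed = ["192.0.2.55", "198.51.100.7", "203.0.113.1"]
    print(match_sources(observed, load_feed(feed)))
```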
5. References
5-10 references that relate to the analysis (these do not all need to be journal articles).
Attachment: Task - Data Analysis.rar