Title: Risk Score data for Cyberattacks

Javed A, Lakoju M, Burnap P, et al. (2020). Risk Score data for Cyberattacks. Cardiff University. http://doi.org/10.17035/d.2020.0102115178

Access Rights: Data is provided under a Creative Commons Attribution (CC BY 4.0) licence
Access Method: Email a request for this data to opendata@cardiff.ac.uk

Dataset Details
Publisher: Cardiff University
Date (year) of data becoming publicly available: 2020
Data format: .xlsx
Estimated total storage size of dataset: Less than 100 megabytes
Number of Files In Dataset: 4
DOI: 10.17035/d.2020.0102115178


Our measurement-based study draws on malicious network traffic observed by Palo Alto Networks' WildFire system. Log files covering 144 consecutive hours of malicious traffic were preprocessed to extract each threat, its category, severity level, occurrence time and the targeted software application; the records were then grouped by hour of occurrence. The collected data contains more than 400,000 instances of 278 unique threats targeting 90 different software applications, across 5 distinct severity levels (informational, low, medium, high and critical).
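The hourly grouping described above can be sketched as follows. This is a minimal illustration, not the authors' preprocessing code: the record layout (an ISO 8601 timestamp paired with a threat name), the function name `hourly_threat_counts` and the sample timestamps are all assumptions made for the example.

```python
from collections import Counter
from datetime import datetime


def hourly_threat_counts(records):
    """Count occurrences of each threat per hour.

    records: iterable of (timestamp, threat_name) pairs, where timestamp is
    an ISO 8601 string (a hypothetical layout, not the real log format).
    Returns a Counter keyed by (hour_start, threat_name).
    """
    counts = Counter()
    for timestamp, threat in records:
        # Truncate the timestamp to the start of the hour it falls in.
        hour_start = datetime.fromisoformat(timestamp).replace(
            minute=0, second=0, microsecond=0
        )
        counts[(hour_start, threat)] += 1
    return counts


# Invented sample records, for illustration only.
sample = [
    ("2020-01-01T13:05:00", "MS-RDP Brute-Force Attempt"),
    ("2020-01-01T13:59:59", "MS-RDP Brute-Force Attempt"),
    ("2020-01-01T14:00:00", "MS-RDP Brute-Force Attempt"),
    ("2020-01-01T14:30:00", "Android Package File"),
]
counts = hourly_threat_counts(sample)
```

Keying the counter on the pair (hour, threat) keeps one pass over the records and makes it easy to pull out a per-threat hourly series afterwards.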

We considered the two most frequently occurring threats in this period, which together accounted for 95.67% of the total observed threats. These threats were: i) MS-RDP Brute-Force Attempt (RDP), which targets Microsoft's Remote Desktop Protocol to remotely access Windows servers and desktops by trying several commonly used usernames and passwords, thus following a brute-force method to gain unauthorized access to remote systems. MS-RDP is widely used in many Cloud-based deployments to provide remote desktop access to users, based on servers hosted within a data centre; ii) Android Package File (Android), which is distributed using a drive-by download mechanism and targets Android-based devices, performing malware-based actions on the device and leaving it more vulnerable to further sophisticated attacks.
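A share such as the one quoted above is the combined count of the most frequent threats divided by the total count. The sketch below shows that arithmetic on toy numbers; the function name `top_k_share` and the counts are invented for illustration and are not taken from the dataset.

```python
from collections import Counter


def top_k_share(threat_counts, k=2):
    """Return the percentage of all instances covered by the k most frequent threats."""
    total = sum(threat_counts.values())
    if total == 0:
        return 0.0
    # most_common(k) yields the k (threat, count) pairs with the largest counts.
    top = Counter(threat_counts).most_common(k)
    return 100.0 * sum(count for _, count in top) / total


# Toy counts, not the real data.
toy_counts = {"Threat A": 60, "Threat B": 30, "Threat C": 6, "Threat D": 4}
share = top_k_share(toy_counts)  # (60 + 30) / 100 = 90.0
```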

Having identified the two most frequently occurring threats, we associated a time unit t with the Risk Score so that the score can be calculated at a given time instance. The time granularity for this study is one hour. We calculate the Risk Score associated with a particular threat λ at time t, represented as Riskλ(t), using the following equation:


The resultant dataset contains three columns: the date, the hour and the risk score for the threat.
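For time-series work, the date and hour columns can be combined into a single timestamp. The sketch below assumes the rows have already been read from the spreadsheet into (date, hour, risk score) tuples, that dates are formatted as YYYY-MM-DD and that hours are integers; none of these details are confirmed by this record, and the sample values are invented.

```python
from datetime import datetime, timedelta


def to_time_series(rows):
    """Turn (date, hour, risk_score) rows into a time-ordered list of (timestamp, score).

    Assumes dates look like "2020-01-01" and hours are integers; adjust the
    format string if the actual files differ.
    """
    series = []
    for date_str, hour, risk in rows:
        # Adding the hour as a timedelta also copes with an hour value of 24.
        timestamp = datetime.strptime(date_str, "%Y-%m-%d") + timedelta(hours=int(hour))
        series.append((timestamp, float(risk)))
    series.sort(key=lambda item: item[0])
    return series


# Invented rows, for illustration only.
rows = [("2020-01-01", 1, 0.4), ("2020-01-01", 0, 0.2), ("2020-01-02", 0, 0.3)]
series = to_time_series(rows)
```

Reading the .xlsx files themselves needs a spreadsheet library, which is left out here to keep the example dependency-free.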

Research results based upon these data are published at http://doi.org/10.1002/spe.2822

Last updated on 2020-13-07 at 08:48