Title: Behavioural machine activity for benign and malicious Win7 64-bit executables
Citation
Burnap P, Rhode M (2018). Behavioural machine activity for benign and malicious Win7 64-bit executables. Cardiff University. https://doi.org/10.17035/d.2018.0050524986
Access Rights: Creative Commons Attribution 4.0 International
Access Method: Email a request for this data to opendata@cardiff.ac.uk
Dataset Details
Publisher: Cardiff University
Date (year) of data becoming publicly available: 2018
Data format: CSV
Estimated total storage size of dataset: Less than 100 megabytes
Number of Files In Dataset: 2
DOI: 10.17035/d.2018.0050524986
DOI URL: http://doi.org/10.17035/d.2018.0050524986
Related URL: https://www.sciencedirect.com/science/article/pii/S0167404818305546
Description
The two datasets here record behavioural activity for malicious and benign executable files capable of running on a Windows 7 operating system.
Dataset 1:
- filename = "data_1.csv"
- 594 benign samples
- 595 malicious samples
- Up to 305 seconds (5:05 min) execution per file
- The data was collected in a VirtualBox[1] virtual machine using Cuckoo Sandbox[2], with a custom package built on the Java library Sigar[3] collecting the machine activity data.
- The virtual machine used 2 GB RAM, 25 GB storage, and a single CPU core running 64-bit Windows 7.
Dataset 2:
- filename = "data_2.csv"
- 2345 benign samples
- 2286 malicious samples
- Up to 20 seconds execution per file
- The data was collected in a VirtualBox[1] virtual machine using Cuckoo Sandbox[2], with a custom package built on the Python library psutil[4] collecting the machine activity data.
- The virtual machine used 8 GB RAM, 25 GB storage, and a single CPU core running 64-bit Windows 7.
Columns
- sample_id: an identifier value for the samples (categorical)
- vector: time in seconds since start of file execution (numeric)
- malware: class label 0=benign, 1=malicious (categorical)
- cpu_sysem: percentage of CPU being used to run code in kernel (system) space (numeric)
- cpu_user: percentage of CPU being used to run code in user space (numeric)
- memory: bytes of memory currently in use (numeric)
- swap: bytes of swap memory currently in use (numeric)
- total_pro: total number of processes running (numeric)
- max_pid: maximum process id held by a process (numeric)
- rx_bytes: number of bytes being received (numeric)
- tx_bytes: number of bytes being sent (numeric)
- rx_packets: number of packets being received (numeric)
- tx_packets: number of packets being sent (numeric)
- test_set: True=sample belongs to test set, False=sample belongs to training set
Dataset 2 only:
- family: malware type - value missing if unknown or benign (categorical)
- variant: malware variant - value missing if unknown or benign (categorical)
- test-set: file was first seen before October (categorical)
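Because each file contributes one row per recorded time step, the rows usually need to be grouped by sample_id and ordered by vector before use. The following is a minimal sketch of that step using only the Python standard library. It is not the authors' code. The inline CSV is a tiny stand-in with invented values (including the family name), only a subset of the columns is used, and the assumption that test_set is stored as the text "True"/"False" should be checked against the real files.

```python
import csv
import io
from collections import defaultdict

# Tiny stand-in for data_2.csv. The column names come from the description
# above; every value (including the family name) is invented for illustration.
SAMPLE_CSV = """sample_id,vector,malware,cpu_user,memory,test_set,family
a1,1,1,12.5,1500000000,False,ExampleFamily
a1,0,1,3.0,1400000000,False,ExampleFamily
b2,0,0,1.0,1300000000,True,
b2,1,0,2.0,1310000000,True,
b2,2,0,2.5,1320000000,True,
"""

# Numeric feature columns used in this sketch; extend with the other
# numeric columns when reading the real files.
NUMERIC = ("cpu_user", "memory")


def load_samples(handle):
    """Group rows by sample_id and sort each group by the 'vector' time step."""
    samples = defaultdict(
        lambda: {"rows": [], "malware": None, "test_set": None, "family": None}
    )
    for row in csv.DictReader(handle):
        sample = samples[row["sample_id"]]
        sample["malware"] = int(row["malware"])
        # Assumption: booleans are stored as the text "True"/"False".
        sample["test_set"] = row["test_set"].strip().lower() == "true"
        # 'family' exists only in Dataset 2 and is empty if unknown or benign.
        sample["family"] = row.get("family") or None
        features = [float(row[name]) for name in NUMERIC]
        sample["rows"].append((float(row["vector"]), features))
    for sample in samples.values():
        sample["rows"].sort(key=lambda pair: pair[0])
    return dict(samples)


samples = load_samples(io.StringIO(SAMPLE_CSV))
train = {k: v for k, v in samples.items() if not v["test_set"]}
test = {k: v for k, v in samples.items() if v["test_set"]}
print(sorted(train), sorted(test))
```

To read a downloaded file instead of the stand-in, pass an open file handle, for example `with open("data_2.csv", newline="") as fh: samples = load_samples(fh)`. Sequences differ in length because execution is recorded for up to 305 seconds in Dataset 1 and up to 20 seconds in Dataset 2, so any model that expects fixed-length input would need padding or truncation on top of this.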
[1] https://www.virtualbox.org/wiki/Downloads
[2] https://cuckoosandbox.org/
[3] https://github.com/hyperic/sigar
[4] https://pypi.org/project/psutil/
Research results based upon these data are published at http://doi.org/10.1016/j.cose.2018.05.010