Title: Behavioural machine activity for benign and malicious Win7 64-bit executables

Citation
Burnap P, Rhode M (2018). Behavioural machine activity for benign and malicious Win7 64-bit executables. Cardiff University. http://doi.org/10.17035/d.2018.0050524986


Access Rights: Data may be available free of charge, subject to attribution
Access Method: To request this data, email opendata@caerdydd.ac.uk

Dataset Creators from Cardiff University

Dataset Details
Publisher: Cardiff University
Date (year) the data became publicly available: 2018
Data format: CSV
Estimated total storage size of the dataset: Less than 100 megabytes
Number of files in the dataset: 2
DOI: 10.17035/d.2018.0050524986

Description

The two datasets here record behavioural activity for malicious and benign executable files capable of running on a Windows 7 operating system. 

Dataset 1:

  • filename = "data_1.csv"
  • 594 benign samples 
  • 595 malicious samples
  • Up to 305 seconds (5:05 min) execution per file
  • The data was collected in a VirtualBox[1] virtual machine using Cuckoo Sandbox[2], with a custom package built on the Java library Sigar[3] collecting the machine activity data.
  • The virtual machine used 2 GB RAM, 25 GB storage, and a single CPU core running 64-bit Windows 7.


Dataset 2:

  • filename = "data_2.csv"
  • 2345 benign samples 
  • 2286 malicious samples
  • Up to 20 seconds execution per file
  • The data was collected in a VirtualBox[1] virtual machine using Cuckoo Sandbox[2], with a custom package built on the Python library psutil[4] collecting the machine activity data.
  • The virtual machine used 8 GB RAM, 25 GB storage, and a single CPU core running 64-bit Windows 7.



Columns


  • sample_id: an identifier value for the samples (categorical)
  • vector: time in seconds since start of file execution (numeric)
  • malware: class label 0=benign, 1=malicious (categorical)
  • cpu_sysem: percentage of CPU being used to run programs in the system kernel (numeric)
  • cpu_user: percentage of CPU being used to run programs in user space (numeric)
  • memory: bytes currently being used in memory (numeric)
  • swap: bytes currently being used in swap memory (numeric)
  • total_pro: total number of processes running (numeric)
  • max_pid: maximum process id held by a process (numeric)
  • rx_bytes: number of bytes being received (numeric)
  • tx_bytes: number of bytes being sent (numeric)
  • rx_packets: number of packets being received (numeric) 
  • tx_packets: number of packets being sent (numeric)
  • test_set: True=sample belongs to test set, False=sample belongs to training set
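Because each sample contributes one row per time step, a reader will usually want to regroup the rows into one time series per sample_id before any analysis. The following is a minimal sketch of that step using only the Python standard library. The rows in SAMPLE_CSV are made up for illustration, the helper names load_series and split_by_test_flag are hypothetical, and the code assumes the test_set flag is stored as the text "True" or "False".

```python
import csv
import io
from collections import defaultdict

# Made-up rows for illustration only; real values come from data_1.csv or data_2.csv.
SAMPLE_CSV = """sample_id,vector,malware,memory,total_pro,test_set
a1,1,0,1200,41,False
a1,0,0,1000,40,False
b2,0,1,900,40,True
b2,1,1,5000,45,True
"""


def load_series(handle):
    """Group rows by sample_id and order each group by the `vector` time step."""
    series = defaultdict(list)
    for row in csv.DictReader(handle):
        series[row["sample_id"]].append(row)
    for rows in series.values():
        rows.sort(key=lambda r: float(r["vector"]))
    return dict(series)


def split_by_test_flag(series):
    """Split samples into training and test groups using the test_set column."""
    train, test = {}, {}
    for sample_id, rows in series.items():
        # Assumption: the flag is stored as the text "True" or "False".
        is_test = rows[0]["test_set"].strip().lower() == "true"
        (test if is_test else train)[sample_id] = rows
    return train, test


series = load_series(io.StringIO(SAMPLE_CSV))
train, test = split_by_test_flag(series)
```

To run the same functions on the published files, pass an open file handle instead of the in-memory sample, for example one obtained from open("data_1.csv", newline="").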



Dataset 2 only:

  • family: malware type - value missing if unknown or benign (categorical)
  • variant: malware variant - value missing if unknown or benign (categorical)
  • test-set: file was first seen before October (categorical)
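Since the family value is missing both for benign files and for malware of unknown type, the malware label is needed to tell the two cases apart. The sketch below counts distinct samples per family under that reading. The rows and the family name "trojan" are made up, the helper family_counts is hypothetical, and the code assumes a missing value appears as an empty field in the CSV, which may not match the published encoding.

```python
import csv
import io
from collections import Counter

# Made-up rows for illustration only; "trojan" is not taken from the dataset.
SAMPLE_CSV = """sample_id,vector,malware,family
a1,0,0,
a1,1,0,
b2,0,1,trojan
b2,1,1,trojan
c3,0,1,
"""


def family_counts(handle):
    """Count distinct samples per family, labelling missing values explicitly."""
    label_by_sample = {}
    for row in csv.DictReader(handle):
        family = (row.get("family") or "").strip()
        if not family:
            # Assumption: a missing family is an empty field; use the malware
            # label to separate benign files from malware of unknown family.
            family = "benign" if row["malware"] == "0" else "unknown"
        label_by_sample[row["sample_id"]] = family
    return Counter(label_by_sample.values())


counts = family_counts(io.StringIO(SAMPLE_CSV))
```

Counting per sample rather than per row avoids weighting a family by how many seconds its files happened to run.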


[1] https://www.virtualbox.org/wiki/Downloads

[2] https://cuckoosandbox.org/ 

[3] https://github.com/hyperic/sigar 

[4] https://pypi.org/project/psutil/

Research results based upon these data are published at http://doi.org/10.1016/j.cose.2018.05.010



Related Projects

Last updated on 2020-05-20 at 15:03