One reason that network intrusion detection methods fall short in deployment is the lack of realistic labeled datasets, which makes it challenging to develop and compare techniques. This scarcity stems from the large amount of effort it takes a cyber expert to classify network connections. It has raised the need for methods that use both labeled and unlabeled data to learn which observations are best to present to the human expert. Hence, Active Learning (AL) methods are of interest. In this paper, we propose a new hybrid AL method called Jasmine. First, it uses the uncertainty score and the anomaly score to determine how suitable each observation is for querying, i.e., how likely it is to improve classification. Second, Jasmine introduces dynamic updating, which allows the model to adjust the balance between querying uncertain, anomalous and randomly selected observations. As a result, Jasmine is able to learn the best query strategy during the labeling process. This contrasts with the other AL methods in cybersecurity, which all use static, predetermined query functions. We show that dynamic updating, and therefore Jasmine, consistently obtains good results that are more robust than those obtained by querying only uncertainties, only anomalies, or a fixed combination of the two.
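The abstract does not specify Jasmine's exact scoring or update rules, so the following is only a minimal Python sketch of the general idea under our own assumptions. Each unlabeled observation is assumed to carry a classifier probability of being malicious and an anomaly score. A query strategy (uncertainty, anomaly, or random) is drawn according to per-strategy weights, and those weights are adjusted with a simple multiplicative-weights rule from a reward such as the change in validation performance after the new label is added. The names `pick_query` and `DynamicQueryMixer`, the pool keys `p` and `anomaly`, the parameter `eta`, and the reward definition are all hypothetical and are not taken from the paper.

```python
import math
import random

# Hypothetical set of query strategies the mixer chooses between.
STRATEGIES = ("uncertainty", "anomaly", "random")


def uncertainty_score(p_malicious):
    """Uncertainty of a binary classifier: 1.0 at p = 0.5, 0.0 at p = 0 or 1."""
    return 1.0 - abs(2.0 * p_malicious - 1.0)


def pick_query(pool, strategy, rng):
    """Return the index of the unlabeled observation selected by one strategy.

    Each pool entry is a dict with hypothetical keys:
      'p'       -- classifier's predicted probability that the connection is malicious
      'anomaly' -- anomaly score from an unsupervised detector (higher = more anomalous)
    """
    if not pool:
        raise ValueError("the unlabeled pool is empty")
    indices = range(len(pool))
    if strategy == "uncertainty":
        return max(indices, key=lambda i: uncertainty_score(pool[i]["p"]))
    if strategy == "anomaly":
        return max(indices, key=lambda i: pool[i]["anomaly"])
    if strategy == "random":
        return rng.randrange(len(pool))
    raise ValueError(f"unknown strategy: {strategy}")


class DynamicQueryMixer:
    """Keeps one weight per strategy and re-weights it from observed rewards.

    Assumption: a simple multiplicative-weights rule, not Jasmine's published update.
    """

    def __init__(self, eta=0.5, seed=0):
        self.eta = eta  # learning rate: how strongly one reward shifts the balance
        self.weights = {s: 1.0 for s in STRATEGIES}  # start with an even balance
        self.rng = random.Random(seed)

    def probabilities(self):
        """Normalize the weights into a probability of using each strategy."""
        total = sum(self.weights.values())
        return {s: w / total for s, w in self.weights.items()}

    def choose_strategy(self):
        """Draw the strategy for the next query according to the current balance."""
        probs = self.probabilities()
        return self.rng.choices(STRATEGIES, weights=[probs[s] for s in STRATEGIES])[0]

    def update(self, strategy, reward):
        """Scale the used strategy's weight by exp(eta * reward).

        `reward` is assumed to be something like the change in a validation
        metric after the expert's label is added (positive = the query helped).
        """
        self.weights[strategy] *= math.exp(self.eta * reward)


if __name__ == "__main__":
    mixer = DynamicQueryMixer(seed=1)
    pool = [
        {"p": 0.90, "anomaly": 0.10},
        {"p": 0.52, "anomaly": 0.30},
        {"p": 0.20, "anomaly": 0.95},
    ]
    strategy = mixer.choose_strategy()
    idx = pick_query(pool, strategy, mixer.rng)
    # Here the expert would label pool[idx] and the classifier would be retrained;
    # the reward below is a made-up placeholder, not a measured result.
    mixer.update(strategy, reward=0.2)
    print(strategy, idx, mixer.probabilities())
```

Drawing one strategy per query, rather than fixing a ratio in advance, is what lets the balance drift toward whichever strategy has recently been rewarded; a static query function corresponds to never calling `update`.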

Machine Learning with Applications

Klein, J.G., Bhulai, S., Hoogendoorn, M., & van der Mei, R.D. (2022). Jasmine: A new Active Learning approach to combat cybercrime. Machine Learning with Applications, 9, 100351.1–100351.15. doi:10.1016/j.mlwa.2022.100351