This paper demonstrates how different machine learning techniques perform on a recent, partially labeled dataset (based on the Locked Shields 2017 exercise) and which features they deem important. In addition, a cybersecurity expert analyzed the results and confirmed that the models classified the known intrusions as malicious and discovered new attacks. In a set of 500 detected anomalies, 50 (10%) were previously unknown intrusions. Given that such observations are uncommon, this indicates how well an unlabeled dataset can be used to construct and evaluate a network intrusion detection system.

doi.org/10.1109/WI.2018.00017
Workshop on Data Science for Crime Analytics
Stochastics

Klein, J., Bhulai, S., Hoogendoorn, M., van der Mei, R., & Hinfelaar, R. (2018). Detecting network intrusion beyond 1999: Applying machine learning techniques to a partially labeled cybersecurity dataset. In Proceedings workshop on Data Science for Crime Analytics (pp. 784–787). doi:10.1109/WI.2018.00017