Detecting temporal and spatial anomalies in users' activities for security provisioning in computer networks

  • Huč, Aleks
Publication Date
Jun 22, 2022
University of Ljubljana

Communication is essential to humans as social beings: it enables us to build and maintain relationships and to take part in education, work and other private and public social settings. Nowadays, more and more communication takes place through computers, computer networks and other digital devices. Unfortunately, as with many things in our lives, this communication can be compromised and exploited by attackers for monetary gain, social status or curiosity, at the expense of legitimate users. The need for robust, reliable and rapid detection and prevention of network security threats has therefore become very important. The field of computer network security is very broad, which is why we focused on intrusion detection in computer networks. Over the years, two main intrusion detection techniques have been developed: anomaly-based detection and signature-based detection. Anomaly-based detection builds a model of normal network activity and focuses on detecting abnormal activity that differs from the model. Signature-based detection relies on a knowledge database of signatures of known attacks and focuses on detecting network activity that conforms to the stored signatures. Many different intrusion detection approaches have already been developed; however, networks with ever-growing volume, velocity, variety and variability of transmitted data still pose an open challenge. Specifically, open questions remain about how to identify new types of attacks, analyze large amounts of data effectively, learn from unlabeled data, adapt to changes in the data and improve the robustness and accuracy of detection. The goal of this dissertation is to build upon the current state-of-the-art approaches to anomaly detection in computer networks. We explore lightweight, unsupervised and incremental approaches that can handle a large volume of data, adapt to non-stationary changes automatically and do not need prior training on labeled data.
We propose two new approaches for detecting anomalies in computer networks. Instead of packet headers and payloads, both take aggregates of network packets (so-called network flows) as input, which greatly reduces the volume of data that needs to be analyzed. For every network entity we build a profile that models its activity with an incremental hierarchical clustering algorithm based on BIRCH clustering, which adapts automatically to changes in the input data with the help of a fading function. The first approach detects anomalies inside profiles by tracking cluster changes over time with the ADWIN algorithm, the distance of a newly clustered observation from its cluster center, the distance of a new cluster from its neighboring cluster, and the size and age of the cluster. The second approach adds a second level of incremental hierarchical clustering that groups similar profiles together and detects anomalies in the activity of those groups with the mechanisms presented in the first approach. Second-level clustering analyzes tree data structures, which is why we defined a new metric for determining the similarity between them on the basis of the distances between clusters and the sizes of clusters. In our analysis we used up-to-date data sets of network flows (ISCXIDS2012 and CICIDS2017) that contain the most common types of network attacks. We evaluated prediction performance, execution time and feature importance, and performed a sensitivity analysis of the most important parameters. Both approaches achieved prediction performance (F1 score over 0.90) comparable to state-of-the-art supervised approaches, even though they see every data point only once, then discard it, and require no prior learning phase with labeled data. The two approaches present a good baseline for further improvement of detection performance with additional detection mechanisms.
They can also serve as a data reduction and pre-processing step for computationally more demanding methods. Because they are general in nature, they can also be applied to other problem domains that can be represented as data streams.
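
To make the profile-building idea in the abstract more concrete, the following is a minimal sketch of a single BIRCH-style cluster feature that fades over time and flags observations lying far from the cluster center. It is not the dissertation's implementation: the exponential form of the fading function, the class name `FadingClusterFeature`, the parameters `decay` and `k`, and the radius-based threshold are all assumptions chosen for illustration. The ADWIN change tracking and the hierarchical tree are omitted.

```python
import math
from dataclasses import dataclass, field


@dataclass
class FadingClusterFeature:
    """BIRCH-style cluster feature (N, LS, SS) with an assumed exponential fading."""
    dim: int
    decay: float = 0.01                      # hypothetical fading rate
    n: float = 0.0                           # faded number of absorbed points
    ls: list = field(default_factory=list)   # faded linear sum per dimension
    ss: float = 0.0                          # faded sum of squared norms
    last_update: float = 0.0                 # timestamp of the last update

    def __post_init__(self):
        if not self.ls:
            self.ls = [0.0] * self.dim

    def _fade(self, now):
        # Assumption: weight = 2 ** (-decay * elapsed time), so old points count less.
        w = 2.0 ** (-self.decay * (now - self.last_update))
        self.n *= w
        self.ls = [v * w for v in self.ls]
        self.ss *= w
        self.last_update = now

    def add(self, x, now):
        """Absorb one observation; it is not stored, only the sums are updated."""
        self._fade(now)
        self.n += 1.0
        self.ls = [a + b for a, b in zip(self.ls, x)]
        self.ss += sum(v * v for v in x)

    def centroid(self):
        return [v / self.n for v in self.ls]

    def radius(self):
        # Root-mean-square distance of points from the centroid.
        c2 = sum(v * v for v in self.centroid())
        return math.sqrt(max(self.ss / self.n - c2, 0.0))

    def distance(self, x):
        return math.dist(self.centroid(), x)


def is_anomalous(cf, x, k=3.0, min_radius=1e-9):
    """Hypothetical rule: flag x if it lies more than k radii from the centroid."""
    return cf.distance(x) > k * max(cf.radius(), min_radius)


# Usage: build a profile from a few flow feature vectors, then score new ones.
profile = FadingClusterFeature(dim=2, decay=0.0)
for t, flow in enumerate([[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.0, 1.0]]):
    profile.add(flow, now=float(t))

near = is_anomalous(profile, [1.1, 1.0])
far = is_anomalous(profile, [5.0, 5.0])
```

Because only the three faded sums are kept per cluster, each observation can be processed once and discarded, which matches the single-pass, unsupervised setting described above.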
