A Taxonomy of Anomalies in Log Data
Copyright Continuum Labs - 2023
This 2021 paper presents a valuable contribution to the field of log data anomaly detection by introducing a taxonomy for categorising different types of anomalies found in log data.
The work clarifies the nature of log data and offers insights that can guide researchers and IT operators in selecting anomaly detection algorithms suited to their specific needs.
Anomaly Taxonomy
The authors propose a taxonomy that categorises log data anomalies into two main types: Point Anomalies and Contextual Anomalies.
Point Anomalies are further divided into Template Anomalies (characterized by the log message template) and Attribute Anomalies (described by specific words or numbers in the log message).
Contextual Anomalies, on the other hand, are determined by the surrounding log messages rather than the content of an individual log message.
The paper also introduces a method for classifying the anomalies in labeled datasets according to this taxonomy.
This method involves tokenizing log messages, creating templates, extracting attributes, and creating contexts. By applying this method, system administrators can investigate their datasets and gain insights to help them choose suitable anomaly detection algorithms.
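The classification steps can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the regular expression that decides which tokens count as attributes, the `<*>` wildcard, the context window of two preceding templates, and the helper names `parse` and `classify_anomalies` are all hypothetical. A labeled anomaly is treated as a Template Anomaly if its template never appears among normal messages, as an Attribute Anomaly if one of its attribute values was never seen at that position for that template in normal data, and as a Contextual Anomaly only if neither point check fires and its window of preceding templates never occurs in normal data.

```python
import re
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical rule: tokens that look like numbers, hex values or dotted/colon-separated
# numbers (e.g. IPs, times) are treated as attributes. Not the paper's tokenizer.
VARIABLE_TOKEN = re.compile(r"0x[0-9a-fA-F]+|\d+(?:[.:]\d+)*")


def parse(message: str) -> Tuple[str, Tuple[str, ...]]:
    """Tokenize on whitespace and split a message into (template, attribute values)."""
    template_tokens: List[str] = []
    attributes: List[str] = []
    for token in message.split():
        if VARIABLE_TOKEN.fullmatch(token):
            template_tokens.append("<*>")
            attributes.append(token)
        else:
            template_tokens.append(token)
    return " ".join(template_tokens), tuple(attributes)


def classify_anomalies(
    messages: Sequence[str], labels: Sequence[bool], window: int = 2
) -> Dict[int, Set[str]]:
    """Assign taxonomy categories to every message labeled anomalous (True).

    An empty set means none of these simple (hypothetical) rules explains the label.
    """
    parsed = [parse(m) for m in messages]
    templates = [t for t, _ in parsed]

    def context(i: int) -> Tuple[str, ...]:
        # Templates of the `window` messages preceding message i, plus its own template.
        return tuple(templates[max(0, i - window): i + 1])

    normal = [i for i, is_anomaly in enumerate(labels) if not is_anomaly]
    normal_templates = {templates[i] for i in normal}
    normal_attributes = {
        (templates[i], pos, value) for i in normal for pos, value in enumerate(parsed[i][1])
    }
    normal_contexts = {context(i) for i in normal}

    result: Dict[int, Set[str]] = {}
    for i, is_anomaly in enumerate(labels):
        if not is_anomaly:
            continue
        template, attributes = parsed[i]
        types: Set[str] = set()
        if template not in normal_templates:  # point anomaly: unseen template
            types.add("template")
        if any((template, pos, v) not in normal_attributes for pos, v in enumerate(attributes)):
            types.add("attribute")  # point anomaly: unseen attribute value
        if not types and context(i) not in normal_contexts:
            types.add("contextual")  # content is normal, surrounding sequence is not
        result[i] = types
    return result


logs = [
    "session opened for user 1001",
    "read block 17 ok",
    "session closed for user 1001",
    "session opened for user 1002",
    "kernel panic at 0x3f",          # unseen template
    "read block 99999 ok",           # known template, unseen value
    "session closed for user 1001",  # known message, but follows the anomalies
]
labels = [False, False, False, False, True, True, True]
print(classify_anomalies(logs, labels))  # 4 -> template + attribute, 5 -> attribute, 6 -> contextual
```

In the example, message 4 introduces a new template (and therefore also a new attribute value), message 5 reuses a known template with an unseen number, and message 6 is identical to a normal message but follows the anomalous ones, so only its context is new. Under these rules a new template that contains any variable token is always an attribute anomaly as well, so the two point categories overlap by construction.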
Analysis of Benchmark Datasets
The authors apply their classification method to three common benchmark datasets: Thunderbird, Spirit, and BGL.
They find that the vast majority of anomalies in these datasets are Template Anomalies, with BGL also containing a significant number of Contextual Anomalies. Attribute Anomalies are found to be highly correlated with Template Anomalies in all datasets.
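The paper reports this correlation as a finding; the per-dataset counts are not reproduced here. As a hedged illustration of how such an overlap can be quantified, the sketch below computes two simple measures over per-anomaly category sets (for example, labels produced by the classification method): the share of Attribute Anomalies that are also Template Anomalies, and the phi coefficient between the two binary indicators. The function name `overlap_stats` and the choice of measures are assumptions; the paper may quantify the relationship differently, and the toy input is invented, not data from Thunderbird, Spirit or BGL.

```python
import math
from typing import Iterable, Set, Tuple


def overlap_stats(
    categories: Iterable[Set[str]], a: str = "attribute", b: str = "template"
) -> Tuple[float, float]:
    """Return (share of `a` anomalies that are also `b`, phi coefficient of a vs b)."""
    n11 = n10 = n01 = n00 = 0  # 2x2 contingency counts indexed by (has a, has b)
    for types in categories:
        has_a, has_b = a in types, b in types
        if has_a and has_b:
            n11 += 1
        elif has_a:
            n10 += 1
        elif has_b:
            n01 += 1
        else:
            n00 += 1
    share = n11 / (n11 + n10) if n11 + n10 else float("nan")
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    phi = (n11 * n00 - n10 * n01) / denom if denom else float("nan")
    return share, phi


# Invented toy input: six labeled anomalies and their assigned categories.
toy = [{"template", "attribute"}] * 3 + [{"contextual"}] * 2 + [{"attribute"}]
share, phi = overlap_stats(toy)
print(f"P(template | attribute) = {share:.2f}, phi = {phi:.2f}")  # 0.75, 0.71
```

A phi coefficient close to 1 means the two indicators almost always occur together; NaN is returned when one of the categories never (or always) occurs, since the correlation is then undefined.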
Evaluation of Unsupervised Learning Methods
The paper evaluates the performance of five state-of-the-art unsupervised anomaly detection methods (DeepLog, A2Log, PCA, Invariants Miner, and Isolation Forest) in detecting different types of anomalies.
The results show that Template Anomalies are the easiest to detect, which helps explain the strong performance of template-based approaches such as DeepLog.
Deep learning-based methods generally outperform data mining-based methods, particularly in detecting Contextual Anomalies.
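To make the per-type comparison concrete, the sketch below computes, for each anomaly type, the fraction of anomalies of that type that a detector flags. Recall is used because a type label exists only for true anomalies; false positives have no type, so precision cannot be split the same way. This is an illustrative assumption about how such a comparison can be scored, not the paper's exact evaluation protocol, and `recall_by_type` is a hypothetical helper. An anomaly with several types counts towards each of them, and anomalies with no assigned type are ignored.

```python
from collections import defaultdict
from typing import Dict, Sequence, Set


def recall_by_type(
    categories: Dict[int, Set[str]], predicted: Sequence[bool]
) -> Dict[str, float]:
    """Per anomaly type, the share of anomalies of that type flagged by a detector.

    `categories` maps the index of each labeled anomaly to its taxonomy types;
    `predicted[i]` is True if the detector flagged message i.
    """
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for index, types in categories.items():
        for anomaly_type in types:
            totals[anomaly_type] += 1
            if predicted[index]:
                hits[anomaly_type] += 1
    return {t: hits[t] / totals[t] for t in totals}


# Hypothetical detector output on seven messages; only message 4 is flagged.
categories = {4: {"template", "attribute"}, 5: {"attribute"}, 6: {"contextual"}}
predicted = [False, False, False, False, True, False, False]
print(sorted(recall_by_type(categories, predicted).items()))
# [('attribute', 0.5), ('contextual', 0.0), ('template', 1.0)]
```

In this toy case the detector flags only the new-template message, so it scores 1.0 on Template Anomalies, 0.5 on Attribute Anomalies and 0.0 on Contextual Anomalies.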
The paper's findings help us understand log data by:
Providing a structured way to think about and categorise anomalies in log data, which can guide the selection of appropriate detection methods.
Highlighting the prevalence of different anomaly types in common benchmark datasets, giving researchers and practitioners a better understanding of the characteristics of these datasets.
Demonstrating the strengths and weaknesses of various unsupervised anomaly detection methods in detecting different types of anomalies, which can inform the choice of algorithm for a given dataset or use case.
In conclusion, this paper contributes to the understanding of log data anomalies in four ways: it proposes a taxonomy, introduces a classification method, analyses three benchmark datasets, and evaluates how well unsupervised anomaly detection methods detect each type of anomaly.
The insights provided can help researchers and IT operators better understand their log data and select appropriate anomaly detection algorithms, ultimately leading to more effective monitoring and troubleshooting of IT systems.