DeepLog

This highly cited 2017 paper presents DeepLog, a deep learning-based framework for real-time anomaly detection and diagnosis in system logs.

DeepLog (dl.acm.org)

The main contributions and key aspects of DeepLog:

Log Key Anomaly Detection

DeepLog uses Long Short-Term Memory (LSTM) networks to model log key sequences.
It trains an LSTM model on normal log key sequences to learn normal system execution patterns.
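During detection, the model predicts a probability distribution over the next log key, given the recent history of keys.
An entry is treated as normal if its actual log key is among the top g most probable predictions, and is flagged as an anomaly otherwise. A larger g tolerates more variation in normal execution and so reduces false positives, at the risk of missing some anomalies.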
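The top-g check can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the probability dictionary stands in for the LSTM's output distribution over log keys, and the function names `top_g_keys` and `is_key_anomalous` as well as the key labels `k1` to `k4` are hypothetical.

```python
def top_g_keys(probs, g):
    """Return the g most probable next log keys.

    probs: dict mapping log key -> predicted probability. It is assumed to
    come from a trained sequence model such as an LSTM; here it is hard-coded.
    """
    ranked = sorted(probs, key=probs.get, reverse=True)
    return ranked[:g]


def is_key_anomalous(probs, actual_key, g):
    """Flag the observed key as anomalous if it is not among the top g."""
    return actual_key not in top_g_keys(probs, g)


# Hypothetical model output for the key that follows some recent window.
predicted = {"k1": 0.55, "k2": 0.30, "k3": 0.10, "k4": 0.05}

strict = is_key_anomalous(predicted, "k2", g=1)   # only k1 counts as normal
relaxed = is_key_anomalous(predicted, "k2", g=2)  # k1 and k2 count as normal
rare = is_key_anomalous(predicted, "k4", g=2)     # k4 is outside the top 2
```

With g=1 the observed key `k2` would be flagged, while g=2 accepts it, which shows how g controls the tolerance for alternative normal execution paths.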

Parameter Value Anomaly Detection

DeepLog employs a separate LSTM model for each log key to detect anomalies in parameter value vectors.
The model is trained to predict the next parameter value vector based on historical vectors.
An anomaly is detected if the difference between the predicted and actual vector exceeds a threshold derived from the training data.
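The threshold test might look like the following sketch. It assumes that prediction errors on normal data are roughly Gaussian, so that the mean plus a multiple of the standard deviation serves as an upper bound; the factor `z`, the helper names, and all numbers are illustrative assumptions, and the predicted vectors are hard-coded rather than produced by an LSTM.

```python
import statistics


def mse(predicted, actual):
    """Mean squared error between two equal-length parameter value vectors."""
    if len(predicted) != len(actual):
        raise ValueError("vectors must have the same length")
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


def fit_threshold(normal_errors, z=2.58):
    """Derive an upper error bound from errors observed on normal data.

    Assumption: the errors are roughly Gaussian, so mean + z * stdev
    approximates a high-confidence upper bound (z is a tunable choice).
    """
    mean = statistics.fmean(normal_errors)
    stdev = statistics.stdev(normal_errors)
    return mean + z * stdev


def is_value_anomalous(predicted, actual, threshold):
    """Flag a parameter value vector whose prediction error is too large."""
    return mse(predicted, actual) > threshold


# Hypothetical errors measured on held-out normal logs for one log key.
normal_errors = [0.010, 0.012, 0.009, 0.011, 0.013, 0.010]
threshold = fit_threshold(normal_errors)

close = is_value_anomalous([0.5, 1.2], [0.52, 1.18], threshold)  # small error
far = is_value_anomalous([0.5, 1.2], [0.5, 2.2], threshold)      # large error
```

Because one model is kept per log key, a separate threshold would be fitted for each key in the same way.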

Workflow Construction

DeepLog proposes methods to separate different tasks from interleaved log entries and build workflow models for each task.
Two approaches are presented: (1) using the LSTM-based log key anomaly detection model's predictions, and (2) using a density-based clustering approach based on log key co-occurrence patterns.
The constructed workflows aid in anomaly diagnosis by providing insights into the system's execution path.
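The co-occurrence idea behind the second approach can be illustrated with a simplified sketch. It counts how often one log key is followed by another within a distance d and then groups keys whose ratio reaches a cutoff. The thresholded grouping via union-find is a stand-in chosen for brevity, not the clustering procedure from the paper, and the names `cooccurrence_ratio`, `group_keys`, `d`, and `tau` as well as the toy sequence are assumptions.

```python
from collections import Counter, defaultdict


def cooccurrence_ratio(sequence, d):
    """For each ordered pair (a, b), return the fraction of occurrences of a
    that are followed by b within the next d entries."""
    key_count = Counter(sequence)
    pair_count = Counter()
    for i, a in enumerate(sequence):
        # Count each distinct follower once per occurrence of a.
        for b in set(sequence[i + 1 : i + 1 + d]):
            if b != a:
                pair_count[(a, b)] += 1
    return {pair: n / key_count[pair[0]] for pair, n in pair_count.items()}


def group_keys(sequence, d, tau):
    """Group keys linked by a co-occurrence ratio of at least tau."""
    ratios = cooccurrence_ratio(sequence, d)
    parent = {k: k for k in set(sequence)}

    def find(k):
        # Follow parent links to the group representative.
        while parent[k] != k:
            parent[k] = parent[parent[k]]
            k = parent[k]
        return k

    for (a, b), ratio in ratios.items():
        if ratio >= tau:
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for k in parent:
        groups[find(k)].add(k)
    return sorted(sorted(g) for g in groups.values())


# Toy log key sequence in which two hypothetical tasks (a1 -> a2 and
# b1 -> b2) are interleaved in different ways.
log_keys = ["a1", "a2", "b1", "b2", "a1", "b1", "a2", "b2",
            "b1", "b2", "a1", "a2", "a1", "a2", "b1", "b2"]
tasks = group_keys(log_keys, d=2, tau=0.9)
```

In this toy sequence `a2` always follows `a1` and `b2` always follows `b1` within two entries, whereas cross-task pairs co-occur less consistently, so the two tasks end up in separate groups.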

Online Update and Training

DeepLog supports incremental updates to its LSTM models based on user feedback, allowing it to adapt to new normal execution patterns.
When a user reports a false positive, DeepLog updates the model's weights using the log entry that was wrongly flagged, so that the same pattern is treated as normal in the future and accuracy improves over time.
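The feedback loop can be sketched with a deliberately simple stand-in model. Here a frequency table replaces the LSTM, so "updating the weights" becomes incrementing a count; in DeepLog itself the update adjusts the LSTM's weights instead. The class `CountingKeyModel` and the key labels are hypothetical.

```python
from collections import Counter, defaultdict


class CountingKeyModel:
    """Frequency-table stand-in for the LSTM (an assumption for illustration)."""

    def __init__(self, g):
        self.g = g
        self.counts = defaultdict(Counter)  # history window -> next-key counts

    def update(self, window, next_key):
        """Incrementally learn that next_key may follow window."""
        self.counts[tuple(window)][next_key] += 1

    def is_anomalous(self, window, next_key):
        """Apply the top-g rule to the counts seen for this window."""
        seen = self.counts.get(tuple(window), Counter())
        top = [key for key, _ in seen.most_common(self.g)]
        return next_key not in top


model = CountingKeyModel(g=2)
for _ in range(3):
    model.update(["k1", "k2"], "k3")  # initial training on normal logs

flagged_before = model.is_anomalous(["k1", "k2"], "k4")

# An operator reports the alert as a false positive, so the flagged entry
# is fed back to the model as a normal example.
model.update(["k1", "k2"], "k4")

flagged_after = model.is_anomalous(["k1", "k2"], "k4")
still_flagged = model.is_anomalous(["k1", "k2"], "k5")  # never seen as normal
```

After the feedback step the pattern `k1, k2 -> k4` is accepted, while a key that was never reported as normal is still flagged.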

Evaluation

DeepLog is evaluated on large-scale system logs from HDFS and OpenStack, where it outperforms earlier methods such as PCA, Invariant Mining, and N-gram language models.
The parameter value anomaly detection is tested on OpenStack logs with injected performance anomalies, showcasing DeepLog's ability to detect subtle anomalies.
The online update and training mechanism is evaluated on the Blue Gene/L supercomputer log, significantly reducing false positives and adapting to new patterns.
Case studies on network security logs (VAST Challenge 2011) and BROP attack detection further validate DeepLog's effectiveness in real-world scenarios.

Workflow Construction Evaluation

Both LSTM-based and density-based clustering approaches successfully separate different tasks from OpenStack logs.
The constructed workflow for the VM creation task is used to diagnose performance anomalies, demonstrating its utility in anomaly diagnosis.

In summary, DeepLog presents a comprehensive and effective framework for online log anomaly detection and diagnosis using deep learning techniques.

By modeling log key sequences and parameter value vectors with LSTM networks, DeepLog can detect subtle anomalies at a fine-grained level.

The workflow construction and online update mechanisms further enhance its practicality and adaptability in real-world systems.

The extensive evaluation on diverse datasets and case studies demonstrates DeepLog's superior performance and broad applicability compared to existing log-based anomaly detection methods.

Summary of Transcript: University of Utah

The presentation discusses the challenges in analysing system logs and proposes DeepLog as a solution for automatic log anomaly detection and diagnosis. A summary follows.

Introduction and Background

System event logs are valuable for understanding system behavior but difficult to analyze manually. The paper addresses automatic system log anomaly detection and diagnosis.

Traditional Approach and Limitations

Traditional methods parse unstructured logs into structured data (log keys) and analyze the log key sequence. Limitations: they consider only log keys, ignore parameter values, and are poorly suited to complex anomalies.
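To make the distinction between a log key and its parameter values concrete, the following sketch splits a message into the two parts. It is a regex-based simplification for illustration only and not the parser used by DeepLog; the function name `parse_entry` and the sample messages are hypothetical.

```python
import re

# Matches integers and decimals that stand alone as tokens.
NUMBER = re.compile(r"\b\d+(?:\.\d+)?\b")


def parse_entry(message):
    """Split a log message into (log key, parameter values).

    Simplification: every standalone numeric token is treated as a
    parameter. Real parsers also handle identifiers, addresses, and paths.
    """
    values = [float(v) for v in NUMBER.findall(message)]
    key = NUMBER.sub("*", message)
    return key, values


key_a, values_a = parse_entry("Took 10 seconds to build instance")
key_b, values_b = parse_entry("Took 0.61 seconds to deallocate network")
key_c, values_c = parse_entry("Deletion of file1 complete")
```

Methods that look only at the log key would see two ordinary keys in the first two messages, whereas the extracted values (10 and 0.61 seconds) are what a parameter value model would examine.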

DeepLog Framework

Uses Spell, a streaming parser for event logs based on the longest common subsequence (LCS), for log parsing.
Builds two models: log key anomaly detection model and parameter value anomaly detection model. Constructs a workflow model for diagnosis.

Model Architecture and Training

Log key anomaly detection model: LSTM-based, predicts the next log key given a sequence of log keys.
Parameter value anomaly detection model: LSTM-based, predicts the next parameter value vector for each log key.
Workflow model: Separates tasks and builds a model for each task using LSTM prediction probabilities or density-based clustering.
Training is done using normal execution logs only.

Anomaly Detection and Diagnosis

Log key anomaly: Detected if the actual log key is not within the top g predictions.
Parameter value anomaly: Detected if the mean squared error between predicted and actual values exceeds a threshold.
Diagnosis: Workflow model helps pinpoint the location and cause of anomalies.

Handling False Positives

User feedback is used to update the models incrementally, reducing false positives.

Evaluation

Log key anomaly detection: Outperforms PCA, invariant mining, and n-gram language models on Hadoop file system logs.
Parameter value anomaly detection: Successfully detects injected performance anomalies in OpenStack cloud logs.
LSTM model online update: Significantly improves F-measure by reducing false positives on HPC logs.
Case study on network security logs: Detects most anomalies automatically.
Workflow construction: Helps diagnose anomalies by pinpointing the location and cause.

Conclusion and Future Work

DeepLog is a real-time log anomaly detection framework using LSTM to model system execution paths and parameter values.
Workflow models help diagnose detected anomalies, and online model updates are supported.
Future work: Analyzing correlations across different system logs.

In summary, DeepLog is a novel approach for real-time system log anomaly detection and diagnosis using deep learning techniques (LSTM). It outperforms traditional methods, provides a workflow model for diagnosis, and supports online model updates. The evaluation demonstrates its effectiveness on various datasets and case studies, while the Q&A session addresses some important questions and potential limitations.
