Experience Report: Deep Learning-based System Log Analysis for Anomaly Detection

This January 2022 paper focuses on the application of deep learning techniques for log-based anomaly detection in large-scale software systems.

The authors recognise the importance of logs in ensuring system reliability and service quality, as they faithfully record runtime information that can be used for monitoring, troubleshooting, and understanding system behaviour.

The paper highlights the challenges faced by traditional manual inspection methods and machine learning-based approaches for log anomaly detection in modern software systems.

These challenges include:

  1. Insufficient interpretability of results, making it difficult for administrators and analysts to trust and act on the automated analysis.

  2. Weak adaptability to unseen log events that emerge due to feature additions and system upgrades.

  3. The need for handcrafted features, which can be time-consuming and demand human domain knowledge.

To address these limitations, the authors explore the application of deep learning techniques, specifically neural networks, for log-based anomaly detection.

Deep learning has shown exceptional ability in modeling complex relationships and can automatically extract features from input data.

The paper provides a comprehensive review and evaluation of five popular neural network architectures used by six state-of-the-art log anomaly detection methods:

  1. Four unsupervised methods:

    • Two methods using Long Short-Term Memory (LSTM) networks

    • One method using Transformer architecture

    • One method using Autoencoder

  2. Two supervised methods:

    • One method using Convolutional Neural Networks (CNN)

    • One method using Attentional Bidirectional LSTM (BiLSTM)

The authors note that unsupervised methods are more favored in the literature, as labels are often unavailable in real-world scenarios.

Gap between academic research and application

The authors also highlight the gap between academic research and industrial practices in adopting deep learning techniques for log-based anomaly detection.

They attribute this gap to the lack of awareness among site reliability engineers about state-of-the-art methods and the absence of open-source toolkits that apply deep learning techniques for this purpose.

To facilitate the adoption of deep learning-based log anomaly detection, the authors release an open-source toolkit containing the studied models.

This toolkit aims to help researchers and practitioners quickly understand the characteristics of popular deep learning-based anomaly detectors, save efforts on re-implementations, and focus on further customization or improvement.

What is the log anomaly detection process?

Log Collection

  • Software systems generate logs that record runtime status, including timestamps and detailed messages (e.g., error symptoms, target components, IP addresses).

  • In large-scale systems, such as distributed systems, logs are often collected centrally.

  • The large volume of collected logs can be overwhelming for existing troubleshooting systems, and the lack of labeled data poses challenges for log analysis.

Log Parsing

  • Raw logs are semi-structured and need to be parsed into a structured format for analysis, a process called log parsing.

  • Log parsing identifies the constant/static part (log event, log template, or log key) and the variable/dynamic part (parameter values) of a raw log line.

  • Example: "Received block blk_789 of size 67108864 from /10.251.42.84" is parsed into the log event "Received block <> of size <> from <>", where parameters are replaced with "<>".
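
To make the parsing step concrete, here is a minimal sketch that masks variable parts with regular expressions. The patterns below are illustrative assumptions, not the paper's parser; production parsers such as Drain rely on more robust clustering heuristics.

```python
import re

def parse_log_message(message: str) -> str:
    """Reduce a raw log message to its template by masking variable parts.

    Simplified sketch: real parsers (e.g. Drain) cluster messages rather than
    relying purely on regular expressions.
    """
    template = message
    # Mask block identifiers such as blk_789
    template = re.sub(r"blk_-?\d+", "<>", template)
    # Mask IP addresses, optionally prefixed with '/' and followed by a port
    template = re.sub(r"/?\d{1,3}(?:\.\d{1,3}){3}(?::\d+)?", "<>", template)
    # Mask remaining standalone numbers (sizes, counts, durations)
    template = re.sub(r"\b\d+\b", "<>", template)
    return template

print(parse_log_message("Received block blk_789 of size 67108864 from /10.251.42.84"))
# -> "Received block <> of size <> from <>"
```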

Log Partition and Feature Extraction

  • Logs are textual messages and need to be converted into numerical features for machine learning algorithms.

  • Each log message is represented by its log template identified by the log parser.

  • Log timestamps and identifiers (e.g., task/job/session ID) are used to partition logs into different groups, each representing a log sequence.

  • Timestamp-based log partition strategies:

    • Fixed partitioning: Uses a pre-defined time interval (partition size) to split chronologically sorted logs without overlap between consecutive partitions.

    • Sliding partitioning: Uses partition size and stride (forwarding distance) to generate overlapping log partitions, producing more log sequences than fixed partitioning.

  • Identifier-based partitioning: Sorts logs chronologically and divides them into sequences based on a unique and common identifier, indicating they originate from the same task execution.

  • Traditional ML-based methods often generate a vector of log event counts as input features, where each dimension represents a log event, and the value counts its occurrence in a log sequence.

  • DL-based methods directly consume the log event sequence, representing each element as an index or a more sophisticated feature like a log embedding vector to learn the semantics of logs.
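
A small sketch of these partitioning and feature-extraction choices, using illustrative event IDs and window sizes rather than the paper's settings:

```python
from collections import Counter
from typing import List

def sliding_partition(events: List[str], size: int, stride: int) -> List[List[str]]:
    """Split a chronologically sorted event sequence into (possibly overlapping)
    windows. Setting stride == size reproduces fixed partitioning."""
    return [events[i:i + size] for i in range(0, max(len(events) - size + 1, 1), stride)]

def count_vector(window: List[str], vocabulary: List[str]) -> List[int]:
    """Traditional ML feature: occurrence count of each known log event."""
    counts = Counter(window)
    return [counts.get(event, 0) for event in vocabulary]

events = ["E1", "E2", "E1", "E3", "E2", "E1"]              # parsed log event IDs
vocab = ["E1", "E2", "E3"]

fixed = sliding_partition(events, size=3, stride=3)        # fixed partitioning
overlapping = sliding_partition(events, size=3, stride=1)  # sliding partitioning

print([count_vector(w, vocab) for w in fixed])             # [[2, 1, 0], [1, 1, 1]]

# DL-based methods instead keep the ordered sequence, mapping events to indices:
index_sequence = [vocab.index(e) for e in events]          # [0, 1, 0, 2, 1, 0]
```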

Anomaly Detection

  • Based on the log features constructed in the previous phase, anomaly detection identifies anomalous log instances (e.g., logs printed by interruption exceptions).

  • Traditional ML-based anomaly detectors often produce a prediction (anomaly or not) for the entire log sequence based on its log event count vector.

  • DL-based methods first learn normal log patterns and then determine the normality for each log event, enabling them to locate the exact log event(s) that contaminate the log event sequence, improving interpretability.

Overall, log anomaly detection is a multi-step process that involves collecting logs, parsing them into a structured format, extracting features through log partitioning and representation, and finally applying anomaly detection algorithms to identify anomalous log instances.

The choice of methods, such as traditional ML-based or DL-based approaches, depends on the specific requirements and characteristics of the system being analyzed. DL-based methods have the advantage of learning log semantics and providing more interpretable results by locating specific anomalous log events within a sequence.

Machine Learning Methods

Based on the log anomaly detection process described, the current machine learning techniques used for log analysis can be categorised into two main groups:

Traditional Machine Learning (ML) methods

  • Examples: Log Clustering, Principal Component Analysis (PCA), Invariant Mining, Logistic Regression, Decision Trees, Support Vector Machines (SVM)

  • These methods often rely on handcrafted features, such as log event count vectors, where each dimension represents a log event and the value counts its occurrence in a log sequence.

  • They typically produce a prediction (anomaly or not) for the entire log sequence based on the extracted features.
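
For concreteness, a hedged sketch of such a traditional baseline using scikit-learn: logistic regression over count vectors for the supervised case, and a PCA residual for the unsupervised case. The data is synthetic and purely illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.decomposition import PCA

# Each row is a log sequence represented as a log event count vector;
# each column counts one log event's occurrences in that sequence.
X = np.array([[5, 2, 0, 0],
              [4, 3, 0, 0],
              [5, 2, 1, 0],
              [0, 1, 7, 3],    # unusual event distribution -> anomalous
              [4, 2, 0, 0]])
y = np.array([0, 0, 0, 1, 0])  # supervised setting: 1 = anomaly

clf = LogisticRegression().fit(X, y)
print(clf.predict([[0, 1, 6, 4]]))   # one prediction for the whole sequence

# Unsupervised alternative: PCA models the normal subspace; a large residual
# (distance to that subspace) flags a sequence as anomalous.
pca = PCA(n_components=2).fit(X[y == 0])
residual = np.linalg.norm(X - pca.inverse_transform(pca.transform(X)), axis=1)
print(residual)
```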

Shortcomings of traditional ML methods

Handcrafted features: Traditional ML methods often require domain knowledge to manually design and extract relevant features from log data, which can be time-consuming and may not capture all the important information.

Limited ability to handle complex patterns: These methods may struggle to capture complex, non-linear relationships and long-term dependencies in log sequences, which can be crucial for accurate anomaly detection.

Lack of interpretability: Traditional ML methods typically produce predictions for entire log sequences, making it difficult to pinpoint the specific log events responsible for the anomalies.

Sensitivity to unseen log events: These methods often rely on a fixed set of log events and may not generalize well to unseen or evolving log patterns, requiring retraining when new log events emerge.

Existing log anomaly detection methods

Existing log anomaly detection methods can be categorised into unsupervised and supervised approaches.

The main idea behind unsupervised methods is that logs produced by a system's normal executions often exhibit stable patterns, and anomalies occur when these patterns are violated.

Supervised methods, on the other hand, require anomaly labels and learn features that distinguish abnormal samples from normal ones.

The paper introduces six state-of-the-art methods, four unsupervised and two supervised, which leverage neural networks for log anomaly detection. The choice of network architecture and loss function is crucial, as the loss guides how the model learns log patterns.

Unsupervised Methods

DeepLog

  • First work to employ LSTM for log anomaly detection

  • Learns log patterns from sequential relations of log events

  • Uses forecasting-based anomaly detection, predicting the next log event based on previous observations
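
A minimal PyTorch sketch of this forecasting-based idea (window size, hyperparameters, and the top-k decision rule are illustrative assumptions, not DeepLog's exact configuration): the model is trained on normal sequences to predict the next log event, and an observed event outside the top-k predictions is flagged as anomalous.

```python
import torch
import torch.nn as nn

class NextEventLSTM(nn.Module):
    """Predict the next log event ID from a window of previous event IDs."""
    def __init__(self, num_events: int, embed_dim: int = 32, hidden: int = 64):
        super().__init__()
        self.embed = nn.Embedding(num_events, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_events)

    def forward(self, windows):                  # windows: (batch, window_len)
        out, _ = self.lstm(self.embed(windows))
        return self.head(out[:, -1, :])          # logits over the next event

def is_anomalous(model, window, actual_next, k=3):
    """Forecasting-based detection: anomaly if the observed next event is not
    among the model's top-k predictions."""
    with torch.no_grad():
        logits = model(window.unsqueeze(0))
        topk = torch.topk(logits, k, dim=-1).indices.squeeze(0)
    return actual_next not in topk

# Illustrative usage on toy data (an untrained model, for shape checking only)
model = NextEventLSTM(num_events=10)
window = torch.tensor([1, 2, 3, 4])              # last 4 parsed event IDs
print(is_anomalous(model, window, actual_next=7))
```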

LogAnomaly

  • Considers semantic information of logs using template2Vec

  • Generates distributed representations of words in log templates by considering synonyms and antonyms

  • Adopts forecasting-based anomaly detection with an LSTM model

Logsy (Transformer-based method)

  • First work to use the Transformer for log anomaly detection

  • Learns log representations to distinguish between normal and abnormal samples

  • Employs multi-head self-attention mechanism

  • Follows forecasting-based anomaly detection

Autoencoder

  • Uses autoencoder combined with isolation forest

  • Autoencoder learns representations for normal log event sequences

  • Anomalies detected based on reconstruction loss
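
A compact sketch of the reconstruction-based idea, assuming count-vector inputs: the autoencoder is trained on normal sequences only, and a high reconstruction error at test time flags an anomaly (the paper's method additionally combines this with an isolation forest).

```python
import torch
import torch.nn as nn

class LogAutoencoder(nn.Module):
    """Compress and reconstruct log event count vectors."""
    def __init__(self, num_events: int, latent: int = 8):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(num_events, latent), nn.ReLU())
        self.decoder = nn.Linear(latent, num_events)

    def forward(self, x):
        return self.decoder(self.encoder(x))

def reconstruction_error(model, x):
    with torch.no_grad():
        return ((model(x) - x) ** 2).mean(dim=-1)   # per-sequence error

# Train on normal sequences only; anomalies should show a high error at test time.
model = LogAutoencoder(num_events=20)
normal = torch.rand(64, 20)                          # toy "normal" count vectors
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(100):
    opt.zero_grad()
    loss = ((model(normal) - normal) ** 2).mean()
    loss.backward()
    opt.step()
print(reconstruction_error(model, torch.rand(4, 20)))
```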

Supervised Methods

LogRobust

  • Addresses log instability issue (unseen log events) by extracting semantic information using word vectors

  • Incorporates attention mechanism into a Bi-LSTM model to assign different weights to log events

  • Generates classification results (anomaly or not) using a softmax layer

CNN

  • First work to explore the feasibility of CNN for log-based anomaly detection

  • Constructs log event sequences using identifier-based partitioning

  • Proposes logkey2vec embedding method to create a trainable matrix for convolution calculation

  • Applies different convolutional layers and concatenates their outputs for prediction
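
A TextCNN-style sketch of this approach; the embedding size, kernel widths, and pooling below are assumptions rather than the paper's exact logkey2vec configuration. Log event IDs are embedded via a trainable matrix, passed through convolutional filters of different widths, and the concatenated features feed a binary classifier.

```python
import torch
import torch.nn as nn

class LogCNN(nn.Module):
    def __init__(self, num_events: int, embed_dim: int = 32,
                 kernel_sizes=(3, 4, 5), channels: int = 16):
        super().__init__()
        self.embed = nn.Embedding(num_events, embed_dim)   # trainable embedding matrix
        self.convs = nn.ModuleList(
            nn.Conv1d(embed_dim, channels, k, padding=k // 2) for k in kernel_sizes
        )
        self.classifier = nn.Linear(channels * len(kernel_sizes), 2)  # normal vs anomaly

    def forward(self, sequences):                 # sequences: (batch, seq_len) of event IDs
        x = self.embed(sequences).transpose(1, 2)            # (batch, embed_dim, seq_len)
        pooled = [conv(x).amax(dim=-1) for conv in self.convs]  # max-pool each conv output
        return self.classifier(torch.cat(pooled, dim=-1))    # classification logits

model = LogCNN(num_events=50)
logits = model(torch.randint(0, 50, (8, 30)))     # 8 sequences of 30 events
print(logits.shape)                                # torch.Size([8, 2])
```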

The unsupervised methods, particularly those incorporating semantic information (e.g., LogAnomaly and Logsy), demonstrate the importance of understanding the meaning behind log events for accurate anomaly detection.

The supervised methods, LogRobust and CNN, introduce techniques such as attention mechanisms and convolutional layers to improve the model's ability to distinguish anomalies from normal events.

Experiment

The authors designed their experiment to evaluate the accuracy, robustness, and efficiency of six state-of-the-art deep learning-based log anomaly detection methods on two widely-used datasets, HDFS and BGL.

They aimed to provide a comprehensive comparison of these methods and address the lack of publicly available tools for industrial usage.

Experiment Design

  1. Dataset selection: The authors chose HDFS and BGL datasets from Loghub, a large collection of system log datasets. These datasets were selected due to their popularity and different characteristics (e.g., HDFS logs contain identifiers, while BGL logs do not).

  2. Evaluation metrics: The authors employed precision, recall, and F1 score to measure the accuracy of the anomaly detection methods, as log anomaly detection is a binary classification problem (a small computation sketch follows this list).

  3. Experiment setup: The experiments were conducted on a machine with specific hardware configurations. The authors sorted logs chronologically, applied log partitioning to generate log sequences, and shuffled them. They used 80% of the data for training and 20% for testing. For unsupervised methods, anomalies were removed from the training data to learn normal log patterns.
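
A generic sketch of how the precision, recall, and F1 metrics are computed from predictions on the held-out 20% (not the authors' evaluation code; labels below are synthetic):

```python
from sklearn.metrics import precision_score, recall_score, f1_score

# 1 = anomalous log sequence, 0 = normal (toy test-split labels and predictions)
y_true = [0, 0, 1, 1, 0, 1, 0, 0]
y_pred = [0, 1, 1, 0, 0, 1, 0, 0]

print("precision:", precision_score(y_true, y_pred))  # TP / (TP + FP)
print("recall:   ", recall_score(y_true, y_pred))     # TP / (TP + FN)
print("f1:       ", f1_score(y_true, y_pred))         # harmonic mean of the two
```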

Challenges

Unseen logs: The presence of unprecedented logs in the testing data posed a significant challenge to the anomaly detection methods, especially unsupervised ones.

Anomalies in training data: Even a small portion of anomalies in the training data could quickly deteriorate the performance of forecasting-based methods.

Efficiency: Deep learning-based methods generally required more time for training and testing compared to traditional machine learning-based methods.

Main Interests

The authors were most interested in:

  1. Evaluating the accuracy of deep learning-based log anomaly detection methods and comparing them with traditional machine learning-based methods.

  2. Investigating the impact of log semantics on the accuracy and robustness of the anomaly detection methods.

  3. Assessing the robustness of the methods against unseen logs and anomalies in the training data.

  4. Comparing the efficiency of deep learning-based methods with traditional machine learning-based methods in terms of training and testing time.

By designing the experiment in this manner, the authors aimed to provide valuable insights into the strengths and weaknesses of different deep learning-based log anomaly detection methods, as well as their performance in comparison to traditional machine learning-based methods.

The findings can guide researchers and practitioners in selecting appropriate methods for their specific use cases and help bridge the gap between academic research and industrial application.

Industry Application

The authors present a case study of deploying an automated log-based anomaly detection system in production at Huawei Cloud.

They selected an optimised version of DeepLog, a highly-cited deep learning-based method, for its simplicity and superior performance.

The deployment was motivated by the impracticality of manual anomaly detection in the face of terabytes of daily log data generated by services serving hundreds of millions of users.

Deployment Architecture

The log anomaly detection pipeline consists of two stages: offline training and online serving.

Online stage

  • Kafka is used as a streaming channel for online log analytics.

  • Data producers are different services that generate raw log data at runtime, with each service corresponding to one Kafka topic for data streaming.

  • The anomaly detection model acts as the data consumer and performs anomaly detection for each service.

  • Apache Flink is used for distributed log preprocessing and anomaly detection, processing streaming data with high performance and low latency.

  • Detection results are visualized on a monitoring panel through Prometheus.

  • Engineers confirm true anomalies or flag false positives with simple clicks.
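
A hedged sketch of the online consumption loop described above, using the kafka-python and prometheus_client libraries; the topic name, port, and placeholder detector are illustrative assumptions, and the production system uses Apache Flink for distributed processing rather than a single consumer.

```python
from kafka import KafkaConsumer                    # pip install kafka-python prometheus-client
from prometheus_client import Counter, start_http_server

anomalies = Counter("log_anomalies_detected", "Detected log anomalies", ["service"])
start_http_server(8000)                            # metrics endpoint scraped by Prometheus

def looks_anomalous(message: str) -> bool:
    """Placeholder for the trained detector (e.g. the optimised DeepLog model)."""
    return "error" in message.lower() or "exception" in message.lower()

consumer = KafkaConsumer(
    "service-a-logs",                              # one Kafka topic per service (illustrative)
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: v.decode("utf-8"),
)

for record in consumer:                            # streaming consumption loop
    if looks_anomalous(record.value):
        anomalies.labels(service="service-a").inc()
```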

Offline stage

  • Raw logs are archived and maintained in Apache HDFS.

  • Logs are retrieved from HDFS for model (re)training and evaluation.

  • A threshold is set manually for alerting anomalies.

  • Engineers can trigger model retraining if they observe performance degradation on the monitoring panel.

Real-World Challenges

Despite the model's success in shedding light on automated log-based anomaly detection and reporting high-risk anomalies, the authors identified several challenges:

  1. High complexity of production logs compared to benchmark datasets.

  2. The need for periodic threshold re-determination.

  3. Concept drift due to feature upgrades and evolving log patterns.

  4. Large-volume and low-quality log data due to lack of rigorous logging guidelines.

  5. Unsatisfactory interpretability, requiring further improvement in learning logs' semantics.

  6. Incorrect model strategy, as most anomalies stem from specific error logs rather than incorrect log event orders.

  7. Labelling issues due to ambiguous cases and privacy concerns.

Future Improvements

Closer engineering collaboration

  • Establish a clear objective at the executive level and align infrastructure development, service architecture design, and engineers' mindsets.

  • Build a pipeline for log data generation, collection, labelling, and usage, with data/label sanity checks and continuous model-quality validation.

Better logging practices

  • Establish guidelines for writing logging statements, including timestamps, verbosity levels, context information, meaningful messages, template-based logging, and proper logging statement count.

Model improvement

  • Explore online learning, human-in-the-loop design, and multi-source learning (combining logs with metrics and incident tickets).

  • Address multiple aspects of logs (keywords, log event count/sequence) using ensemble learning.

  • Explore semantic relations between log events for accurate anomaly detection and automated fault localisation.

In summary, the authors' industrial case study highlights the potential and challenges of deploying deep learning-based log anomaly detection in production environments.

References

  1. "DeepLog: Anomaly detection and diagnosis from system logs through deep learning", Anomaly Detection in System Logs, Min Du, Feifei Li, Guineng Zheng, Vivek Srikumar, 2017.

  2. "LogBERT: Log Anomaly Detection via BERT", Application of BERT for Log Anomaly Detection, Haixuan Guo, Shuhan Yuan, Xintao Wu, 2021.

  3. "Self-attentive classification-based anomaly detection in unstructured logs", Enhancing Anomaly Detection in Logs using Self-attention, Sasho Nedelkoski, Jasmin Bogatinovski, Alexander Acker, Jorge Cardoso, Odej Kao, 2020.

  4. "Log-based Anomaly Detection with Deep Learning: How Far Are We?", Evaluation of Deep Learning Techniques for Log-based Anomaly Detection, Van Hoang Le, Hongyu Zhang, 2022.

  5. "Robust and transferable anomaly detection in log data using pre-trained language models", Using Pre-trained Language Models for Anomaly Detection in Log Data, Harold Ott, Jasmin Bogatinovski, Alexander Acker, Sasho Nedelkoski, Odej Kao, 2021.

  6. "Log-based anomaly detection without log parsing", Innovating Anomaly Detection without Traditional Log Parsing, Van-Hoang Le, Hongyu Zhang, 2021.

  7. "A2Log: Attentive Augmented Log Anomaly Detection", Applying Attention Mechanisms for Log Anomaly Detection, Thorsten Wittkopp, Alexander Acker, Sasho Nedelkoski, et al., 2021.
