Demystifying Embedding Spaces using Large Language Models

Guy Tennenholtz et al. from Google Research

In this October 2023 paper, researchers from Google Research introduce the Embedding Language Model (ELM), an approach that leverages Large Language Models (LLMs) to transform embedding representations into comprehensible narratives.

This innovation addresses a critical gap in machine learning: the interpretability of dense vector embeddings.

Understanding Embedding Spaces

Embedding spaces are at the heart of various applications like natural language processing, recommender systems, and protein sequence modeling. They condense multifaceted information into dense vectors, capturing nuanced relationships and semantic structures. However, their complexity often leads to a lack of direct interpretability.

The Challenge of Interpretation

Traditional interpretability methods like t-SNE, UMAP, or Concept Activation Vectors (CAVs) offer only a limited understanding of these abstract representations. The challenge lies in decoding the intricate information embedded within these vectors into something more tangible and understandable.

Some definitions

t-SNE (t-Distributed Stochastic Neighbor Embedding)

t-SNE is a technique for dimensionality reduction, primarily used for visualising high-dimensional data in a low-dimensional space (usually two or three dimensions).

It works by converting similarities between data points to joint probabilities and then minimising the Kullback–Leibler divergence between these joint probabilities in the original high-dimensional space and the low-dimensional embedding.

While excellent for visualisation, t-SNE may not always preserve the global structure of the data, focusing more on local relationships. Interpretations based on the clustering or distances in the t-SNE plot can sometimes be misleading.
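
As a quick illustration, here is a minimal sketch using scikit-learn's t-SNE implementation to project a set of hypothetical embedding vectors down to two dimensions for plotting (the `embeddings` array is a stand-in, not real data):

import numpy as np
from sklearn.manifold import TSNE

embeddings = np.random.randn(1000, 256)  # hypothetical 256-d domain embeddings

# perplexity controls the effective neighbourhood size considered per point
tsne = TSNE(n_components=2, perplexity=30, random_state=42)
coords_2d = tsne.fit_transform(embeddings)
print(coords_2d.shape)  # (1000, 2)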

UMAP (Uniform Manifold Approximation and Projection)

UMAP is another dimensionality reduction technique, similar in purpose to t-SNE but often faster and more scalable.

UMAP constructs a high-dimensional graph representing the data and then optimises a low-dimensional graph to be as structurally similar as possible. It relies on concepts from Riemannian geometry and algebraic topology.

Similar to t-SNE, UMAP is great for visualisation but can sometimes obscure the true nature of the data’s structure due to its focus on local rather than global features.
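
A comparable sketch with the umap-learn package (assuming it is installed), using the same hypothetical embedding matrix:

import numpy as np
import umap  # pip install umap-learn

embeddings = np.random.randn(1000, 256)  # hypothetical 256-d domain embeddings

# n_neighbors trades local against global structure; min_dist controls how
# tightly points are packed in the low-dimensional layout
reducer = umap.UMAP(n_components=2, n_neighbors=15, min_dist=0.1)
coords_2d = reducer.fit_transform(embeddings)
print(coords_2d.shape)  # (1000, 2)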

Concept Activation Vectors (CAVs)

CAVs are a method used to interpret what machine learning models (especially neural networks) have learned.

CAVs are vectors in the space of a neural network's internal activations. By training linear classifiers to distinguish between activations caused by different types of inputs, CAVs can be used to understand what concepts a layer of a neural network is detecting.

The interpretation provided by CAVs is limited to the concepts they are trained to detect and may not capture the full complexity of the model's internal representations. They also require a level of expertise to define and understand the relevant concepts and classifiers.
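
The core recipe can be sketched in a few lines: collect a layer's activations for examples that do and do not contain a concept, fit a linear classifier, and take its weight vector as the CAV. The activation arrays below are hypothetical placeholders, not outputs of a real network:

import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical layer activations for inputs that do / do not show a concept
acts_concept = np.random.randn(200, 512) + 0.5   # e.g. "sci-fi" examples
acts_random = np.random.randn(200, 512)          # random counterexamples

X = np.vstack([acts_concept, acts_random])
y = np.array([1] * 200 + [0] * 200)

clf = LogisticRegression(max_iter=1000).fit(X, y)
cav = clf.coef_[0]                    # the concept activation vector
cav /= np.linalg.norm(cav)            # unit-normalise for comparisons

# Sensitivity of a new activation to the concept = projection onto the CAV
new_activation = np.random.randn(512)
print(float(new_activation @ cav))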

Large Language Models as a Solution

The paper presents an innovative solution: using LLMs to interact directly with embedding spaces.

By integrating embeddings into LLMs, the abstract vectors are converted into understandable narratives, thus enhancing their interpretability.

The Advent of the Embedding Language Model (ELM)

ELM represents a paradigm shift: an LLM is trained with adapter layers that map domain embedding vectors into its token-level embedding space. This training enables the model to interpret continuous domain embeddings using natural language.
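
As a rough, hypothetical PyTorch sketch of what such an adapter could look like: a small MLP maps an n-dimensional domain embedding into the LLM's token-embedding dimension so it can be prepended to the prompt's token embeddings. All dimensions and layer sizes here are illustrative assumptions, not the paper's:

import torch
import torch.nn as nn

class DomainEmbeddingAdapter(nn.Module):
    """Maps domain embeddings into the LLM token-embedding space."""
    def __init__(self, domain_dim=256, token_dim=4096, hidden_dim=1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(domain_dim, hidden_dim),
            nn.GELU(),
            nn.Linear(hidden_dim, token_dim),
        )

    def forward(self, domain_vec):
        # (batch, domain_dim) -> (batch, 1, token_dim): one "soft token"
        return self.net(domain_vec).unsqueeze(1)

adapter = DomainEmbeddingAdapter()
movie_vec = torch.randn(2, 256)              # hypothetical domain embeddings
soft_token = adapter(movie_vec)              # (2, 1, 4096)
text_tokens = torch.randn(2, 10, 4096)       # token embeddings of the prompt
llm_input = torch.cat([soft_token, text_tokens], dim=1)  # fed to the LLM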

Training and Applications of ELM

The authors developed a methodology to fine-tune pre-trained LLMs for domain-embedding interpretation, testing ELM on various tasks such as interpreting movie and user embeddings from the MovieLens 25M dataset.

The applications demonstrated range from generalising CAVs as an interpretability method to describing hypothetical embedded entities and interpreting user embeddings in recommender systems.

Bridging Data Representations and Expressive Capabilities

This research remarkably bridges the gap between the rich data representations of domain embeddings and the expressive capabilities of LLMs. It facilitates a direct “dialogue” about vectors, allowing complex embedding data to be queried and narratives extracted.

Implications and Impact

This significant breakthrough addresses the long-standing challenge of making complex embedding spaces interpretable and broadly useful. Leveraging the power of LLMs, it opens new possibilities for understanding and interacting with data represented as embeddings, with vast implications in areas where embeddings are used.

Background: Domain Embeddings and LLMs

The paper explores domain embeddings, mapping entities into a latent space, and the structure of LLMs, which map sequences of language tokens into another set of tokens.

ELM's architecture incorporates both textual inputs and domain embeddings, demonstrating a unique integration of language and domain-specific data.

Training Procedure and Challenges

The training involves a two-stage process, focusing first on the adapter and then on fine-tuning the entire model. This approach addresses the challenges posed by training continuous prompts and ensures effective convergence of M_ELM.
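
A hedged sketch of how such a two-stage schedule might be expressed in PyTorch, with simple stand-ins for the adapter and the pre-trained LLM (this is illustrative, not the paper's training code):

import torch
import torch.nn as nn

llm = nn.TransformerEncoderLayer(d_model=512, nhead=8)   # stand-in for the LLM
adapter = nn.Linear(256, 512)                            # stand-in adapter E_A

# Stage 1: train only the adapter, keeping the LLM's weights frozen
for p in llm.parameters():
    p.requires_grad = False
stage1_optimiser = torch.optim.AdamW(adapter.parameters(), lr=1e-4)
# ... run adapter-only training steps here ...

# Stage 2: unfreeze everything and fine-tune the full model,
# typically at a lower learning rate
for p in llm.parameters():
    p.requires_grad = True
stage2_optimiser = torch.optim.AdamW(
    list(adapter.parameters()) + list(llm.parameters()), lr=1e-5
)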

Empirical Analysis and Experiment Results

The authors conducted extensive tests on both real and hypothetical entities to assess ELM's capability in interpreting and extrapolating embedding vectors.

The results demonstrated ELM's effectiveness in various scenarios, such as summarising movies, writing reviews, and comparing movies, with a robust assessment of ELM's capabilities using both human evaluations and consistency metrics.

Conclusion and Future Directions

ELM represents a significant advancement in the field, enabling dynamic, language-based exploration of embedding spaces.

It opens exciting avenues for future exploration in understanding and navigating complex embedding representations.

The potential applications of ELM in areas like personalised content recommendation, interactive educational platforms, advanced market analysis tools, creative entertainment, and healthcare are vast and promising.

To explore the technical details of domain embeddings, Large Language Models (LLMs), and the Embedding Language Model (ELM), let's break down each component:

Domain Embeddings (ED)

Concept:

  • Definition: Domain embeddings ED: V → W map entities (like users or items) in a vocabulary V to a latent space W within R^n.

  • Mathematical Representation: If v ∈ V is an entity (e.g., a movie title), the domain embedding ED(v) is a vector in R^n representing this entity in the latent space.

  • Usage: They capture latent features for applications like recommender systems, image classification, and information retrieval.

Example (Python Pseudocode)

import numpy as np

def domain_embedding(entity, dim=64):
    # Convert an entity into a high-dimensional vector; in practice this is
    # a lookup into a table learned by, e.g., a neural network or matrix
    # factorisation. Here a hashed placeholder stands in for that table.
    rng = np.random.default_rng(abs(hash(entity)) % 2**32)
    return rng.standard_normal(dim)

Large Language Models (LLMs)

Overview:

  • LLMs: Models like GPT, BERT, and PaLM that map sequences of language tokens into another set of tokens.

  • Components:

    • Embedding Layer: Maps tokens to token embedding representations.

    • Dense Model: Maps these embeddings to a sequence of tokens.

Functioning (Python Pseudocode)

import numpy as np

def llm_embedding_layer(token):
    # Maps a token to its token-level embedding; a real LLM looks this up
    # in its learned embedding table, here a hashed placeholder
    rng = np.random.default_rng(abs(hash(token)) % 2**32)
    return rng.standard_normal(64)

def llm_dense_model(token_embeddings):
    # Processes the sequence of embeddings to generate output tokens
    # (a real LLM applies its transformer stack and decoding here)
    return ["<generated-token>"]  # placeholder output

Embedding Language Model (ELM)

Framework:

  • Integration: ELM integrates domain embeddings with LLMs using adapter layers.

Problem Formulation:

  • Adapter Layer EA: W → Z: Transforms domain embeddings into a format compatible with the LLM's token embedding space.

  • Assumption: Access to a domain embedding space (W, d) and pairs (v, ED(v)) for training.

  • Tasks (T): Designed to capture semantic information about entities in W.

  • ELM Model (M_ELM): Combines a pre-trained LLM with an adapter model EA.

Architecture:

  • Model Structure:

    • E0: Usual embedding layer for textual inputs.

    • EA: Adapter model mapping domain embeddings to the LLM's space.

Training Procedure:

  1. Two-Stage Training:

    • Stage 1: Train the adapter EA while keeping all other parameters frozen.

    • Stage 2: Fine-tune the entire model (E0, M0, EA).

  2. Challenges and Solutions:

    • Continuous Prompts: The integration of continuous domain embeddings with pre-trained token embeddings can be challenging.

    • Solutions: The two-stage approach mitigates issues like convergence to local minima.

Example (Python Pseudocode)

def train_elm(domain_embeddings, text_embeddings, tasks):
    # Stage 1: train only the adapter E_A (LLM weights frozen)
    adapter_model = train_adapter(domain_embeddings, tasks)

    # Stage 2: fine-tune the entire model (E_0, M_0, E_A)
    elm_model = fine_tune_model(adapter_model, text_embeddings, tasks)
    return elm_model

# Example use-case (train_adapter and fine_tune_model are assumed helpers
# standing in for the two training stages described above)
tasks = ["summarise the movie", "write a review"]  # illustrative task list
movie_embedding = domain_embedding("movie_title")
text_embedding = llm_embedding_layer("related text")
elm_output = train_elm(movie_embedding, text_embedding, tasks)

Summary

The ELM framework is a sophisticated blend of domain-specific embeddings and LLM architectures.

It involves a nuanced training process to ensure the effective convergence of diverse data representations into a coherent model capable of interpreting and generating language-based outputs.

This approach overcomes traditional challenges in embedding interpretability, facilitating a deeper understanding of complex, abstract data representations.

The Experiment

The authors of the paper conducted an experiment using the MovieLens 25M dataset.

This substantial dataset features a staggering 25 million ratings of over 62,000 movies by more than 162,000 users, and was used to scrutinise and validate the efficacy of ELM.

The Data: A Rich Tapestry of Ratings and Descriptions

The MovieLens 25M dataset is more than a collection of numbers; its ratings paint a vivid picture of viewer interactions and preferences. From this data, the authors forged two distinct types of embeddings – one reflecting the movies themselves and the other encapsulating user behaviour and preferences.
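
For orientation, loading the ratings with pandas might look like this, assuming the standard MovieLens 25M layout with a ratings.csv file containing userId, movieId, rating, and timestamp columns:

import pandas as pd

# Standard MovieLens 25M ratings file: userId, movieId, rating, timestamp
ratings = pd.read_csv("ml-25m/ratings.csv")

print(len(ratings))                  # ~25 million ratings
print(ratings["movieId"].nunique())  # ~62,000 movies
print(ratings["userId"].nunique())   # ~162,000 users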

Diving into Types of Embeddings

Behavioural Embeddings

These are derived from the gold mine of user ratings.

By employing sophisticated matrix factorisation techniques like Weighted Alternating Least Squares (WALS), the model deciphers patterns in how users interact with movies. Interestingly, this type doesn't explore the content of the movies but rather focuses on the user-movie interaction dynamics.
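
As a simplified illustration of the idea, the sketch below runs plain (unweighted) alternating least squares on a small dense toy ratings matrix; the paper's WALS pipeline is more sophisticated, so treat this only as a conceptual stand-in:

import numpy as np

def als_factorise(R, k=8, n_iters=20, reg=0.1):
    # Factorise a (users x movies) ratings matrix R into U @ V.T by
    # alternating closed-form least-squares solves for U and V
    n_users, n_movies = R.shape
    U = np.random.randn(n_users, k) * 0.1
    V = np.random.randn(n_movies, k) * 0.1
    eye = reg * np.eye(k)
    for _ in range(n_iters):
        U = R @ V @ np.linalg.inv(V.T @ V + eye)
        V = R.T @ U @ np.linalg.inv(U.T @ U + eye)
    return U, V  # rows of V are behavioural movie embeddings

R = np.random.randint(0, 6, size=(50, 30)).astype(float)  # toy ratings matrix
user_emb, movie_emb = als_factorise(R)
print(movie_emb.shape)  # (30, 8)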

Semantic Embeddings

Here's where the content takes centre stage. Based on rich textual descriptions like plot summaries and reviews, these embeddings are crafted using a pre-trained dual-encoder language model, akin to Sentence-T5. They aim to encapsulate the essence, themes, and narratives of the movies.
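
A sketch of producing such semantic embeddings with the sentence-transformers library; the checkpoint name here is an assumption, and the paper's actual encoder may differ:

from sentence_transformers import SentenceTransformer

# A Sentence-T5 style dual encoder; the exact checkpoint is an assumption
encoder = SentenceTransformer("sentence-transformers/sentence-t5-base")

plots = [
    "A computer hacker learns the world he lives in is a simulation.",
    "A young wizard attends a school of magic and faces a dark lord.",
]
semantic_embeddings = encoder.encode(plots, normalize_embeddings=True)
print(semantic_embeddings.shape)  # (2, 768)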

Training Data and Tasks: A Dual Approach

The experiment didn't just stick to one approach. Instead, it bifurcated the ELM into two versions – one for interpreting the semantic nuances of movies and the other for decoding the behavioural patterns of users.

The tasks were diverse, including 24 different movie-focused tasks using a pre-trained model like PaLM2-L, and a unique user profile generation task that created textual summaries encapsulating a user’s cinematic tastes.

Evaluation Metrics: A Rigorous Scrutiny

Here's where things get really interesting. The evaluation wasn't left to cold, impersonal algorithms alone. Instead, human evaluators – 100 of them – were roped in to rate the outputs on relevance, linguistic quality, and overall suitability.

This human touch was supplemented by consistency metrics like Semantic Consistency (SC) and Behavioural Consistency (BC), ensuring a well-rounded assessment.
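
The article does not spell out how these consistency scores are computed. One plausible way to operationalise a semantic-consistency check is to re-embed the generated text with the same encoder and compare it to the source embedding by cosine similarity, as in this purely hypothetical sketch:

import numpy as np

def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical: the movie's semantic embedding and the embedding of the
# text ELM generated about it (both produced by the same text encoder)
movie_embedding = np.random.randn(768)
generated_text_embedding = np.random.randn(768)

semantic_consistency = cosine_similarity(movie_embedding,
                                          generated_text_embedding)
print(semantic_consistency)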

Experiment Results: ELM's Moment of Truth

The moment of truth for ELM came as it was put through its paces – summarising movies, writing reviews, and even comparing different movies. The results were telling. ELM demonstrated its prowess, not just in technical metrics like Semantic Consistency (SC) and Behavioural Consistency (BC) but also in winning the approval of human evaluators.

Conclusion: A Leap Forward in AI Applications

This experiment wasn't just another run-of-the-mill AI test. It was a comprehensive, thoughtfully designed exploration that underscored ELM's ability to not just crunch numbers but to weave them into coherent, relatable narratives.

It's a testament to how models are evolving, becoming more interpretable, and, importantly, more aligned with real-world applications and human understanding.

The Embedding Language Model, through this experiment, has shown that it's not just about understanding data – it's about narrating the stories hidden within it.

Other Applications?

Personalised Content Recommendation Systems:

  • ELM could be used in content recommendation engines, such as those used by streaming services or e-commerce platforms. By interpreting user profile embeddings, ELM can generate detailed preference profiles that capture nuanced tastes and interests. This could lead to highly personalised recommendations, enhancing user engagement and satisfaction.

Interactive Educational Platforms

  • ELM can be used to create dynamic, adaptive learning modules. By interpreting student performance and engagement embeddings, ELM can suggest personalised learning paths, resources, and activities. This can help tailor education to individual learning styles and needs, making education more effective and engaging.

Advanced Market Analysis Tools

  • Companies can use ELM to interpret complex market and consumer data embeddings. By extrapolating from existing consumer behavior patterns, ELM can predict emerging market trends and consumer preferences. This application could be useful for businesses in constructing marketing campaigns, product development, and targeting new market segments.

Creative Entertainment and Storytelling

  • ELM offers possibilities in the field of entertainment, particularly in interactive storytelling and gaming. By interpolating between various narrative embeddings, ELM can generate unique, coherent storylines and character arcs. This can be used in video games, interactive novels, or online platforms to create dynamic, user-driven narratives.

Healthcare and Patient Data Interpretation

  • In healthcare, ELM could interpret patient data embeddings to provide comprehensive health profiles. By analysing complex health data, ELM can assist medical professionals in understanding patient conditions, predicting health risks, and suggesting personalised treatment plans. This application could significantly enhance the precision and effectiveness of healthcare services.

Paper: Demystifying Embedding Spaces using Large Language Models (arXiv.org)