Platypus: Quick, Cheap, and Powerful Refinement of LLMs


This paper demonstrates the performance that base Large Language Models (LLMs) can achieve through parameter-efficient fine-tuning (PEFT) on a curated dataset named Open-Platypus, with a particular focus on STEM and logic questions.

The authors provide a context for their work against the backdrop of significant advancements in LLMs, noting the development of models like PaLM, GPT-3, and LLaMa, which emphasise computational efficiency and the movement towards open-source models like BLOOM and Falcon.

The paper discusses various strategies to improve LLM performance, including knowledge distillation, instruction tuning, and the Mixture of Experts approach.

These methods aim to enhance the models' efficiency and adaptability across various domains. The authors specifically employ the LoRA methodology, noting its effectiveness in their workflow and its potential for future cost and time reductions in training.

Key contributions of the paper include:

Open-Platypus Dataset

Open-Platypus is a curated dataset that the team created by selecting a subset from other open datasets.

It integrates 11 open-source datasets, predominantly consisting of human-designed questions, enabling robust performance with minimal fine-tuning time and cost.

The high-quality nature of Open-Platypus has allowed for strong performance and efficiency, demonstrating the importance of targeted and specific datasets in training sophisticated models. The dataset is also released to the public, fostering collaborative improvement.
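
A minimal sketch of loading the public release from the Hugging Face Hub is shown below; the repository identifier and the column names are assumptions based on common release conventions, so check the authors' release page for the exact details.

```python
# Sketch: pull the released Open-Platypus dataset from the Hugging Face Hub.
# The repository id and the "instruction" column are assumed, not confirmed here.
from datasets import load_dataset

platypus = load_dataset("garage-bAInd/Open-Platypus", split="train")

print(platypus)                    # number of rows and column names
print(platypus[0]["instruction"])  # one instruction-style training example
```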

Dataset Optimisation

The authors describe a similarity-exclusion process that streamlines the dataset by removing redundant questions, and a training-data filtering process that guards against benchmark contamination, preserving the dataset's integrity and relevance.

Fine-tuning and Merging Process

They detail their approach to selecting and merging specialised fine-tuned LoRA modules, highlighting the effectiveness of this method in imparting specific domain knowledge while maintaining the benefits of instruction tuning.
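
As a rough illustration of the mechanics (not the paper's exact procedure), the sketch below attaches a fine-tuned LoRA adapter to a base model with the Hugging Face peft library and folds the low-rank updates back into the base weights; the model and adapter identifiers are placeholders.

```python
# Sketch: merge a fine-tuned LoRA adapter back into its base model using peft.
# The base model and adapter ids are placeholders, not the paper's checkpoints.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-2-13b-hf"        # base model (placeholder)
adapter_id = "your-org/stem-lora-adapter"    # fine-tuned LoRA adapter (placeholder)

base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Attach the adapter, then fold its low-rank deltas into the base weights so
# the merged model can be saved and served without the peft runtime.
model = PeftModel.from_pretrained(base, adapter_id)
merged = model.merge_and_unload()

merged.save_pretrained("platypus-style-merged")
tokenizer.save_pretrained("platypus-style-merged")
```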

This work aims to advance the field by providing an efficient way to adapt LLMs to specific tasks, emphasising the potential of domain-specific datasets and merging techniques to improve model performance while reducing training time and cost.

Dataset Creation

The paper outlines a detailed process for curating the Open-Platypus dataset, aimed at enhancing the performance of Large Language Models (LLMs), particularly focusing on the STEM domain.

Here is a breakdown of the data curation process:

Data Selection Criteria

The curation process was guided by theoretical arguments and empirical findings suggesting that a small amount of carefully targeted training data is enough to achieve significant alignment of model outputs. The dataset aimed to provide depth in specific areas, ensuring diversity in input prompts while keeping the overall size manageable.

Open-Platypus Dataset Composition

This dataset is an aggregation of 11 open-source datasets, predominantly comprising human-generated questions, with about 10% contributed by an LLM. The focus is on STEM and logic, selecting datasets that offer questions in these domains or filtering broader datasets for relevant content.
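
A hedged sketch of this kind of aggregation and filtering is shown below; the dataset identifiers, the shared column schema, and the keyword filter are purely illustrative and are not the paper's actual recipe.

```python
# Sketch: aggregate several open instruction datasets and keep STEM/logic items.
# Dataset ids, the shared instruction/input/output schema, and the keyword
# filter are illustrative assumptions, not the paper's exact pipeline.
from datasets import load_dataset, concatenate_datasets

sources = [
    "hypothetical/science-qa",         # placeholder dataset ids
    "hypothetical/math-word-problems",
    "hypothetical/logic-puzzles",
]

parts = [load_dataset(name, split="train") for name in sources]
merged = concatenate_datasets(parts)   # assumes matching column schemas

stem_keywords = ("theorem", "physics", "chemistry", "prove", "equation")

def looks_stem(example):
    # Keep an example if its instruction or input mentions a STEM keyword.
    text = (example.get("instruction", "") + " " + example.get("input", "")).lower()
    return any(keyword in text for keyword in stem_keywords)

stem_only = merged.filter(looks_stem)
print(f"kept {len(stem_only)} of {len(merged)} examples")
```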

Instruction Tuning

To enhance the dataset's effectiveness, an instruction-tuning format was employed where each data point includes an instruction, input, and output. This format is particularly useful for creating structured and consistent training material for the LLM.
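
The sketch below shows one common way to render such instruction/input/output records into a training prompt (an Alpaca-style template); the template wording is an assumption rather than the exact format used in the paper.

```python
# Sketch: an Alpaca-style prompt template for instruction/input/output records.
# The template wording is assumed, not copied from the paper.
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. Write a response that appropriately "
    "completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
)

PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)

def build_prompt(example: dict) -> str:
    """Render one training record into a single prompt string."""
    if example.get("input"):
        return PROMPT_WITH_INPUT.format(**example)
    return PROMPT_NO_INPUT.format(instruction=example["instruction"])

record = {
    "instruction": "Solve for x.",
    "input": "2x + 6 = 14",
    "output": "x = 4",
}
print(build_prompt(record) + record["output"])
```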

De-duplication and Similarity Removal

To prevent the model from simply memorizing answers, a de-duplication process was implemented. This involved removing exact duplicates and questions with a high degree of similarity (measured by cosine similarity) to others in the dataset. This step ensures that the training data encourages the model to learn underlying patterns and logic rather than memorizing specific answers.
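
A minimal sketch of this kind of similarity-based de-duplication is shown below, assuming a sentence-transformers embedding model and an illustrative 0.8 cosine-similarity threshold; the paper describes the general approach, not this specific code.

```python
# Sketch: drop near-duplicate questions using embedding cosine similarity.
# The embedding model and the 0.8 threshold are illustrative assumptions.
from sentence_transformers import SentenceTransformer

questions = [
    "What is the derivative of x^2?",
    "Differentiate x squared with respect to x.",
    "Name the largest planet in the solar system.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")
emb = model.encode(questions, normalize_embeddings=True)  # unit-length vectors
sims = emb @ emb.T                                        # cosine similarity matrix

keep, threshold = [], 0.8
for i in range(len(questions)):
    # Keep question i only if it is not too similar to an already-kept question.
    if all(sims[i, j] < threshold for j in keep):
        keep.append(i)

deduped = [questions[i] for i in keep]
print(deduped)
```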

Contamination Check

A critical part of the curation process involved ensuring that the training data did not contain any questions from benchmark test sets. This prevents the model from giving the illusion of high performance by simply recalling answers to known questions.
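
A simplified sketch of such a check is shown below; it only catches exact and near-exact overlaps after text normalisation, whereas a fuller check would also reuse the embedding-similarity machinery from the de-duplication step.

```python
# Sketch: flag training questions that appear in a benchmark test set.
# Only exact/near-exact matches are caught here; a real contamination check
# would also use semantic similarity between questions.
import re

def normalize(text: str) -> str:
    """Lowercase and strip punctuation so trivial edits still match."""
    return re.sub(r"[^a-z0-9 ]+", "", text.lower()).strip()

benchmark_questions = {normalize(q) for q in [
    "Which planet is known as the Red Planet?",
]}

training_questions = [
    "Which planet is known as the Red Planet??",
    "What is the boiling point of water at sea level?",
]

clean = [q for q in training_questions
         if normalize(q) not in benchmark_questions]
print(clean)  # only the uncontaminated question remains
```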

Fine-tuning and Merging Process

The paper also details the use of Low-Rank Adaptation (LoRA) for fine-tuning the models, a technique that trains only a small set of added low-rank parameters while keeping the base weights frozen, making the training process more efficient and cost-effective. The fine-tuning process was carefully managed so that the models improved in the target domains without requiring extensive computational resources.
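
A minimal sketch of setting up LoRA with the Hugging Face peft library is shown below; the rank, scaling, dropout, and target modules are illustrative defaults rather than the paper's hyperparameters.

```python
# Sketch: wrap a base model with LoRA adapters using peft before fine-tuning.
# Rank, alpha, dropout and target modules are illustrative defaults.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-13b-hf")

lora_config = LoraConfig(
    r=16,                    # rank of the low-rank update matrices
    lora_alpha=32,           # scaling factor applied to the update
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()   # only a small fraction of weights are trainable
```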

The meticulous curation process of Open-Platypus aims to ensure that the fine-tuned LLMs are not only effective in their domain-specific tasks but also efficient in terms of training requirements, thereby addressing a critical aspect of AI model development.

Results

Performance Overview

The Platypus2-70B-instruct variant achieved the top position on the Hugging Face Open LLM Leaderboard with an average score of 73.13, showcasing its performance relative to other models at the time. The Stable-Platypus2-13B model was highlighted as the leading 13-billion-parameter model with an average score of 63.96.

Model Merging and Fine-tuning

The study explored the effects of merging different models (broad and niche) and the benefits of fine-tuning using the Open-Platypus dataset. The results showed that the fine-tuned models outperformed the base models, particularly in the ARC and TruthfulQA benchmarks, demonstrating the effectiveness of the merging and fine-tuning strategy.

Impact on Various Benchmarks

The fine-tuned models showed varied performance across different benchmark tests. For example, the Camel-Platypus2-70B model significantly improved in the ARC-Challenge, whereas the Dolphin-Platypus2-70B merge did not surpass the performance of the base and adapter models. This indicates that the merging process's success can vary based on the models and datasets involved.

Domain-Specific Performance

The effectiveness of the fine-tuned models was domain-specific. For instance, in the machine learning domain, the Camel-Platypus2-70B model showed a remarkable improvement, suggesting that the choice of model for merging is crucial depending on the domain or task at hand.

Notable Improvements and Declines

The analysis highlighted significant improvements and declines in different domains. For example, the Camel-Platypus2-70B model excelled in the ARC-Challenge, while several models showed notable declines in the college physics test, indicating potential compatibility issues or limitations in certain domains.

Insights into Merging Strategy

The results provided insights into the merging strategy's complexity, showing that not all merges lead to superior models. The variability in performance across different benchmarks suggests that careful consideration is required when selecting models for merging, especially when targeting specific domains or tasks.

Overall, the results emphasize the potential of fine-tuning and merging strategies to enhance LLMs' performance, demonstrating significant improvements in specific domains while also highlighting the importance of domain-specific evaluations and the complexities involved in the model merging process.

Conclusion

This paper discusses the enhancement of Large Language Models (LLMs) through fine-tuning using the Open-Platypus dataset and explores the potential benefits of merging smaller, efficient models with the precision of individual adapters.

It highlights the success of these fine-tuned models in specific tasks and suggests that future work could explore integrating various datasets and methodologies like QLoRA to improve model performance.

The paper acknowledges the limitations of the Platypus model, such as its static knowledge base, potential bias, and its primary focus on English-language data.

It stresses the importance of responsible use and the need for further safety testing before deploying the model in applications. The paper also notes the significance of ensuring no contamination between training and benchmark test sets to maintain the integrity of the model's performance. Lastly, it acknowledges the contributions of Hugging Face and Meta AI in supporting the development and evaluation of LLMs.

Source: Platypus: Quick, Cheap, and Powerful Refinement of LLMs (arXiv.org)