Self-Alignment with Instruction Backtranslation
Xian Li, Ping Yu, Chunting Zhou, Timo Schick, Omer Levy, Luke Zettlemoyer, Jason Weston, Mike Lewis
Copyright Continuum Labs - 2023
In this August 2023 paper, the authors introduce a method called "instruction backtranslation" for building high-quality instruction-following language models without relying on large amounts of human-annotated data. The approach leverages a small amount of seed data and a large web corpus to automatically generate and curate training examples.
The key steps of the instruction backtranslation method are as follows:
Self-augmentation: The seed model generates instruction prompts for web documents, creating potential training examples.
Self-curation: The seed model selects high-quality examples from the generated candidates.
Fine-tuning: The selected high-quality examples are used to fine-tune a stronger model.
Iteration: The process is repeated, using the improved model to better curate the instruction data and re-train the model.
The authors highlight the importance of data quality in aligning large language models (LLMs) for instruction following.
While human-annotated datasets are valuable, they are difficult to scale. The instruction backtranslation method addresses this challenge by leveraging the model itself to augment and curate training examples, enabling self-alignment.
The approach draws inspiration from the backtranslation method in machine translation, where target sentences are automatically annotated with model-generated source sentences in another language. In this case, the model generates instruction prompts for web documents and selects high-quality (instruction, output) pairs for training.
The authors demonstrate the effectiveness of their approach by fine-tuning LLaMa on two iterations of instruction backtranslation. The resulting model, named Humpback, outperforms all other non-distilled models on the Alpaca leaderboard, showcasing the power of self-alignment through iterative self-augmentation and self-curation.
The instruction backtranslation method consists of two main steps: self-augmentation and self-curation. The process is iterative, allowing the model to improve its ability to select high-quality examples for fine-tuning. Let's break down each step in detail:
Initialization
Start with a base language model (e.g., LLaMa), a small seed dataset of human-annotated (instruction, output) pairs, and a large unlabeled web corpus.
Preprocess the web corpus by extracting self-contained segments, deduplicating, filtering by length, and removing low-quality segments.
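To make the preprocessing concrete, here is a minimal Python sketch. The paragraph-based segmentation, length bounds, and quality heuristic are illustrative assumptions, not the paper's exact filters.

```python
# Preprocessing sketch for the unlabeled web corpus.
# Length bounds and the quality heuristic are illustrative assumptions.

def preprocess_corpus(documents, min_len=200, max_len=2000):
    seen = set()
    segments = []
    for doc in documents:
        # Extract self-contained segments (here: paragraphs split on blank lines).
        for segment in doc.split("\n\n"):
            segment = segment.strip()
            if not (min_len <= len(segment) <= max_len):
                continue  # filter by length
            if segment in seen:
                continue  # deduplicate
            if looks_low_quality(segment):
                continue  # drop boilerplate / low-quality text
            seen.add(segment)
            segments.append(segment)
    return segments

def looks_low_quality(segment):
    # Placeholder heuristic: treat mostly non-alphabetic text as noise.
    alpha = sum(ch.isalpha() for ch in segment)
    return alpha / max(len(segment), 1) < 0.6
```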
Self-Augmentation
Fine-tune the base language model on (output, instruction) pairs from the seed data to create a backward model M_yx := p(x | y), which predicts instructions given outputs.
For each unlabeled example y_i in the web corpus, use the backward model to generate a candidate instruction x̂_i.
Create candidate augmented paired data A := {(x̂_i, y_i)} by combining the generated instructions with their corresponding outputs.
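A minimal sketch of the self-augmentation step is shown below; the backward_model.generate call and the prompt wording are hypothetical stand-ins for whatever inference setup is actually used.

```python
# Self-augmentation sketch: use the backward model M_yx to propose an
# instruction for each unlabeled segment. `backward_model.generate` is a
# hypothetical text-generation call, and the prompt is illustrative.

def self_augment(backward_model, segments):
    augmented = []
    for y in segments:
        # M_yx was fine-tuned on (output, instruction) pairs, so prompting it
        # with the output yields a candidate instruction x̂_i.
        x_hat = backward_model.generate(
            f"Below is an answer. Write the instruction it responds to.\n\n{y}"
        )
        augmented.append((x_hat, y))  # candidate pair (x̂_i, y_i)
    return augmented
```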
Self-Curation
Start with a seed instruction model M_0 fine-tuned on (instruction, output) pairs from the seed data.
Use M_0 to score each augmented example (x̂_i, y_i) in A and derive a quality score a_i using prompting (e.g., instructing the model to rate the quality on a 5-point scale).
Select the subset of augmented examples with scores a_i ≥ k to form a curated set A_k^(1).
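The self-curation step could be emulated roughly as follows. The rating prompt is a paraphrase of the 5-point rubric rather than the paper's exact wording, and model.generate is again a hypothetical inference call.

```python
import re

# Self-curation sketch: prompt the current instruction model to rate each
# candidate pair on a 5-point scale and keep pairs scoring at least k.
# The rubric below is paraphrased, not the paper's exact prompt.

RATING_PROMPT = (
    "Rate the following instruction-answer pair on a scale of 1 to 5, where 5 "
    "means the answer is a high-quality, helpful response to the instruction. "
    "Reply with a single number.\n\nInstruction: {x}\n\nAnswer: {y}\n\nScore:"
)

def self_curate(model, candidates, k=5):
    curated = []
    for x, y in candidates:
        reply = model.generate(RATING_PROMPT.format(x=x, y=y))
        match = re.search(r"[1-5]", reply)
        score = int(match.group()) if match else 0
        if score >= k:
            curated.append((x, y))  # keep only high-scoring pairs
    return curated
```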
Iterative Self-Curation
Use the curated augmentation data A_k^(t-1) from the previous iteration, along with the seed data, to fine-tune an improved model M_t.
Use M_t to rescore the augmented examples for quality, resulting in a new augmentation set A_k^(t).
Perform multiple iterations of data selection and fine-tuning to obtain the final model (e.g., M_2 after two iterations).
When combining seed data and augmented data for fine-tuning, use tagging to distinguish the data sources (e.g., append "Answer in the style of an AI Assistant." for seed data and "Answer with knowledge from web search." for augmented data).
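Putting the iteration and the source tags together, a schematic loop might look like the following. Here finetune is a hypothetical supervised fine-tuning helper and self_curate is the scoring sketch above; only the two system-prompt tags are quoted from the paper.

```python
# Iterative self-curation sketch. `finetune` is a hypothetical supervised
# fine-tuning routine; `self_curate` is the scoring sketch shown earlier.

SEED_TAG = "Answer in the style of an AI Assistant."
WEB_TAG = "Answer with knowledge from web search."

def instruction_backtranslation(base_model, seed_data, candidates, iterations=2, k=5):
    model = finetune(base_model, seed_data)  # M_0: seed instruction model
    for t in range(1, iterations + 1):
        # Score candidates with the current model and keep the best: A_k^(t).
        curated = self_curate(model, candidates, k=k)
        # Tag the two data sources with different system prompts before mixing.
        train_set = [(SEED_TAG + " " + x, y) for x, y in seed_data] + \
                    [(WEB_TAG + " " + x, y) for x, y in curated]
        model = finetune(base_model, train_set)  # M_t
    return model
```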
Example of emulating the process
Start with a base model like GPT-3 and a small seed dataset of human-annotated (instruction, output) pairs, along with a large web corpus like Common Crawl.
Fine-tune GPT-3 on (output, instruction) pairs from the seed data to create a backward model that predicts instructions given outputs.
For each document in the web corpus, extract self-contained segments and use the backward model to generate candidate instructions for each segment.
Create candidate augmented paired data by combining the generated instructions with their corresponding segments.
Fine-tune a seed instruction model (e.g., GPT-3) on the (instruction, output) pairs from the seed data.
Use the seed instruction model to score each augmented example using prompting (e.g., "On a scale of 1 to 5, how well does the output answer the given instruction?").
Select a subset of the augmented examples with scores above a certain threshold (e.g., 4 or 5) to form a curated set.
Fine-tune the seed instruction model on the curated set, along with the seed data, to create an improved model.
Use the improved model to rescore the augmented examples and create a new curated set.
Repeat the rescoring and fine-tuning steps for multiple iterations to obtain the final instruction-following model.
By emulating this process, you can leverage large amounts of unlabeled data to create high-quality instruction-following models without relying heavily on human annotation.
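As a rough end-to-end driver under the same assumptions (every helper function and model handle here is one of the hypothetical sketches above, not a real API):

```python
# End-to-end driver for the emulated pipeline, reusing the helpers sketched
# above. All model handles and helper functions are hypothetical placeholders.

def run_pipeline(base_model, seed_pairs, raw_documents):
    # 1. Prepare unlabeled segments from the web corpus.
    segments = preprocess_corpus(raw_documents)

    # 2. Train the backward model on (output, instruction) pairs.
    backward_model = finetune(base_model, [(y, x) for x, y in seed_pairs])

    # 3. Generate candidate (instruction, output) pairs.
    candidates = self_augment(backward_model, segments)

    # 4. Iteratively curate and fine-tune to obtain the final model.
    return instruction_backtranslation(base_model, seed_pairs, candidates,
                                       iterations=2, k=5)
```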
The experiments in this paper aim to evaluate the effectiveness of the proposed instruction backtranslation method for training instruction-following language models. The authors conducted several experiments to analyze the impact of data quality, data quantity, and various ablations. Let's break down the experiments in detail:
Seed data: 3,200 high-quality (instruction, output) pairs from the first turn of the Open Assistant dataset.
Base model: LLaMA with 7B, 33B, and 65B parameters, fine-tuned using the same hyperparameters as existing supervised fine-tuning methods.
Unlabeled data: 502k segments from the English portion of the Clueweb corpus.
Baselines: text-davinci-003, LIMA, and Guanaco.
Evaluation: 1,130 unique prompts from various sources, with a dev set of 256 prompts. Automatic evaluation using AlpacaEval and human preference evaluation.
Analysis of instruction and output lengths for seed data, self-augmented data, and self-curated data.
Task diversity analysis using the verb-noun structure of instructions.
Data quality vs. data quantity: Fine-tuning on augmented data of different quality (without curation, A_4^(2), and A_5^(2)) to understand the importance of data quality.
Data scaling efficiency: Comparing the performance of various instruction-following models as the amount of fine-tuning data changes. Estimating the scaling coefficient α for different instruction datasets.
AlpacaEval: Evaluating the generation quality using GPT-4 as the judge, comparing Humpback to non-distilled, distilled, and proprietary models.
Human Evaluation: Pairwise comparison of Humpback with open-source and proprietary models using human preference judgments.
Commonsense Reasoning and MMLU: Zero-shot accuracy on five commonsense reasoning benchmarks and the Massive Multitask Language Understanding (MMLU) benchmark.
Training on self-augmented data only: Comparing the performance of models trained on self-augmented data with and without self-curation, and jointly fine-tuning with seed data.
System prompts: Analyzing the effects of using system prompts to distinguish augmented data from seed data during fine-tuning and inference.
The experiments demonstrate that the proposed instruction backtranslation method, using self-augmentation and self-curation, can effectively leverage large amounts of unlabeled data to create high-quality instruction-following models. The results show that Humpback outperforms other non-distilled models and achieves competitive performance compared to distilled and proprietary models. The ablation studies further confirm the importance of self-curation and the complementary nature of seed data and augmented data.
The related work section discusses various approaches to instruction tuning for large language models (LLMs) and the challenges in gathering high-quality demonstration examples for fine-tuning.
Early work on instruction tuning focused on NLP tasks, showing that fine-tuning with instruction-output pairs improves cross-task generalization. Recent work has extended instruction tuning to a broader range of general tasks, incorporating instructions from LLM users.
Existing high-quality instruction-following LLMs rely on human annotations, which are expensive and time-consuming to collect. Some works have explored using LLMs to generate instructions, such as Unnatural Instructions, Self-Instruct, and the concurrent work by Köksal et al. (2023). However, these approaches either use model-generated responses for training data or rely on distillation from a more powerful model.
The authors also discuss self-alignment, where the model is utilized to improve itself and align its responses with desired behaviors. Many of these works construct training data in an unsupervised way or use the model to generate additional context to condition on at inference time.
The importance of data quality is highlighted, with approaches like PALMS and LIMA showing that curating high-quality human-written data results in strong performance. The concurrent work by Chen et al. (2023) provides an algorithmic approach to select high-quality data.
Finally, the authors discuss distillation, where most fine-tuned LLaMA models are based on knowledge distillation from ChatGPT or GPT-4. These approaches require an already strong model and do not provide a recipe for building a strong model from scratch.
In conclusion, the proposed instruction backtranslation method offers a scalable approach to fine-tuning large language models for instruction following. By leveraging large amounts of unlabeled data and using the model itself to augment and curate high-quality training examples, this iterative self-training algorithm enables the creation of strong instruction-following models without relying heavily on human annotations or distillation from more powerful models.
The experiments demonstrate that the Humpback models, fine-tuned using instruction backtranslation, outperform all other non-distilled instruction-following models on the Alpaca leaderboard while using fewer human-annotated examples. This showcases the effectiveness of the self-augmentation and self-curation steps in improving the model's performance.
The analysis suggests that scaling this method further by considering larger unlabeled corpora could yield even greater gains. As the field of instruction tuning for LLMs continues to evolve, the instruction backtranslation approach presents a promising direction for creating high-quality, general-purpose instruction-following models in a more efficient and cost-effective manner.
Future research should explore the application of this method to larger datasets, investigate ways to further refine the self-curation process, and examine the potential for combining instruction backtranslation with other techniques, such as self-alignment and data quality optimization. By continuing to develop and improve methods like instruction backtranslation, researchers can work towards creating more capable and versatile language models that can better understand and follow a wide range of instructions.