The Power of Scale for Parameter-Efficient Prompt Tuning
This highly cited September 2021 paper by Lester, Al-Rfou and Constant introduced "prompt tuning," a method for adapting large pre-trained language models to specific downstream tasks by learning "soft prompts" that condition the model's behaviour.
Prompt tuning was one of the first parameter-efficient fine-tuning (PEFT) techniques.
Instead of the discrete text prompts used to steer models such as GPT-3, the authors propose learning continuous "soft prompts" through backpropagation. These soft prompts can incorporate signals from labeled examples and outperform GPT-3's few-shot learning.
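To make the mechanism concrete, here is a minimal PyTorch sketch of the idea: a small matrix of trainable embeddings is prepended to the input embeddings of a frozen model, and only that matrix receives gradients. The `SoftPromptModel` wrapper, the random initialisation, and the use of a Hugging Face T5 checkpoint are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn

class SoftPromptModel(nn.Module):
    """Freezes a pre-trained model and learns only a prepended soft prompt."""

    def __init__(self, model, prompt_length: int = 20):
        super().__init__()
        self.model = model
        for p in self.model.parameters():
            p.requires_grad = False  # the pre-trained weights stay frozen

        embed_dim = model.get_input_embeddings().embedding_dim
        # The soft prompt: prompt_length trainable vectors ("virtual tokens").
        self.soft_prompt = nn.Parameter(torch.randn(prompt_length, embed_dim) * 0.5)

    def forward(self, input_ids, labels=None):
        embeds = self.model.get_input_embeddings()(input_ids)
        prompt = self.soft_prompt.unsqueeze(0).expand(embeds.size(0), -1, -1)
        # Prepend the soft prompt, then run the frozen model on the result.
        return self.model(
            inputs_embeds=torch.cat([prompt, embeds], dim=1), labels=labels
        )
```

In use, only `soft_prompt` is handed to the optimiser, so a training step updates a few thousand values rather than billions:

```python
from transformers import T5ForConditionalGeneration

model = SoftPromptModel(T5ForConditionalGeneration.from_pretrained("t5-small"))
optimiser = torch.optim.Adam([model.soft_prompt], lr=0.3)  # prompt only
```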
Through experiments with the T5 model, the authors show that prompt tuning becomes more competitive with model tuning (where all model weights are tuned) as the model size increases. With billion-parameter models, prompt tuning can match the performance of model tuning.
Prompt tuning is more parameter-efficient than model tuning, as a single frozen model can be reused for multiple downstream tasks by learning task-specific prompts. This is especially beneficial for large models that are costly to share and serve.
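As a hedged illustration of that reuse, serving a new task can amount to loading a tiny prompt tensor into the wrapper above rather than a separate multi-gigabyte checkpoint. The task names and per-task prompt files below are hypothetical:

```python
# One frozen model, many tasks: each task contributes only a small prompt file.
task_prompts = {
    name: torch.load(f"{name}_prompt.pt")  # hypothetical per-task prompt files
    for name in ("sentiment", "nli", "summarisation")
}

def run_task(task_name: str, input_ids: torch.Tensor):
    model.soft_prompt.data = task_prompts[task_name]  # a tiny, fast swap
    return model(input_ids)
```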
The authors compare prompt tuning to similar approaches like "prefix tuning" (Li and Liang, 2021) and show that prompt tuning alone, without intermediate-layer prefixes or task-specific output layers, is sufficient to be competitive with model tuning.
Prompt tuning has additional benefits, such as better resilience to domain shifts compared to model tuning, and the ability to perform efficient "prompt ensembling" by learning multiple prompts for the same task.
Prompt tuning is highly parameter-efficient compared to other methods. For models with over a billion parameters, the task-specific prompt amounts to less than 0.01% of the model's parameters, making prompt tuning the most parameter-efficient of the compared methods that have learnable parameters.
In contrast, model tuning requires a separate copy of the entire model for each task, and related methods such as prefix tuning and WARP introduce more task-specific parameters.
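A back-of-the-envelope check of the "<0.01%" figure, using illustrative numbers for an 11-billion-parameter model with a 4096-dimensional embedding space and a 100-token prompt:

```python
prompt_tokens, embed_dim = 100, 4096
model_params = 11_000_000_000  # roughly T5-XXL scale

prompt_params = prompt_tokens * embed_dim  # 409,600 trainable values
fraction = prompt_params / model_params
print(f"{prompt_params:,} parameters = {fraction:.4%} of the model")
# -> 409,600 parameters = 0.0037% of the model
```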
Prompt tuning prepends the soft prompts to the input embeddings, while other methods like prefix tuning (Li and Liang, 2021) prepend prompts at every transformer layer.
This means prompt tuning modifies only the input representation, leaving the frozen transformer to update intermediate-layer task representations as contextualised by each input example.
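One common way to see the difference is to count the task-specific parameters each method adds. The layer count and widths below are illustrative, and the prefix-tuning formula (one key and one value vector per layer) is a simplification that ignores its reparameterisation network:

```python
prompt_len, d_model, n_layers = 20, 4096, 24

prompt_tuning_params = prompt_len * d_model                 # input embeddings only
prefix_tuning_params = prompt_len * d_model * 2 * n_layers  # key/value at every layer
print(prompt_tuning_params, prefix_tuning_params)           # 81,920 vs 3,932,160
```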
Prompt tuning keeps the pre-trained language model frozen and only tunes the soft prompts. This prevents the model from overfitting to specific datasets by memorising spurious correlations, leading to improved robustness to domain shifts compared to model tuning.
Prompt tuning enables efficient "prompt ensembling" by learning multiple prompts for the same task while sharing the core language model parameters. This improves performance and reduces storage and inference costs compared to traditional model ensembling.
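A sketch of that batching trick, assuming a decoder-only model whose forward pass accepts `inputs_embeds` and that all prompts share one length; the majority vote at the end is just one simple way to aggregate:

```python
import torch

def ensemble_predict(model, input_ids, prompts):
    """prompts: list of (prompt_len, embed_dim) tensors learned for one task."""
    embeds = model.get_input_embeddings()(input_ids)  # (1, seq_len, embed_dim)
    # Stack one copy of the example per prompt so a single forward pass
    # through the shared frozen model scores every ensemble member.
    batch = torch.cat(
        [torch.cat([p.unsqueeze(0), embeds], dim=1) for p in prompts], dim=0
    )
    preds = model(inputs_embeds=batch).logits.argmax(dim=-1)  # (n_prompts, seq)
    return preds.mode(dim=0).values  # per-position majority vote
```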
Although the learned soft prompts are less interpretable than discrete text prompts, the authors find that the nearest neighbours of prompt tokens form semantic clusters, suggesting that the prompts learn "word-like" representations.
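That interpretability probe is straightforward to reproduce in sketch form: compare each learned prompt vector against the frozen vocabulary embeddings by cosine similarity and read off the closest tokens. The function and variable names here are our own, not the paper's:

```python
import torch.nn.functional as F

def nearest_tokens(soft_prompt, vocab_embeddings, tokenizer, k=5):
    # Cosine similarity between every prompt vector and every vocabulary row.
    sims = F.normalize(soft_prompt, dim=-1) @ F.normalize(vocab_embeddings, dim=-1).T
    top_ids = sims.topk(k, dim=-1).indices  # (prompt_len, k) nearest token ids
    return [[tokenizer.decode([i]) for i in row] for row in top_ids.tolist()]
```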
In summary, prompt tuning is a simple yet effective method for adapting large pre-trained language models to downstream tasks.
By learning continuous soft prompts through backpropagation, prompt tuning can match the performance of model tuning while being more parameter-efficient and enabling the reuse of a single frozen model for multiple tasks.
The effectiveness of prompt tuning increases with model scale, making it a promising approach for efficiently leveraging large language models in various applications.