# The Power of Scale for Parameter-Efficient Prompt Tuning

This highly cited <mark style="color:blue;">**April 2021**</mark> paper introduced "prompt tuning," a method for adapting large pre-trained language models to specific downstream tasks by learning "soft prompts" that condition the frozen model's behaviour.&#x20;

Prompt tuning was one of the first parameter-efficient fine-tuning (PEFT) methods.

{% embed url="https://arxiv.org/abs/2104.08691" %}
The Power of Scale for Parameter-Efficient Prompt Tuning
{% endembed %}

Instead of using discrete text prompts like GPT-3, the authors propose learning continuous "soft prompts" through backpropagation.  These soft prompts can incorporate signals from labeled examples and outperform GPT-3's few-shot learning.

Through experiments with the T5 model, the authors show that prompt tuning becomes more competitive with model tuning (where all model weights are tuned) as the model size increases.  With billion-parameter models, *<mark style="color:yellow;">**prompt tuning can match the performance of model tuning.**</mark>*

Prompt tuning is *<mark style="color:yellow;">**more parameter-efficient than model tuning**</mark>*, as a single frozen model can be reused for multiple downstream tasks by learning task-specific prompts. This is especially beneficial for large models that are costly to share and serve.

The authors compare prompt tuning to similar approaches like "prefix tuning" (Li and Liang, 2021) and show that prompt tuning alone, without intermediate-layer prefixes or task-specific output layers, is sufficient to be competitive with model tuning.

Prompt tuning has additional benefits, such as better resilience to domain shifts compared to model tuning, and the ability to perform efficient <mark style="color:blue;">**"prompt ensembling"**</mark> by learning multiple prompts for the same task.

### <mark style="color:purple;">Key Features of Prompt Tuning</mark>

#### <mark style="color:green;">Parameter efficiency</mark>

Prompt tuning is highly parameter-efficient compared to other methods. For models with over a billion parameters, the task-specific prompt accounts for less than 0.01% of the total parameter count, making prompt tuning the most parameter-efficient of the methods with learnable parameters.&#x20;

In contrast, model tuning requires a separate copy of the entire model for each task, and other lightweight methods such as prefix tuning, WARP, and adapters all add more task-specific parameters.
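To make the "less than 0.01%" figure concrete, here is a back-of-the-envelope calculation. The prompt length (100 tokens) and embedding size (4,096, roughly T5-XXL's) are illustrative assumptions, not exact figures from a specific configuration in the paper:

```python
# Rough parameter-count comparison (illustrative sizes, assumed not exact):
prompt_length = 100            # soft-prompt tokens
embedding_dim = 4096           # roughly T5-XXL's embedding size
model_params = 11_000_000_000  # ~11B parameters for T5-XXL

prompt_params = prompt_length * embedding_dim  # 409,600 trainable values
fraction = prompt_params / model_params

print(f"{fraction:.6%}")  # well under 0.01%
```

Even with a generous 100-token prompt, the trainable parameters are a tiny sliver of the frozen model.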

#### <mark style="color:green;">Continuous vs. discrete prompts</mark>

Unlike the discrete text prompts used by GPT-3, prompt tuning learns continuous "soft prompts" through backpropagation. Because they are optimised directly against labeled examples, these soft prompts condense far more task signal than a hand-written prompt and outperform GPT-3's few-shot learning.

#### <mark style="color:green;">Prompt location</mark>

Prompt tuning *<mark style="color:yellow;">**prepends the soft prompts to the input embeddings,**</mark>* while other methods like prefix tuning (Li and Liang, 2021) prepend prompts at every transformer layer.&#x20;

Because the soft prompt modifies only the input representation, the frozen transformer remains free to update its intermediate-layer task representations as it contextualises each input example.
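As a minimal sketch (toy dimensions, random initialisation; the function and variable names are illustrative, not from the paper), prepending a soft prompt is simply a concatenation in embedding space:

```python
import random

def prepend_soft_prompt(soft_prompt, token_embeddings):
    """Concatenate the learnable prompt vectors in front of the input embeddings."""
    return soft_prompt + token_embeddings

d_model = 8        # toy embedding dimension
prompt_length = 4  # number of soft-prompt tokens

# The only trainable parameters: prompt_length vectors of size d_model.
soft_prompt = [[random.gauss(0, 0.02) for _ in range(d_model)]
               for _ in range(prompt_length)]

# Stand-in for the frozen embedding lookup of a 6-token input sentence.
token_embeddings = [[0.0] * d_model for _ in range(6)]

model_input = prepend_soft_prompt(soft_prompt, token_embeddings)
print(len(model_input))  # 10: prompt tokens + input tokens
```

The frozen model then processes this lengthened sequence exactly as it would any other input.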

#### <mark style="color:green;">Frozen language model</mark>

Prompt tuning keeps the pre-trained language model frozen and only tunes the soft prompts. This prevents the model from overfitting to specific datasets by memorising spurious correlations, leading to improved robustness to domain shifts compared to model tuning.
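A toy illustration of the training loop's key property (all numbers invented): gradient updates touch only the prompt, never the frozen weights.

```python
def sgd_step(params, grads, lr=0.1):
    """Plain SGD update on a flat list of parameters."""
    return [p - lr * g for p, g in zip(params, grads)]

frozen_weights = [1.0, 2.0, 3.0]  # pre-trained model weights: never updated
soft_prompt = [0.5, 0.5]          # the only trainable parameters

# Pretend backpropagation produced these gradients for the prompt.
prompt_grads = [0.2, -0.1]
soft_prompt = sgd_step(soft_prompt, prompt_grads)

print(frozen_weights)  # unchanged: [1.0, 2.0, 3.0]
print(soft_prompt)     # updated:   [0.48, 0.51]
```

In a real framework this corresponds to freezing the model's parameters and passing only the prompt embeddings to the optimiser.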

#### <mark style="color:green;">Efficient ensembling</mark>

Prompt tuning enables efficient "prompt ensembling" by learning multiple prompts for the same task while sharing the core language model parameters. This improves performance and reduces storage and inference costs compared to traditional model ensembling.

<details>

<summary><mark style="color:blue;">What is ensembling?</mark></summary>

Prompt ensembling is a technique that involves training multiple sets of soft prompts for the same task using a single frozen pre-trained language model.&#x20;

Each set of prompts can be viewed as a separate "model" that adapts the language model to the specific task.&#x20;

By combining the predictions from these multiple prompt-based models, we can create an ensemble that often outperforms individual prompt-based models and matches the performance of the best single prompt.

<mark style="color:green;">**The main advantages of prompt ensembling are:**</mark>

<mark style="color:blue;">**Improved performance:**</mark> Ensembling multiple prompts leads to better task performance compared to the average single prompt and often matches or exceeds the best individual prompt.

<mark style="color:blue;">**Parameter efficiency:**</mark> Prompt ensembling allows for the creation of multiple task-specific models while sharing the same core language model parameters. This drastically reduces storage costs compared to traditional model ensembling, where each model in the ensemble is a separate copy of the entire model.

<mark style="color:blue;">**Inference efficiency:**</mark> During inference, instead of running multiple forward passes for each model in the ensemble, we can process the input with a single forward pass using a batch size equal to the number of prompts in the ensemble. This makes inference more efficient compared to traditional model ensembling.

<mark style="color:green;">**Here's an example of how prompt ensembling works**</mark>

Let's say we have a sentiment analysis task where we need to classify movie reviews as positive or negative.&#x20;

We start by training five different sets of soft prompts (P1, P2, P3, P4, P5) on the same training data using a single frozen pre-trained language model (e.g., T5-XXL).

During inference, given a new movie review, we prepend each set of prompts to the input and run a single forward pass with a batch size of five. This gives us five different sentiment predictions, one for each prompt:

* P1: Positive
* P2: Positive
* P3: Negative
* P4: Positive
* P5: Positive

To get the final ensemble prediction, we can use a simple majority voting scheme. In this case, four out of five prompts predict "Positive," so the final ensemble prediction is "Positive."
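The walk-through above can be sketched in a few lines (the per-prompt predictions are the invented labels from the example; a real system would obtain them from the batched forward pass):

```python
from collections import Counter

def majority_vote(predictions):
    """Return the most common label across the per-prompt predictions."""
    return Counter(predictions).most_common(1)[0][0]

# One forward pass with batch size 5: each row pairs one prompt with the same review.
prompts = ["P1", "P2", "P3", "P4", "P5"]
per_prompt_predictions = ["Positive", "Positive", "Negative", "Positive", "Positive"]

assert len(per_prompt_predictions) == len(prompts)
print(majority_vote(per_prompt_predictions))  # Positive
```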

Another example is in question answering tasks, such as SQuAD.&#x20;

We can train multiple sets of prompts on the SQuAD dataset and use them to generate multiple answers for a given question.&#x20;

The ensemble prediction can be obtained by combining the answers generated by each prompt, either by voting or by taking the answer with the highest average confidence score.
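For the confidence-based variant, a minimal sketch (the candidate answers and scores are invented for illustration):

```python
def best_by_avg_confidence(candidates):
    """candidates maps each answer string to its per-prompt confidence scores."""
    return max(candidates, key=lambda ans: sum(candidates[ans]) / len(candidates[ans]))

# Hypothetical answers produced by three prompts for one SQuAD-style question.
candidates = {
    "in 1912": [0.91, 0.88, 0.90],
    "in 1921": [0.40, 0.55, 0.35],
}
print(best_by_avg_confidence(candidates))  # in 1912
```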

Prompt ensembling is a technique that leverages the parameter efficiency of prompt tuning to create multiple task-specific models while *<mark style="color:yellow;">**sharing the same core language model**</mark>*.&#x20;

This allows for improved performance, reduced storage costs, and efficient inference compared to traditional model ensembling.

</details>

#### <mark style="color:green;">Interpretability</mark>

Although the learned soft prompts are less interpretable than discrete text prompts, the authors find that the nearest neighbours of prompt tokens form semantic clusters, suggesting that the prompts learn "word-like" representations.
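The nearest-neighbour check the authors describe can be sketched with cosine similarity. The two-dimensional vectors below are invented toy values; a real check would search the model's full vocabulary embedding matrix.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

# Toy vocabulary embeddings (invented).
vocab = {"good": [1.0, 0.0], "great": [0.8, 0.6], "bad": [-1.0, 0.0]}
prompt_token = [0.79, 0.61]  # one learned soft-prompt vector (invented)

nearest = max(vocab, key=lambda w: cosine(vocab[w], prompt_token))
print(nearest)  # great
```

When the nearest vocabulary tokens for a prompt position form a coherent semantic cluster, that position has effectively learned a "word-like" representation.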

### <mark style="color:purple;">Summary</mark>

In summary, prompt tuning is a simple yet effective method for adapting large pre-trained language models to downstream tasks.&#x20;

By learning continuous soft prompts through backpropagation, prompt tuning can match the performance of model tuning while being more parameter-efficient and enabling the reuse of a single frozen model for multiple tasks.&#x20;

The effectiveness of prompt tuning increases with model scale, making it a promising approach for efficiently leveraging large language models in various applications.

