# Optimiser

The choice of optimiser can impact the speed and stability of the fine-tuning process.

Popular optimisers for fine-tuning include Adam, AdamW, and SGD with momentum.

Each optimiser has its own hyperparameters, such as the learning rate, momentum, and weight decay rate, which may need to be adjusted for the specific task and model.
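To make the role of these hyperparameters concrete, here is a minimal sketch of a single SGD-with-momentum update in plain Python. The function name `sgd_momentum_step`, the toy loss `f(w) = w**2`, and the values chosen for `lr`, `momentum`, and `weight_decay` are illustrative assumptions, not recommended settings.

```python
def sgd_momentum_step(w, grad, velocity, lr=0.1, momentum=0.9, weight_decay=0.0):
    """One SGD-with-momentum update for a single scalar weight (illustrative helper)."""
    # Weight decay in its L2 form: add weight_decay * w to the gradient.
    grad = grad + weight_decay * w
    # Momentum: keep a running, decayed sum of past gradients.
    velocity = momentum * velocity + grad
    # Move the weight against the accumulated direction.
    w = w - lr * velocity
    return w, velocity


# Toy problem (assumed for illustration): minimise f(w) = w**2, whose gradient is 2 * w.
w, velocity = 5.0, 0.0
for _ in range(200):
    grad = 2.0 * w
    w, velocity = sgd_momentum_step(w, grad, velocity)
```

On this toy problem `w` ends up close to the minimum at zero. Changing `lr` or `momentum` changes how quickly, and how smoothly, it gets there, which is the kind of sensitivity that makes tuning necessary.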

The AdamW optimiser is a variant of the standard Adam optimiser, which is widely used in training deep neural networks. <mark style="color:yellow;">AdamW is particularly effective for large models</mark> because it applies weight decay separately from the gradient-based update.

* Adam stands for "Adaptive Moment Estimation". It combines the advantages of two other popular optimisers: AdaGrad and RMSProp. Adam maintains an effective learning rate for each network parameter (weight) and adapts it using two exponentially decaying averages: the first moment (the mean of recent gradients, which captures the direction and magnitude of the change) and the second moment (the mean of recent squared gradients, which captures how large the gradients typically are).
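The Adam update can be sketched for a single scalar weight as follows. This is a simplified illustration, not a library implementation: the function name `adam_step` is hypothetical, and the default values for `lr`, `beta1`, `beta2`, and `eps` are commonly used defaults assumed here.

```python
import math


def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar weight (illustrative helper).

    m and v are the running first and second moment estimates; t is the
    1-based step count.
    """
    # First moment: exponentially decaying average of gradients.
    m = beta1 * m + (1 - beta1) * grad
    # Second moment: exponentially decaying average of squared gradients.
    v = beta2 * v + (1 - beta2) * grad * grad
    # Bias correction, because m and v both start at zero.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Per-parameter step: the gradient average scaled by its typical magnitude.
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v


# The first step has roughly the same size (about lr) for small and large gradients.
w_small, _, _ = adam_step(1.0, 0.5, 0.0, 0.0, t=1)
w_large, _, _ = adam_step(1.0, 50.0, 0.0, 0.0, t=1)
```

Both `w_small` and `w_large` move by approximately `lr`, even though the gradients differ by a factor of 100. This shows how dividing by the second moment normalises the step size for each parameter.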

### <mark style="color:purple;">What is the AdamW optimiser?</mark>

AdamW is an optimisation algorithm that is based on the popular Adam optimiser, which is commonly used for deep learning applications.

The "W" in AdamW stands for "Weight decay," which is a technique used to prevent overfitting in machine learning models.

In its classic form (L2 regularisation), weight decay adds a regularisation term to the loss function that penalises large weight values. This helps to prevent the model from overfitting to the training data by encouraging it to learn simpler patterns that generalise better to new data.

The AdamW optimiser extends the Adam optimiser by applying weight decay directly in the update rule for the model parameters, rather than folding it into the gradient through the loss. As a result, the decay is not rescaled by Adam's adaptive per-parameter learning rates. This can improve the performance of deep learning models, particularly in cases where the data is noisy or the model architecture is complex.
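The difference between the two ways of applying weight decay can be sketched for a single scalar weight. The helper names (`_adam_update`, `adam_l2_step`, `adamw_step`) and the hyperparameter values are assumptions for illustration only. The decoupled step shrinks the weight by a factor of `1 - lr * weight_decay`, which is one common formulation.

```python
import math


def _adam_update(grad, m, v, t, lr, beta1, beta2, eps):
    """Return the Adam step for one scalar gradient, plus updated moments."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    return lr * m_hat / (math.sqrt(v_hat) + eps), m, v


def adam_l2_step(w, grad, m, v, t, lr=0.01, beta1=0.9, beta2=0.999,
                 eps=1e-8, weight_decay=0.1):
    """Adam with L2 regularisation: the penalty gradient passes through Adam's scaling."""
    grad = grad + weight_decay * w
    step, m, v = _adam_update(grad, m, v, t, lr, beta1, beta2, eps)
    return w - step, m, v


def adamw_step(w, grad, m, v, t, lr=0.01, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=0.1):
    """AdamW-style update: decay the weight directly, then take a plain Adam step."""
    w = w * (1 - lr * weight_decay)
    step, m, v = _adam_update(grad, m, v, t, lr, beta1, beta2, eps)
    return w - step, m, v


# With a zero task gradient, only the weight decay moves the weight.
w_l2, _, _ = adam_l2_step(1.0, 0.0, 0.0, 0.0, t=1)
w_adamw, _, _ = adamw_step(1.0, 0.0, 0.0, 0.0, t=1)
```

With no task gradient, the decoupled update shrinks the weight by exactly `lr * weight_decay * w`. In the L2 variant, the penalty gradient is normalised by Adam's second-moment estimate, so the first step is about `lr` in size regardless of the decay coefficient.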

Like Adam, AdamW applies a bias correction to its estimates of the first and second moments of the gradients. Both estimates are initialised at zero, so they are biased towards zero during the early steps of training, and the correction compensates for this. This helps the optimiser take well-scaled steps from the start and converge more quickly and reliably.
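A short numerical sketch shows why the correction matters. It assumes a constant gradient of `2.0` and `beta1 = 0.9`, both chosen only for illustration, and tracks the raw and bias-corrected first moment over the first few steps.

```python
beta1 = 0.9      # decay rate for the first moment (assumed value)
gradient = 2.0   # constant gradient, chosen for illustration

m = 0.0          # the running average starts at zero
history = []     # (step, raw estimate, bias-corrected estimate)
for t in range(1, 6):
    m = beta1 * m + (1 - beta1) * gradient
    m_hat = m / (1 - beta1 ** t)
    history.append((t, m, m_hat))
```

In the early steps the raw estimate `m` is far below the true gradient because it started at zero, while the corrected estimate `m_hat` matches the gradient from the first step. The same reasoning applies to the second moment with `beta2`.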

Overall, the AdamW optimiser is a powerful tool for optimising deep learning models, and it has been shown to be effective in a wide range of applications. It is commonly used for tasks such as image classification, object detection, and natural language processing.

