What is Low-Rank Adaptation (LoRA) - explained by the inventor
Edward Hu
Edward Hu, formerly a researcher at Microsoft, introduces Low-Rank Adaptation (LoRA), a method for efficiently customising pretrained neural networks such as diffusion models or language models.
LoRA speeds up training and dramatically shrinks checkpoint sizes by fine-tuning only a small number of parameters, while matching the performance of full fine-tuning.
LoRA originated in early 2021 during Microsoft's collaboration with OpenAI, as a response to the limitations of few-shot prompting and the prohibitive expense of full fine-tuning, especially for models with vast numbers of parameters.
LoRA starts from the question of whether, and to what extent, a model's parameters really need to be fine-tuned, using a 2D plane to illustrate the range of possible update configurations. By limiting the rank of the weight-update matrices, it achieves a substantial reduction in trainable parameters without compromising the transformations the model can express.
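The rank-limited update can be sketched in a few lines of numpy. This is a minimal illustration, not the paper's implementation: the layer dimensions and rank below are hypothetical, and the frozen weight W stands in for one pretrained matrix. The update is the product of two thin matrices B and A, so its rank is at most r, and the trainable parameter count drops from d*k to r*(d + k).

```python
import numpy as np

d, k, r = 1024, 1024, 8  # hypothetical layer dimensions and LoRA rank

rng = np.random.default_rng(0)
W = rng.standard_normal((d, k))         # frozen pretrained weight (not trained)
B = np.zeros((d, r))                    # trainable, initialized to zero
A = rng.standard_normal((r, k)) * 0.01  # trainable

delta_W = B @ A            # the weight update, rank(delta_W) <= r
W_adapted = W + delta_W    # adapted weight seen by the forward pass

full = d * k               # parameters full fine-tuning would train
lora = r * (d + k)         # parameters LoRA trains
print(full, lora, full / lora)  # 1048576 16384 64.0
```

Here a rank of 8 cuts the trainable parameters by 64x for this single matrix; in practice the rank is a tunable knob trading capacity against cost.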
The method proved nearly as effective as full fine-tuning while offering vastly smaller storage requirements and faster deployment.
LoRA's applicability extends beyond language models to any architecture involving matrix multiplication, and it offers a clear path for adjustment (increasing the rank) if underperformance occurs.
It significantly cuts down checkpoint sizes (e.g., reducing a 1 TB checkpoint to 25 MB), and the added low-rank matrices introduce no inference latency, since they can be merged into the original weights.
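The zero-latency claim follows from a simple identity: because the update is just a matrix added to W, it can be folded into W before serving. A small numpy check (with made-up dimensions) shows the merged single-matmul path and the unmerged two-path computation give the same output:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, r = 512, 512, 4                  # hypothetical sizes
W = rng.standard_normal((d, k))        # pretrained weight
B = rng.standard_normal((d, r))        # trained LoRA factors
A = rng.standard_normal((r, k))
x = rng.standard_normal(k)             # an input vector

# Fold the low-rank factors into W once, before deployment:
# the adapted model then does one matmul, exactly like the base model.
W_merged = W + B @ A
y_merged = W_merged @ x

# The unmerged computation (base path plus low-rank side path) agrees.
y_unmerged = W @ x + B @ (A @ x)
print(np.allclose(y_merged, y_unmerged))  # True
```

Only B and A need to be stored per task, which is where the 1 TB vs. 25 MB checkpoint difference comes from.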
The updates are additive, allowing seamless model switching and parallel training of multiple modules. LoRA's additive nature also supports hierarchical model specialisation, enabling efficient, layered fine-tuning and rapid adaptation to specific tasks or users.
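The switching described above can be sketched as follows. The two named adapters are hypothetical stand-ins for independently trained task modules; the point is that serving a different task means adding a different small update on top of the same shared base weights, which are loaded once and never duplicated:

```python
import numpy as np

rng = np.random.default_rng(1)
d, k, r = 256, 256, 4
W_base = rng.standard_normal((d, k))   # shared pretrained weights, frozen

# Two hypothetical task-specific adapters, trained independently
# (and therefore trainable in parallel against the same base model).
adapters = {name: (rng.standard_normal((d, r)) * 0.01,
                   rng.standard_normal((r, k)) * 0.01)
            for name in ("task_a", "task_b")}

def serve(task):
    """Merge one additive low-rank update on top of the shared base."""
    B, A = adapters[task]
    return W_base + B @ A

# Switching tasks is just swapping which small update is added.
W_a = serve("task_a")
W_b = serve("task_b")
```

Because each adapter is tiny relative to the base model, keeping one adapter per task (or per user) is cheap, and layering adapters gives the hierarchical specialisation mentioned above.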
