The advent of Large Language Models (LLMs) like GPT, T5, and BERT, rooted in the transformer architecture, has revolutionized the field of Natural Language Processing (NLP). Recently, these models have also started to venture into other domains, such as Computer Vision (CV) and Audio, with models like ViT, Stable Diffusion, LayoutLM, Whisper, and XLS-R leading the charge. The standard approach is to pretrain these models on a large-scale, generic dataset and then fine-tune them for specific downstream tasks.
Fine-tuning is remarkably effective: adapting a pretrained LLM to a downstream dataset yields significant performance gains over using the pretrained model as-is (zero-shot inference). However, as models continue to grow, full fine-tuning becomes infeasible on standard consumer hardware. Moreover, storing and deploying an independently fine-tuned model for each downstream task is expensive, because each fine-tuned model is the same size as the original pretrained model.
This is where Parameter-Efficient Fine-tuning (PEFT) approaches come into play, offering solutions to these challenges!
Introducing PEFT Approaches
PEFT techniques fine-tune only a small number of model parameters while keeping most of the pretrained LLM's parameters frozen, significantly reducing computational and storage costs. In doing so, they also mitigate catastrophic forgetting, a behaviour observed during full fine-tuning of LLMs. PEFT approaches have also been shown to outperform full fine-tuning in low-data regimes and to generalize better to out-of-domain scenarios. They can be applied across modalities, e.g., image classification and Stable Diffusion Dreambooth.
PEFT also enhances model portability: PEFT methods produce tiny checkpoints of only a few MBs, compared to the large checkpoints of full fine-tuning. For example, bigscience/mt0-xxl takes up 40GB of storage, so full fine-tuning would produce a 40GB checkpoint for each downstream dataset, whereas a PEFT checkpoint is only a few MBs per dataset while achieving performance comparable to full fine-tuning. These small trained weights are added on top of the pretrained LLM, so the same base model can serve multiple tasks without replacing the entire model.
In a nutshell, PEFT approaches allow you to achieve performance comparable to full fine-tuning, while only requiring a small number of trainable parameters.
Announcing the 🤗 PEFT Library
Today, we are thrilled to introduce the 🤗 PEFT library, a product of the seamless integration of 🤗 Transformers and 🤗 Accelerate. This library allows you to utilize the most popular and efficient models from Transformers, along with the simplicity and scalability of Accelerate. Here are some of the currently supported PEFT methods:
LoRA: LoRA: Low-Rank Adaptation of Large Language Models
Prefix Tuning: Prefix-Tuning: Optimizing Continuous Prompts for Generation; P-Tuning v2: Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks
Prompt Tuning: The Power of Scale for Parameter-Efficient Prompt Tuning
P-Tuning: GPT Understands, Too
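Each of these methods is exposed in the library through a dedicated config class. The snippet below is a minimal sketch of what instantiating each one looks like; the hyperparameter values (r, num_virtual_tokens, encoder_hidden_size) and the sequence-to-sequence task type are illustrative choices, not recommendations.

```python
from peft import (
    LoraConfig,
    PrefixTuningConfig,
    PromptTuningConfig,
    PromptEncoderConfig,  # config class for P-Tuning
    TaskType,
)

# One config class per supported PEFT method; pick the one that matches
# the method you want and the task type of your base model.
lora_config = LoraConfig(task_type=TaskType.SEQ_2_SEQ_LM, r=8, lora_alpha=32, lora_dropout=0.1)
prefix_config = PrefixTuningConfig(task_type=TaskType.SEQ_2_SEQ_LM, num_virtual_tokens=20)
prompt_config = PromptTuningConfig(task_type=TaskType.SEQ_2_SEQ_LM, num_virtual_tokens=20)
p_tuning_config = PromptEncoderConfig(
    task_type=TaskType.SEQ_2_SEQ_LM, num_virtual_tokens=20, encoder_hidden_size=128
)
```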
Exciting Use Cases
We explore a range of exciting use cases in this library. Some of the most interesting ones include:
Using 🤗 PEFT LoRA to tune the bigscience/T0_3B model (3 billion parameters) on consumer hardware with 11GB of GPU RAM, such as the Nvidia GeForce RTX 2080 Ti, Nvidia GeForce RTX 3080, etc., using 🤗 Accelerate's DeepSpeed integration.
Enabling INT8 tuning of the OPT-6.7b model (6.7 billion parameters) in Google Colab using 🤗 PEFT LoRA and bitsandbytes; a minimal sketch follows this list.
Training Stable Diffusion Dreambooth using 🤗 PEFT on consumer hardware with 11GB of GPU RAM, such as the Nvidia GeForce RTX 2080 Ti, Nvidia GeForce RTX 3080, etc. Try out the Space demo, which should run seamlessly on a T4 instance (16GB GPU).
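For the INT8 use case above, the overall recipe looks roughly like the following. This is a minimal sketch rather than the exact Colab notebook: it assumes bitsandbytes is installed, loads the base model in 8-bit via load_in_8bit=True, and attaches LoRA adapters with get_peft_model. The helper name prepare_model_for_int8_training reflects the PEFT version available at the time of writing and may differ in later releases, and the LoRA hyperparameters are illustrative.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, TaskType, get_peft_model, prepare_model_for_int8_training

model_name = "facebook/opt-6.7b"

# Load the base model in 8-bit precision (requires bitsandbytes).
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    load_in_8bit=True,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Prepare the 8-bit model for training (casts norms to fp32,
# enables gradient checkpointing, etc.).
model = prepare_model_for_int8_training(model)

# Attach LoRA adapters; only these small matrices will be trained.
peft_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()
```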
Training your Model Using 🤗 PEFT
To illustrate the application of PEFT, let's consider the case of fine-tuning bigscience/mt0-large using LoRA.
First, we import the necessary libraries and create a config corresponding to the PEFT method. We then wrap the base 🤗 Transformers model by calling get_peft_model. After training the model, we can save it for inference. This will only save the incremental PEFT weights that were trained, significantly reducing storage space.
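Concretely, a minimal sketch of that sequence looks like this. The LoRA hyperparameters are illustrative, and the training loop itself (via the 🤗 Transformers Trainer, 🤗 Accelerate, or plain PyTorch) is elided.

```python
from transformers import AutoModelForSeq2SeqLM
from peft import LoraConfig, TaskType, get_peft_model

model_name_or_path = "bigscience/mt0-large"

# 1. Create a config for the chosen PEFT method (here: LoRA).
peft_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    inference_mode=False,
    r=8,
    lora_alpha=32,
    lora_dropout=0.1,
)

# 2. Load the base 🤗 Transformers model and wrap it with get_peft_model.
model = AutoModelForSeq2SeqLM.from_pretrained(model_name_or_path)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()  # only a small fraction of parameters are trainable

# ... train the wrapped model as usual ...

# 3. Save only the small set of trained PEFT weights.
model.save_pretrained("mt0-large-lora")
```

At inference time, the saved adapter can be loaded back on top of the base model; the adapter path below is a placeholder for wherever the weights were saved or pushed.

```python
from transformers import AutoModelForSeq2SeqLM
from peft import PeftConfig, PeftModel

adapter_path = "mt0-large-lora"  # local directory or Hub repo id

config = PeftConfig.from_pretrained(adapter_path)
base_model = AutoModelForSeq2SeqLM.from_pretrained(config.base_model_name_or_path)
model = PeftModel.from_pretrained(base_model, adapter_path)
model.eval()
```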
In the coming months, we'll be exploring more PEFT methods and focusing on new use cases such as INT8 training of the whisper-large model in Google Colab and tuning of RLHF components.
We're excited to see how industry practitioners apply PEFT to their use cases. If you have any questions or feedback, feel free to open an issue on our GitHub repo 🤗. Happy Parameter-Efficient Fine-Tuning!