LLaMA-Factory

Fine-tuning a large language model can be as easy as...

2024-04-23

LLaMA-Factory is a comprehensive and efficient framework for fine-tuning large language models (LLMs). It supports a wide range of models including LLaMA, LLaVA, Mistral, Mixtral-MoE, Qwen, DeepSeek, Yi, Gemma, ChatGLM, and more, and it integrates multiple training methods such as pre-training, supervised fine-tuning, reward modeling, PPO, DPO, KTO, and ORPO. Training scales with the available hardware, from 16-bit full-tuning and freeze-tuning down to LoRA and 2/3/4/5/6/8-bit QLoRA via AQLM, AWQ, GPTQ, LLM.int8, HQQ, and EETQ.
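As a rough sketch of what a run looks like, a LoRA-based SFT job is typically described in a single YAML file and launched with the `llamafactory-cli` entry point. The model path, dataset, and hyperparameter values below are illustrative placeholders, not a recommended recipe:

```yaml
# Hypothetical LoRA SFT config (values are illustrative, not a tuned recipe).
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct

stage: sft                  # supervised fine-tuning
do_train: true
finetuning_type: lora       # train low-rank adapters instead of full weights
lora_target: all

dataset: alpaca_en_demo     # demo dataset shipped with the repo
template: llama3
cutoff_len: 1024

output_dir: saves/llama3-8b/lora/sft
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0
lr_scheduler_type: cosine
bf16: true
```

The job would then be started with `llamafactory-cli train llama3_lora_sft.yaml`. Switching `finetuning_type` to `full` or `freeze`, or adding `quantization_bit: 4`, moves the same config between full-tuning, freeze-tuning, and QLoRA.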

Advanced algorithms like GaLore, BAdam, APOLLO, Adam-mini, Muon, DoRA, LongLoRA, LLaMA Pro, and Mixture-of-Depths are incorporated to enhance training efficiency. Practical tricks such as FlashAttention-2, Unsloth, Liger Kernel, RoPE scaling, NEFTune, and rsLoRA are also supported. The framework is versatile, handling tasks like multi-turn dialogue, tool usage, image understanding, visual grounding, video recognition, and audio understanding.

LLaMA-Factory provides experiment monitoring through LlamaBoard, TensorBoard, Wandb, MLflow, and SwanLab, and serves fine-tuned models for fast inference via an OpenAI-style API, a Gradio UI, and a CLI, backed by vLLM or SGLang workers. According to the project's benchmarks, its LoRA tuning trains up to 3.7 times faster than ChatGLM's P-Tuning while achieving better ROUGE scores, and 4-bit quantization further reduces GPU memory usage.
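Because the served API speaks the OpenAI protocol, any standard OpenAI client can query a fine-tuned model once the server is running (e.g. via `llamafactory-cli api`). The host, port, and model name in this minimal sketch are assumptions for illustration only:

```python
# Minimal sketch: querying a locally served model through the OpenAI-style API.
# Assumes the server was started with `llamafactory-cli api <config>.yaml` and
# listens on localhost:8000; the host, port, and model name are illustrative.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # local OpenAI-style endpoint
    api_key="not-needed-locally",         # placeholder; a local server ignores it
)

response = client.chat.completions.create(
    model="llama3-8b-lora-sft",           # illustrative model identifier
    messages=[
        {"role": "user", "content": "Summarize what LoRA fine-tuning does."},
    ],
    temperature=0.7,
)
print(response.choices[0].message.content)
```

The same endpoint works with any tool that targets the OpenAI chat-completions schema, which makes it easy to drop a fine-tuned model into existing applications.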

With continuous updates and support for the latest models like Qwen3, Llama 3, GLM-4, and Mistral Small, LLaMA-Factory is a cutting-edge tool for researchers and developers working with LLMs. It is licensed under Apache-2.0 and encourages community contributions and collaborations.

Artificial Intelligence · Large Language Models · Fine-Tuning · Machine Learning · Natural Language Processing