The Problem LoRA Solves
Fully fine-tuning a 7B-parameter model requires storing the model itself (~14 GB in FP16), the gradients (~14 GB), and the AdamW optimizer state (~56 GB for the two FP32 moment estimates). Total: ~84 GB, well beyond most single-GPU setups. LoRA can cut the number of trainable parameters by up to 10,000x while matching full fine-tuning quality.
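As a quick sanity check, here is the arithmetic behind those numbers (assuming FP16 weights and gradients and FP32 AdamW moment estimates; activation memory is ignored):

params = 7e9                          # 7B parameters
weights_gb   = params * 2 / 1e9       # FP16 weights: ~14 GB
gradients_gb = params * 2 / 1e9       # FP16 gradients: ~14 GB
adamw_gb     = params * 2 * 4 / 1e9   # two FP32 moments: ~56 GB
print(weights_gb + gradients_gb + adamw_gb)  # ~84 GB before activations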
Mathematical Intuition
The hypothesis: the fine-tuning weight update ΔW lives in a low-dimensional subspace of the full weight space. Even for a 4096×4096 weight matrix, the meaningful change during fine-tuning has an intrinsic rank far below 4096. LoRA exploits this by parameterizing ΔW = BA, where B is d×r, A is r×k, and r ≪ min(d, k), so only the two thin matrices are trained while the pre-trained W stays frozen.
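To make the savings concrete, here is the parameter count for a single 4096×4096 projection at rank 8 (dimensions chosen for illustration):

d, k, r = 4096, 4096, 8
full_delta = d * k              # dense ΔW: 16,777,216 parameters
lora_delta = d * r + r * k      # B (d×r) plus A (r×k): 65,536 parameters
print(full_delta / lora_delta)  # 256x fewer trainable parameters per matrix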
Why B=0 Initialization?
B is initialized to zero (and A to small random values), so at the start of training the LoRA update ΔW = BA = 0 and the adapted model is exactly the pre-trained model. Training therefore begins from a known-good point rather than from a random perturbation of the weights.
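A minimal PyTorch sketch of this scheme (not the PEFT implementation; the α/r scaling follows the LoRA paper, and the init constants are illustrative):

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear layer plus a trainable low-rank update BA."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)                           # freeze pre-trained W
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)   # small random init
        self.B = nn.Parameter(torch.zeros(base.out_features, r))         # zero init => BA = 0
        self.scaling = alpha / r

    def forward(self, x):
        # Pre-trained output plus the scaled low-rank update (x Aᵀ Bᵀ).
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scaling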
Applying LoRA with HuggingFace PEFT
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model, TaskType

base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")  # illustrative checkpoint; any ~7B causal LM works

config = LoraConfig(
    r=8,                                   # rank: higher rank = more capacity
    lora_alpha=16,                         # scaling factor α
    target_modules=["q_proj", "v_proj"],   # which layers to adapt
    lora_dropout=0.05,
    bias="none",
    task_type=TaskType.CAUSAL_LM,
)
model = get_peft_model(base_model, config)
model.print_trainable_parameters()
# trainable params: 4,194,304 || all params: 6,742,609,920 || trainable%: 0.0622
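After training, only the adapter weights need to be saved; for deployment they can also be merged back into the base weights so inference adds no latency (standard PEFT calls; the output path is a placeholder):

model.save_pretrained("my-lora-adapter")   # writes only the LoRA weights, a few MB
merged = model.merge_and_unload()          # folds BA into W for adapter-free inference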
Rank Selection Guide
- r=4: Minimal capacity. Good for simple style/format adaptation.
- r=8: Standard. Best quality/parameter tradeoff for most tasks.
- r=16-32: Higher capacity. For complex task adaptation or small datasets.
- r=64+: Approaches full fine-tuning capacity. Use only if lower ranks leave quality on the table; a parameter-count comparison across ranks is sketched below.
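Before committing to a rank, it is cheap to compare trainable-parameter counts. A sketch assuming a Llama-style 7B model (32 layers, hidden size 4096) with q_proj and v_proj adapted, matching the example above:

layers, hidden, adapted = 32, 4096, 2           # assumed architecture
for r in (4, 8, 16, 32, 64):
    params = layers * adapted * 2 * hidden * r  # B (hidden×r) + A (r×hidden) per adapted matrix
    print(f"r={r:>2}: {params:,} trainable parameters")
# r=8 reproduces the 4,194,304 figure printed by print_trainable_parameters() above.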