Master LoRA, QLoRA, adapters, and prompt tuning: fine-tune LLMs while training roughly 100x fewer parameters.
Understand LoRA's low-rank decomposition of the weight update: W = W₀ + (α/r)·BA, where B is d×r, A is r×k, and r ≪ min(d, k). Implement it from scratch.
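A minimal from-scratch sketch of that update, wrapping an existing nn.Linear; the class name LoRALinear, the rank, and the init scheme are illustrative choices, not taken from the original.

```python
import math
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Illustrative LoRA wrapper: y = W0 x + (alpha/r) * B A x with W0 frozen."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)  # freeze W0
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        # B starts at zero and A gets a small random init, so BA is zero
        # at the start of training and the wrapped layer matches the base.
        self.lora_A = nn.Parameter(torch.empty(r, base.in_features))
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))
        nn.init.kaiming_uniform_(self.lora_A, a=math.sqrt(5))
        self.scaling = alpha / r

    def forward(self, x):
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling
```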
Apply LoRA to LLaMA-2 7B using HuggingFace PEFT, training only about 0.65% of the model's parameters.
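A hedged sketch of the PEFT workflow; the rank, alpha, dropout, and target modules below are typical values, not necessarily the settings behind the 0.65% figure.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model, TaskType

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=16,                               # LoRA rank (illustrative)
    lora_alpha=32,                      # scaling numerator
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()      # reports the trainable-parameter fraction
```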
QLoRA: 4-bit NF4 quantization combined with LoRA. Train 7B models on a single consumer GPU using bitsandbytes.
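A sketch of the QLoRA-style setup, assuming the same LLaMA-2 7B checkpoint; the quantization and LoRA hyperparameters are common defaults rather than prescribed settings.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit NF4 quantization with double quantization, bf16 compute.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)   # casts norms, enables grads where needed
model = get_peft_model(model, LoraConfig(task_type="CAUSAL_LM", r=16, lora_alpha=32))
```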
Ablation study: which layers to apply LoRA to? q_proj, v_proj, all attention, or FFN?
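One way to set up the ablation is to sweep target_modules; the module names below follow LLaMA-style naming and the exact groupings are an assumption.

```python
from peft import LoraConfig

# Candidate module sets for the ablation.
ablation_targets = {
    "q_v_only":      ["q_proj", "v_proj"],
    "all_attention": ["q_proj", "k_proj", "v_proj", "o_proj"],
    "ffn_only":      ["gate_proj", "up_proj", "down_proj"],
    "attention_ffn": ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
}

configs = {
    name: LoraConfig(task_type="CAUSAL_LM", r=16, lora_alpha=32, target_modules=mods)
    for name, mods in ablation_targets.items()
}
```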
Combine custom loss functions (focal loss, weighted cross-entropy) with LoRA training. Verify that gradients flow to the LoRA parameters.
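A sketch of a focal loss plus a gradient-flow check for a LoRA-wrapped model; focal_loss and check_lora_gradient_flow are illustrative helper names.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, labels, gamma: float = 2.0):
    # Standard focal loss: down-weights easy (high-confidence) examples.
    ce = F.cross_entropy(logits, labels, reduction="none")
    pt = torch.exp(-ce)
    return ((1.0 - pt) ** gamma * ce).mean()

def check_lora_gradient_flow(model):
    # Call after loss.backward(): LoRA params should have gradients, the
    # frozen base should not. (Ignores modules_to_save, which can also train.)
    for name, p in model.named_parameters():
        if "lora_" in name:
            assert p.requires_grad and p.grad is not None, f"no gradient for {name}"
        else:
            assert not p.requires_grad, f"base parameter {name} is trainable"
```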
Classic bottleneck adapter architecture: a down-projection, nonlinearity, and up-projection wrapped in a residual connection.
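A minimal sketch of the bottleneck adapter block (Houlsby-style); the bottleneck size and near-identity initialization are illustrative choices.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Residual adapter: x + up(act(down(x)))."""

    def __init__(self, hidden_size: int, bottleneck_size: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck_size)
        self.up = nn.Linear(bottleneck_size, hidden_size)
        self.act = nn.GELU()
        # Near-identity init: the adapter starts as (almost) a no-op.
        nn.init.zeros_(self.up.weight)
        nn.init.zeros_(self.up.bias)

    def forward(self, x):
        return x + self.up(self.act(self.down(x)))
```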
Optimize soft prompt embeddings while keeping the base model frozen. Only about 0.01% of parameters are trained.
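A sketch of soft prompt tuning with PEFT's PromptTuningConfig; the number of virtual tokens and the text-based initialization are example settings.

```python
from transformers import AutoModelForCausalLM
from peft import PromptTuningConfig, PromptTuningInit, TaskType, get_peft_model

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

prompt_config = PromptTuningConfig(
    task_type=TaskType.CAUSAL_LM,
    num_virtual_tokens=20,                  # length of the learned soft prompt
    prompt_tuning_init=PromptTuningInit.TEXT,
    prompt_tuning_init_text="Classify the sentiment of the following review:",
    tokenizer_name_or_path="meta-llama/Llama-2-7b-hf",
)

model = get_peft_model(model, prompt_config)
model.print_trainable_parameters()          # only the virtual token embeddings train
```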
Benchmark LoRA, QLoRA, adapters, and prompt tuning on accuracy, trainable parameter count, and training speed.
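A small helper that covers the parameter axis of the comparison; the function name parameter_stats is illustrative.

```python
def parameter_stats(model):
    """Return (trainable, total, percent trainable) for any wrapped model."""
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    return trainable, total, 100.0 * trainable / total

# Usage, assuming `peft_model` is any of the LoRA/QLoRA/adapter/prompt-tuning models:
# trainable, total, pct = parameter_stats(peft_model)
# print(f"{trainable:,} / {total:,} trainable ({pct:.2f}%)")
```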
Merge LoRA adapters back into base weights with merge_and_unload(). Combine multiple adapters.
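A sketch of both operations with PEFT; the adapter paths and names are placeholders.

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Fold a single trained adapter into the base weights (no extra inference cost).
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
model = PeftModel.from_pretrained(base, "path/to/lora-adapter-a")
merged = model.merge_and_unload()   # plain transformers model, no PEFT wrappers

# Combine multiple adapters on a freshly loaded base model.
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
model = PeftModel.from_pretrained(base, "path/to/lora-adapter-a", adapter_name="a")
model.load_adapter("path/to/lora-adapter-b", adapter_name="b")
model.add_weighted_adapter(["a", "b"], weights=[0.5, 0.5],
                           adapter_name="merged_ab", combination_type="linear")
model.set_adapter("merged_ab")
```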
DoRA (magnitude-direction decomposition), AdaLoRA (adaptive rank allocation), and LoRA+ (separate learning rates for the A and B matrices).
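Hedged sketches of the three variants: DoRA via LoraConfig(use_dora=True), AdaLoRA via AdaLoraConfig, and a LoRA+-style optimizer that gives the B matrices a larger learning rate. The 16x ratio is a common choice, not a fixed rule, and the exact config fields depend on the PEFT version.

```python
import torch
from peft import LoraConfig, AdaLoraConfig

# DoRA: decompose each weight into magnitude and direction, apply LoRA to the direction.
dora_config = LoraConfig(task_type="CAUSAL_LM", r=16, lora_alpha=32, use_dora=True)

# AdaLoRA: start at a higher rank and prune toward a target rank during training
# (recent PEFT versions also expect total_step).
adalora_config = AdaLoraConfig(task_type="CAUSAL_LM", init_r=12, target_r=4, total_step=1000)

def loraplus_optimizer(model, lr: float = 2e-4, lr_ratio: float = 16.0):
    # LoRA+-style param groups: train lora_B with a higher learning rate than lora_A.
    groups = [
        {"params": [p for n, p in model.named_parameters()
                    if p.requires_grad and "lora_B" in n], "lr": lr * lr_ratio},
        {"params": [p for n, p in model.named_parameters()
                    if p.requires_grad and "lora_B" not in n], "lr": lr},
    ]
    return torch.optim.AdamW(groups)
```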