Stage 02

Parameter-Efficient Fine-Tuning

Master LoRA, QLoRA, adapters, and prompt tuning to train LLMs while updating roughly 100x fewer parameters.

10 notebooks · 8h estimated
Notebook 20 · 45 min

LoRA Theory

Understand LoRA's low-rank matrix decomposition: W = W₀ + (α/r)·BA, where the update BA has rank at most r. Implement it from scratch.

LoRA · Low-Rank Decomposition · Rank · Alpha Scaling
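
A minimal sketch of the decomposition in plain PyTorch (illustrative, not the notebook's solution): the base weight stays frozen while only B and A train, with B zero-initialized so the wrapped layer starts out identical to the pretrained one.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear layer plus a trainable low-rank update (α/r)·BA."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)  # freeze W0
        # A: (r, in_features), B: (out_features, r); BA has the shape of W0
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at step 0
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # y = x W0^T + (α/r) · x (BA)^T
        return self.base(x) + self.scaling * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(768, 768), r=8, alpha=16)
print(layer(torch.randn(4, 768)).shape)  # torch.Size([4, 768])
```

With r = 8 on a 768x768 layer, the update trains 2·8·768 ≈ 12K parameters instead of ~590K, which is where the parameter savings come from.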
Notebook 21 · 60 min

LoRA on LLaMA-2 7B

Apply LoRA to LLaMA-2 7B using HuggingFace PEFT. Train while updating only about 0.65% of the model's parameters.

LLaMA-2 · PEFT · LoRA · Instruction Tuning
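
Roughly what the PEFT setup looks like; the rank, alpha, dropout, and target modules below are common defaults, not necessarily the notebook's exact configuration.

```python
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Gated checkpoint: requires accepting Meta's license on the Hugging Face Hub
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", torch_dtype=torch.float16, device_map="auto"
)

lora_config = LoraConfig(
    r=8,                                  # rank of the update matrices
    lora_alpha=16,                        # scaling factor: alpha / r
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention query/value projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # prints trainable count, total count, and trainable %
```

The exact trainable fraction (the 0.65% quoted above) depends on the rank and on which modules are targeted.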
Notebook 22 · 60 min

QLoRA Implementation

4-bit NF4 quantization + LoRA. Train 7B models on a single consumer GPU with bitsandbytes.

QLoRA · NF4 · 4-bit Quantization · bitsandbytes
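
A sketch of the standard QLoRA recipe with bitsandbytes; the hyperparameters are illustrative.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit NF4 quantization for the frozen base weights
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,      # quantize the quantization constants too
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)  # casts norms, readies grad checkpointing

# The LoRA adapters stay in higher precision and are the only trained weights
model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"))
model.print_trainable_parameters()
```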
Notebook 23 · 40 min

LoRA Target Modules

Ablation study: which layers to apply LoRA to? q_proj, v_proj, all attention, or FFN?

Target Modules · Ablation Study · Layer Selection · Parameter Budget
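
One way to structure the ablation, sketched with TinyLlama/TinyLlama-1.1B-Chat-v1.0 as a small stand-in for LLaMA-2 7B so the loop stays cheap; the module names are the LLaMA-family ones.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Candidate target sets for the ablation
ablations = {
    "q_v_only":      ["q_proj", "v_proj"],
    "all_attention": ["q_proj", "k_proj", "v_proj", "o_proj"],
    "ffn_only":      ["gate_proj", "up_proj", "down_proj"],
}

for name, targets in ablations.items():
    base = AutoModelForCausalLM.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
    peft_model = get_peft_model(
        base, LoraConfig(r=8, target_modules=targets, task_type="CAUSAL_LM")
    )
    print(f"--- {name} ---")
    peft_model.print_trainable_parameters()  # parameter budget for this target set
```

Comparing the trainable fraction against downstream accuracy for each set is what turns this into an ablation rather than just a parameter count.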
Notebook 24 · 40 min

Custom Loss with LoRA

Combine custom loss functions (focal, weighted CE) with LoRA training. Verify gradient flow.

Custom Loss · Gradient Flow · LoRA Trainer · WeightedCELoRATrainer
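
The WeightedCELoRATrainer name in the tags suggests a Trainer subclass; the body below is a hedged guess at its shape, not the notebook's code. The **kwargs absorbs the extra arguments newer transformers versions pass to compute_loss.

```python
import torch
from transformers import Trainer

class WeightedCELoRATrainer(Trainer):
    """Sketch: class-weighted cross-entropy for a LoRA-wrapped causal LM."""

    def __init__(self, *args, class_weights=None, **kwargs):
        super().__init__(*args, **kwargs)
        self.class_weights = class_weights  # (vocab_size,) tensor, or None for plain CE

    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        labels = inputs.pop("labels")       # keep the model from computing its own loss
        outputs = model(**inputs)
        # standard causal-LM shift: position t predicts token t+1
        logits = outputs.logits[..., :-1, :].contiguous()
        targets = labels[..., 1:].contiguous()
        weight = (
            self.class_weights.to(logits.device) if self.class_weights is not None else None
        )
        loss_fct = torch.nn.CrossEntropyLoss(weight=weight, ignore_index=-100)
        loss = loss_fct(logits.view(-1, logits.size(-1)), targets.view(-1))
        return (loss, outputs) if return_outputs else loss
```

For the gradient-flow check: after loss.backward(), every lora_A / lora_B parameter should have a non-None .grad while the frozen base weights should not.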
Notebook 25 · 35 min

Adapter Layers

Classic bottleneck adapter architecture — residual connection with down/up projection.

Adapters · Bottleneck · Residual Connection · PEFT
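
A minimal PyTorch sketch of the bottleneck block; the bottleneck width here is a typical choice, not the notebook's.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Houlsby-style adapter: down-project, nonlinearity, up-project, residual add."""

    def __init__(self, hidden_size: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, hidden_size)
        nn.init.zeros_(self.up.weight)  # near-identity behavior at initialization
        nn.init.zeros_(self.up.bias)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return h + self.up(self.act(self.down(h)))  # residual connection

adapter = BottleneckAdapter(hidden_size=768)
print(adapter(torch.randn(2, 16, 768)).shape)  # torch.Size([2, 16, 768])
```

Unlike LoRA, the adapter adds a small sequential module inside each block, so it introduces extra inference latency unless distilled or merged away.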
Notebook 26 · 35 min

Prompt Tuning

Optimize soft prompt embeddings while keeping the entire model frozen. Only about 0.01% of parameters are trained.

Prompt Tuning · Soft Prompts · Virtual Tokens · Prefix Tuning
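
Roughly what this looks like with the PEFT API, using gpt2 purely as a small stand-in model.

```python
from transformers import AutoModelForCausalLM
from peft import PromptTuningConfig, PromptTuningInit, get_peft_model

model = AutoModelForCausalLM.from_pretrained("gpt2")

config = PromptTuningConfig(
    task_type="CAUSAL_LM",
    num_virtual_tokens=20,                    # 20 trainable soft-prompt vectors
    prompt_tuning_init=PromptTuningInit.TEXT, # initialize from a real phrase
    prompt_tuning_init_text="Classify the sentiment of this review:",
    tokenizer_name_or_path="gpt2",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # only the virtual-token embeddings are trainable
```

For gpt2 this trains 20 × 768 = 15,360 embedding values, which is where the ~0.01% figure comes from on larger models.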
Notebook 27 · 45 min

PEFT Method Comparison

Benchmark LoRA, QLoRA, adapters, and prompt tuning on accuracy, trainable-parameter count, and training speed.

PEFT Comparison · Benchmarking · Parameter Efficiency · Quality Tradeoffs
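
A sketch of the parameter-count axis of the comparison, again with gpt2 as a cheap stand-in; accuracy and speed require actual training runs, which this snippet does not do.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, PromptTuningConfig, get_peft_model

def trainable_fraction(model):
    """Count trainable parameters and their share of the full model."""
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    return trainable, trainable / total

methods = {
    "lora_r8": LoraConfig(r=8, task_type="CAUSAL_LM"),
    "prompt_tuning_20": PromptTuningConfig(task_type="CAUSAL_LM", num_virtual_tokens=20),
}

for name, cfg in methods.items():
    base = AutoModelForCausalLM.from_pretrained("gpt2")  # fresh copy per method
    peft_model = get_peft_model(base, cfg)
    n, frac = trainable_fraction(peft_model)
    print(f"{name}: {n:,} trainable params ({frac:.4%})")
```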
Notebook 28 · 40 min

LoRA Merging

Merge LoRA adapters back into base weights with merge_and_unload(). Combine multiple adapters.

LoRA Merging · merge_and_unload · TIES Merging · DARE
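
A sketch of both operations with the PEFT API; the adapter paths are placeholders, and the weights and density values are illustrative.

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
model = PeftModel.from_pretrained(base, "path/to/lora-adapter")      # placeholder path

# Combine two adapters: peft supports TIES- and DARE-style merges
model.load_adapter("path/to/second-adapter", adapter_name="second")  # placeholder path
model.add_weighted_adapter(
    adapters=["default", "second"],
    weights=[0.7, 0.3],
    adapter_name="combined",
    combination_type="ties",  # "dare_ties" / "dare_linear" are also available
    density=0.5,              # fraction of delta weights kept before merging
)
model.set_adapter("combined")

# Fold the active adapter's (α/r)·BA into W0; returns a plain transformers model
merged = model.merge_and_unload()
merged.save_pretrained("llama2-7b-merged")
```

After merging, inference needs no PEFT wrapper and incurs zero adapter overhead, at the cost of losing the ability to swap adapters at runtime.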
Notebook 29 · 50 min

Advanced LoRA Variants

DoRA (magnitude-direction decomposition), AdaLoRA (adaptive rank allocation), and LoRA+ (separate learning rates for the A and B matrices).

DoRA · AdaLoRA · LoRA+ · Adaptive Rank
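
How the three variants surface in practice, sketched against current PEFT: use_dora and AdaLoraConfig are real PEFT options, while LoRA+ is shown as a hand-rolled optimizer split since it is a training recipe rather than a config. The model, ranks, and learning rates are illustrative.

```python
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, AdaLoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")

# DoRA: reparameterize the update into magnitude and direction (peft >= 0.9)
dora_model = get_peft_model(
    base, LoraConfig(r=8, lora_alpha=16, use_dora=True, task_type="CAUSAL_LM")
)

# LoRA+: same architecture as LoRA, but B matrices get a larger learning rate than A
a_params = [p for n, p in dora_model.named_parameters() if "lora_A" in n]
b_params = [p for n, p in dora_model.named_parameters() if "lora_B" in n]
optimizer = torch.optim.AdamW([
    {"params": a_params, "lr": 1e-4},
    {"params": b_params, "lr": 1.6e-3},  # ratio of ~16, as suggested by the LoRA+ paper
])

# AdaLoRA: start at init_r and adaptively prune toward target_r during training
adalora_cfg = AdaLoraConfig(init_r=12, target_r=4, total_step=1000, task_type="CAUSAL_LM")
```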
← Previous: Full Model Fine-Tuning
Next: Advanced Optimization →