## Why Environment Setup Matters
The most common cause of failed LLM training runs is not the algorithm but the environment: a mismatched CUDA version, missing libraries, or insufficient GPU memory can silently corrupt results or surface as cryptic errors hours into a run.
## GPU Requirements by Task
| Task | Minimum VRAM | Recommended |
|---|---|---|
| Fine-tune 7B (QLoRA) | 12 GB | 24 GB |
| Fine-tune 13B (QLoRA) | 16 GB | 40 GB |
| Full fine-tune GPT-2 | 4 GB | 8 GB |
| Inference 7B (4-bit) | 6 GB | 8 GB |
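The table above can be turned into a quick programmatic check. A minimal sketch, where the dict keys and the `fits` helper are hypothetical names and the values mirror the "Minimum VRAM" column:

```python
# Minimum VRAM per task, in GB (from the table above).
MIN_VRAM_GB = {
    "finetune-7b-qlora": 12,
    "finetune-13b-qlora": 16,
    "full-finetune-gpt2": 4,
    "inference-7b-4bit": 6,
}

def fits(task: str, gpu_vram_gb: float) -> bool:
    """Return True if the GPU meets the table's minimum for the task."""
    return gpu_vram_gb >= MIN_VRAM_GB[task]

# A Colab T4 (~15 GB) clears the 12 GB minimum for 7B QLoRA,
# but falls below the 16 GB minimum for 13B QLoRA.
print(fits("finetune-7b-qlora", 15))   # → True
print(fits("finetune-13b-qlora", 15))  # → False
```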
## Colab GPU Tiers
Google Colab provides free access to NVIDIA T4 (15 GB VRAM) GPUs. Colab Pro adds A100 (40/80 GB) access. For this curriculum, a T4 is sufficient through Stage 4. Stage 5 kernels benefit from A100 for Flash Attention and TensorRT-LLM.
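Since Colab's runtime assignment can vary, it is worth confirming which GPU you actually received. A minimal sketch, where `describe_gpu` is a hypothetical helper name:

```python
import torch

def describe_gpu() -> str:
    """Report the attached GPU's name and total VRAM, or note the lack of one."""
    if not torch.cuda.is_available():
        return "CPU only — switch runtime to GPU"
    props = torch.cuda.get_device_properties(0)
    return f"{props.name}, {props.total_memory / 1e9:.1f} GB VRAM"

print(describe_gpu())  # e.g. "Tesla T4, 15.8 GB VRAM" on the free tier
```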
## Essential Checks Before Every Session
```python
import gc
import torch

# 1. Verify GPU
assert torch.cuda.is_available(), "No GPU — switch to GPU runtime"

# 2. Check VRAM
free, total = torch.cuda.mem_get_info()
print(f"Free: {free/1e9:.1f} GB / {total/1e9:.1f} GB")

# 3. Clear stale memory (collect Python garbage first so empty_cache
#    can actually release the freed tensors back to the driver)
gc.collect()
torch.cuda.empty_cache()
```
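The three checks above can be folded into one guard that fails fast before a long run. A minimal sketch: `preflight` and its `min_free_gb` threshold are hypothetical, with the threshold meant to come from the requirements table:

```python
import gc
import torch

def preflight(min_free_gb: float = 10.0) -> None:
    """Fail fast unless a GPU is attached and has enough free VRAM."""
    assert torch.cuda.is_available(), "No GPU — switch to GPU runtime"
    gc.collect()
    torch.cuda.empty_cache()  # release cached blocks left over from earlier cells
    free, total = torch.cuda.mem_get_info()
    print(f"Free: {free/1e9:.1f} GB / {total/1e9:.1f} GB")
    assert free / 1e9 >= min_free_gb, (
        f"Need {min_free_gb:.0f} GB free, have {free/1e9:.1f} GB"
    )
```

Call `preflight(12)` at the top of a 7B QLoRA notebook, for example, so an undersized runtime aborts in seconds rather than hours in.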
## HuggingFace Authentication
Models like LLaMA-2, Mistral, and Gemma are "gated": you must accept the license on the HuggingFace Hub before downloading. Store your token as a Colab secret named `HF_TOKEN` rather than pasting it into notebook cells, where it would be exposed to anyone you share the notebook with.
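Reading the secret can be done with a small helper. A sketch, assuming the secret (or, outside Colab, an environment variable) is named `HF_TOKEN`; `get_hf_token` itself is a hypothetical name:

```python
import os

def get_hf_token():
    """Fetch the HF token from a Colab secret, falling back to an env var."""
    try:
        from google.colab import userdata  # only importable inside Colab
        return userdata.get("HF_TOKEN")
    except ImportError:
        return os.environ.get("HF_TOKEN")  # e.g. local Jupyter or CI

# Usage (with the HF_TOKEN secret set):
#   from huggingface_hub import login
#   login(token=get_hf_token())  # enables downloads of gated models
```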