63 · Stage 07 · 45 min

Preference Algorithm Comparison

DPO vs. ORPO vs. KTO vs. IPO vs. SimPO: which alignment algorithm to use when, with a side-by-side comparison on the same dataset.
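All five methods train on a contrast between a chosen and a rejected response; DPO is the baseline the others modify, and SimPO is a reference-free, length-normalized variant. A minimal sketch of both losses, to preview the comparison (the `beta`/`gamma` defaults and the per-sequence log-prob inputs are illustrative assumptions, not the notebook's exact code):

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """DPO: logistic loss on the policy/reference log-ratio margin.

    All inputs are summed per-sequence log-probs, shape (batch,).
    """
    pi_logratios = policy_chosen_logps - policy_rejected_logps
    ref_logratios = ref_chosen_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (pi_logratios - ref_logratios)).mean()

def simpo_loss(chosen_logps, rejected_logps, chosen_len, rejected_len,
               beta=2.0, gamma=0.5):
    """SimPO: no reference model; length-normalized margin with offset gamma."""
    margin = chosen_logps / chosen_len - rejected_logps / rejected_len
    return -F.logsigmoid(beta * margin - gamma).mean()
```

Note the practical trade-off visible even in the sketch: DPO needs four log-prob passes (policy and frozen reference, chosen and rejected), while SimPO drops the reference model entirely at the cost of two extra hyperparameters.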

DPO · ORPO · KTO · IPO · SimPO · Alignment

Overview

This notebook compares preference-optimization algorithms in depth, providing hands-on implementations and conceptual understanding. Open the notebook in Google Colab using the button above to run all code interactively on a free GPU.

What You'll Build

By completing this notebook, you'll implement working versions of each preference loss from scratch and understand how they connect to the broader LLM training pipeline.

Prerequisites

Complete the previous notebook in this stage before starting this one. Each notebook builds on concepts from the previous session.

Open in Colab

Click the "Open in Google Colab" button above to launch this notebook. Make sure to switch the runtime to GPU (Runtime → Change runtime type → T4 GPU) before running cells.

Key Takeaways

  • Understand the core concepts behind each preference-optimization algorithm
  • Implement a working version from scratch in PyTorch
  • Connect these techniques to real-world LLM training pipelines
  • Know when to use each approach vs. the alternatives
  • Debug common errors and edge cases
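As a taste of the trade-offs covered, ORPO takes a different route from DPO: it folds the preference signal into the SFT objective as an odds-ratio penalty and needs no reference model. A minimal sketch, assuming length-averaged log-prob inputs and an illustrative `lam` weight (names and default are assumptions, not the notebook's exact code):

```python
import torch
import torch.nn.functional as F

def orpo_loss(chosen_logps, rejected_logps, nll_chosen, lam=0.1):
    """ORPO: SFT NLL on the chosen response plus an odds-ratio penalty.

    chosen_logps / rejected_logps are length-averaged log-probs (<= 0),
    so exp(.) is a per-token probability in (0, 1).
    """
    # log odds(p) = log(p / (1 - p)), computed stably in log space
    log_odds = (chosen_logps - rejected_logps) - (
        torch.log1p(-torch.exp(chosen_logps))
        - torch.log1p(-torch.exp(rejected_logps))
    )
    ratio_loss = -F.logsigmoid(log_odds).mean()
    return nll_chosen + lam * ratio_loss
```

Because the penalty rides on top of the standard SFT loss, ORPO can align a model in a single training stage, whereas DPO-style methods assume a separately fine-tuned starting point.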