ML Wiki
Tag: dpo
2 items with this tag.
Apr 22, 2026
KTO: Model Alignment as Prospect Theoretic Optimization
source
Tags: alignment, rlhf, dpo, reward-model, training
Apr 17, 2026
KTO: Model Alignment as Prospect Theoretic Optimization
source
Tags: alignment, dpo, rlhf, preference-learning