ML Wiki
Tag: dpo
2 items with this tag.
Apr 22, 2026
KTO: Model Alignment as Prospect Theoretic Optimization
source
Tags: alignment, rlhf, dpo, reward-model, training
Apr 17, 2026
KTO: Model Alignment as Prospect Theoretic Optimization
source
Tags: alignment, dpo, rlhf, preference-learning