ML Wiki
Tag: preference-learning
2 items with this tag.
Apr 17, 2026
KTO: Model Alignment as Prospect Theoretic Optimization
source
alignment
dpo
rlhf
preference-learning
Apr 04, 2026
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
source
alignment
preference-learning
reward-modeling
policy-optimization
human-feedback