ML Wiki
Tag: preference-learning
1 item with this tag.
Apr 04, 2026
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
source
alignment
preference-learning
reward-modeling
policy-optimization
human-feedback