ML Wiki

Tag: reward-modeling

1 item with this tag.

  • Apr 04, 2026

    Direct Preference Optimization: Your Language Model is Secretly a Reward Model

    • alignment
    • preference-learning
    • reward-modeling
    • policy-optimization
    • human-feedback