ML Wiki

Tag: reward-modeling

1 item with this tag.

  • Apr 04, 2026

    Direct Preference Optimization: Your Language Model is Secretly a Reward Model

    • alignment
    • preference-learning
    • reward-modeling
    • policy-optimization
    • human-feedback