ML Wiki

Tag: preference-learning

2 items with this tag.

  • Apr 17, 2026

    KTO: Model Alignment as Prospect Theoretic Optimization

    • source
    • alignment
    • dpo
    • rlhf
    • preference-learning

  • Apr 04, 2026

    Direct Preference Optimization: Your Language Model is Secretly a Reward Model

    • source
    • alignment
    • preference-learning
    • reward-modeling
    • policy-optimization
    • human-feedback