ML Wiki
Tag: reward-model
2 items with this tag.
Apr 22, 2026
KTO: Model Alignment as Prospect Theoretic Optimization (source)
Tags: alignment, rlhf, dpo, reward-model, training
Apr 17, 2026
Learning to Summarize from Human Feedback (source)
Tags: rlhf, alignment, reward-model, summarization