ML Wiki

Tag: rlhf

12 items with this tag.

May 05, 2026
Self-Rewarding Language Models
Apr 29, 2026
ORPO: Monolithic Preference Optimization without Reference Model
Apr 28, 2026
What's the right alignment stack post-RLHF?
Apr 25, 2026
Learning to Summarize from Human Feedback
Apr 22, 2026
KTO: Model Alignment as Prospect Theoretic Optimization
Apr 21, 2026
GPT-4 Technical Report
Apr 17, 2026
Constitutional AI: Teaching Models to Self-Correct
Apr 17, 2026
KTO: Model Alignment as Prospect Theoretic Optimization
Apr 17, 2026
Llama 2: Open Foundation and Fine-Tuned Chat Models
Apr 17, 2026
Proximal Policy Optimization Algorithms
Apr 10, 2026
Reward Model
Apr 10, 2026
Training language models to follow instructions with human feedback (InstructGPT)