ML Wiki

Tag: alignment

8 items with this tag.

  • Apr 10, 2026: Alignment (AI)
    Tags: concept, alignment, training
  • Apr 10, 2026: PPO (Proximal Policy Optimization)
    Tags: concept, reinforcement-learning, training, alignment
  • Apr 10, 2026: Reward Model
    Tags: concept, alignment, training, rlhf
  • Apr 10, 2026: Training language models to follow instructions with human feedback (InstructGPT)
    Tags: source, alignment, rlhf, llm, fine-tuning, safety
  • Apr 04, 2026: DPO (Direct Preference Optimization)
    Tags: concept, alignment, training
  • Apr 04, 2026: RLHF (Reinforcement Learning from Human Feedback)
    Tags: concept, alignment, training
  • Apr 04, 2026: SFT (Supervised Fine-Tuning)
    Tags: concept, training, alignment
  • Apr 04, 2026: Direct Preference Optimization: Your Language Model is Secretly a Reward Model
    Tags: source, alignment, preference-learning, reward-modeling, policy-optimization, human-feedback