ML Wiki
Tag: alignment
8 items with this tag.
Apr 10, 2026 · Alignment (AI) · concept · tags: alignment, training
Apr 10, 2026 · PPO (Proximal Policy Optimization) · concept · tags: reinforcement-learning, training, alignment
Apr 10, 2026 · Reward Model · concept · tags: alignment, training, rlhf
Apr 10, 2026 · Training language models to follow instructions with human feedback (InstructGPT) · source · tags: alignment, rlhf, llm, fine-tuning, safety
Apr 04, 2026 · DPO (Direct Preference Optimization) · concept · tags: alignment, training
Apr 04, 2026 · RLHF (Reinforcement Learning from Human Feedback) · concept · tags: alignment, training
Apr 04, 2026 · SFT (Supervised Fine-Tuning) · concept · tags: training, alignment
Apr 04, 2026 · Direct Preference Optimization: Your Language Model is Secretly a Reward Model · source · tags: alignment, preference-learning, reward-modeling, policy-optimization, human-feedback