ML Wiki
Tag: rl
3 items with this tag.
Apr 16, 2026
GRPO (Group Relative Policy Optimization)
concept
reinforcement-learning
training
alignment
rl
Apr 16, 2026
DeepSeek-R1: Incentivizing Reasoning via Reinforcement Learning
source
reasoning
reinforcement-learning
chain-of-thought
rl
grpo
Apr 16, 2026
GRPO: Group Relative Policy Optimization (DeepSeekMath)
source
reinforcement-learning
grpo
ppo
training
math-reasoning
rl