ML Wiki
Tag: grpo
3 items with this tag.
Apr 16, 2026
DeepSeek-R1: Incentivizing Reasoning via Reinforcement Learning
Tags: source, reasoning, reinforcement-learning, chain-of-thought, rl, grpo
Apr 16, 2026
GRPO: Group Relative Policy Optimization (DeepSeekMath)
Tags: source, reinforcement-learning, grpo, ppo, training, math-reasoning, rl
Apr 10, 2026
Act Wisely: Cultivating Meta-Cognitive Tool Use in Agentic Multimodal Models (Metis / HDPO)
Tags: source, multimodal, reinforcement-learning, agents, tool-use, grpo, vision