ML Wiki

Tag: grpo

3 items with this tag.

Apr 16, 2026
DeepSeek-R1: Incentivizing Reasoning via Reinforcement Learning
Apr 16, 2026
GRPO: Group Relative Policy Optimization (DeepSeekMath)
Apr 10, 2026
Act Wisely: Cultivating Meta-Cognitive Tool Use in Agentic Multimodal Models (Metis / HDPO)