ML Wiki
Tag: reinforcement-learning
10 items with this tag.
Apr 17, 2026
Policy Gradient
concept
reinforcement-learning
optimization
training
Apr 17, 2026
Reinforcement Learning
concept
reinforcement-learning
training
optimization
Apr 17, 2026
Proximal Policy Optimization Algorithms
source
reinforcement-learning
ppo
policy-gradient
rlhf
training
Apr 16, 2026
GRPO (Group Relative Policy Optimization)
concept
reinforcement-learning
training
alignment
rl
Apr 16, 2026
RL for Reasoning (Test-Time Compute Scaling)
concept
reinforcement-learning
reasoning
chain-of-thought
scaling
Apr 16, 2026
DeepSeek-R1: Incentivizing Reasoning via Reinforcement Learning
source
reasoning
reinforcement-learning
chain-of-thought
rl
grpo
Apr 16, 2026
GRPO: Group Relative Policy Optimization (DeepSeekMath)
source
reinforcement-learning
grpo
ppo
training
math-reasoning
rl
Apr 10, 2026
PPO (Proximal Policy Optimization)
concept
reinforcement-learning
training
alignment
Apr 10, 2026
Tool Use in Language Agents
concept
agents
reinforcement-learning
Apr 10, 2026
Act Wisely: Cultivating Meta-Cognitive Tool Use in Agentic Multimodal Models (Metis / HDPO)
source
multimodal
reinforcement-learning
agents
tool-use
grpo
vision