ML Wiki

Tag: reinforcement-learning

10 items with this tag.

  • Apr 17, 2026

    Policy Gradient

    • concept
    • reinforcement-learning
    • optimization
    • training
  • Apr 17, 2026

    Reinforcement Learning

    • concept
    • reinforcement-learning
    • training
    • optimization
  • Apr 17, 2026

    Proximal Policy Optimization Algorithms

    • source
    • reinforcement-learning
    • ppo
    • policy-gradient
    • rlhf
    • training
  • Apr 16, 2026

    GRPO (Group Relative Policy Optimization)

    • concept
    • reinforcement-learning
    • training
    • alignment
    • rl
  • Apr 16, 2026

    RL for Reasoning (Test-Time Compute Scaling)

    • concept
    • reinforcement-learning
    • reasoning
    • chain-of-thought
    • scaling
  • Apr 16, 2026

    DeepSeek-R1: Incentivizing Reasoning via Reinforcement Learning

    • source
    • reasoning
    • reinforcement-learning
    • chain-of-thought
    • rl
    • grpo
  • Apr 16, 2026

    GRPO: Group Relative Policy Optimization (DeepSeekMath)

    • source
    • reinforcement-learning
    • grpo
    • ppo
    • training
    • math-reasoning
    • rl
  • Apr 10, 2026

    PPO (Proximal Policy Optimization)

    • concept
    • reinforcement-learning
    • training
    • alignment
  • Apr 10, 2026

    Tool Use in Language Agents

    • concept
    • agents
    • reinforcement-learning
  • Apr 10, 2026

    Act Wisely: Cultivating Meta-Cognitive Tool Use in Agentic Multimodal Models (Metis / HDPO)

    • source
    • multimodal
    • reinforcement-learning
    • agents
    • tool-use
    • grpo
    • vision