ML Wiki

Tag: ppo

2 items with this tag.

  • Apr 17, 2026

    Proximal Policy Optimization Algorithms

    • source
    • reinforcement-learning
    • ppo
    • policy-gradient
    • rlhf
    • training
  • Apr 16, 2026

    GRPO: Group Relative Policy Optimization (DeepSeekMath)

    • source
    • reinforcement-learning
    • grpo
    • ppo
    • training
    • math-reasoning
    • rl