ML Wiki
Tag: reinforcement-learning
3 items with this tag.
Apr 10, 2026
PPO (Proximal Policy Optimization)
concept
reinforcement-learning
training
alignment
Apr 10, 2026
Tool Use in Language Agents
concept
agents
reinforcement-learning
Apr 10, 2026
Act Wisely: Cultivating Meta-Cognitive Tool Use in Agentic Multimodal Models (Metis / HDPO)
source
multimodal
reinforcement-learning
agents
tool-use
grpo
vision