ML Wiki
Tag: multimodal
6 items with this tag.
Apr 10, 2026
From Pixels to Understanding — Vision-Language Models
learning-path, vision, multimodal

Apr 10, 2026
Multimodal Embeddings
concept, multimodal, representation

Apr 10, 2026
Act Wisely: Cultivating Meta-Cognitive Tool Use in Agentic Multimodal Models (Metis / HDPO)
source, multimodal, reinforcement-learning, agents, tool-use, grpo, vision

Apr 10, 2026
NUMINA: When Numbers Speak — Aligning Textual Numerals and Visual Instances in Text-to-Video Diffusion Models
source, vision, video-generation, diffusion, attention, counting, multimodal

Apr 09, 2026
CLIP: Learning Transferable Visual Models From Natural Language Supervision
source, vision, multimodal, contrastive-learning, zero-shot

Apr 05, 2026
Early Fusion
concept, vision-language, architecture, multimodal