ML Wiki

Tag: multimodal

6 items with this tag.

  • Apr 10, 2026

    From Pixels to Understanding — Vision-Language Models

    • learning-path
    • vision
    • multimodal
  • Apr 10, 2026

    Multimodal Embeddings

    • concept
    • multimodal
    • representation
  • Apr 10, 2026

    Act Wisely: Cultivating Meta-Cognitive Tool Use in Agentic Multimodal Models (Metis / HDPO)

    • source
    • multimodal
    • reinforcement-learning
    • agents
    • tool-use
    • grpo
    • vision
  • Apr 10, 2026

    NUMINA: When Numbers Speak — Aligning Textual Numerals and Visual Instances in Text-to-Video Diffusion Models

    • source
    • vision
    • video-generation
    • diffusion
    • attention
    • counting
    • multimodal
  • Apr 09, 2026

    CLIP: Learning Transferable Visual Models From Natural Language Supervision

    • source
    • vision
    • multimodal
    • contrastive-learning
    • zero-shot
  • Apr 05, 2026

    Early Fusion

    • concept
    • vision-language
    • architecture
    • multimodal