ML Wiki

Tag: vision

8 items with this tag.

  • Apr 11, 2026

    Patch Embeddings

    • concept
    • vision
    • architecture
    • transformers
  • Apr 11, 2026

    An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

    • source
    • vision
    • transformers
    • architecture
    • transfer-learning
  • Apr 10, 2026

    From Pixels to Understanding — Vision-Language Models

    • learning-path
    • vision
    • multimodal
  • Apr 10, 2026

    Video Generation (Diffusion-Based)

    • concept
    • vision
    • generative
  • Apr 10, 2026

    Vision Transformer (ViT)

    • concept
    • vision
    • architecture
  • Apr 10, 2026

    Act Wisely: Cultivating Meta-Cognitive Tool Use in Agentic Multimodal Models (Metis / HDPO)

    • source
    • multimodal
    • reinforcement-learning
    • agents
    • tool-use
    • grpo
    • vision
  • Apr 10, 2026

    NUMINA: When Numbers Speak — Aligning Textual Numerals and Visual Instances in Text-to-Video Diffusion Models

    • source
    • vision
    • video-generation
    • diffusion
    • attention
    • counting
    • multimodal
  • Apr 09, 2026

    CLIP: Learning Transferable Visual Models From Natural Language Supervision

    • source
    • vision
    • multimodal
    • contrastive-learning
    • zero-shot