ML Wiki
Tag: vision
8 items with this tag.
Apr 11, 2026
Patch Embeddings
concept
vision
architecture
transformers
Apr 11, 2026
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
source
vision
transformers
architecture
transfer-learning
Apr 10, 2026
From Pixels to Understanding — Vision-Language Models
learning-path
vision
multimodal
Apr 10, 2026
Video Generation (Diffusion-Based)
concept
vision
generative
Apr 10, 2026
Vision Transformer (ViT)
concept
vision
architecture
Apr 10, 2026
Act Wisely: Cultivating Meta-Cognitive Tool Use in Agentic Multimodal Models (Metis / HDPO)
source
multimodal
reinforcement-learning
agents
tool-use
grpo
vision
Apr 10, 2026
NUMINA: When Numbers Speak — Aligning Textual Numerals and Visual Instances in Text-to-Video Diffusion Models
source
vision
video-generation
diffusion
attention
counting
multimodal
Apr 09, 2026
CLIP: Learning Transferable Visual Models From Natural Language Supervision
source
vision
multimodal
contrastive-learning
zero-shot