ML Wiki
Tag: vision-language
10 items with this tag.
May 09, 2026
LLaVA-1.5: Improved Baselines with Visual Instruction Tuning
source
multimodal
vlm
instruction-tuning
vision-language
May 08, 2026
Perceiver Resampler
concept
multimodal
vision-language
architecture
May 08, 2026
Flamingo: A Visual Language Model for Few-Shot Learning
source
vision-language
multimodal
few-shot-learning
cross-attention
in-context-learning
Apr 30, 2026
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
source
vision-language
multimodal
pre-training
contrastive-learning
Apr 16, 2026
Multimodal Instruction Tuning (Visual Instruction Tuning / VLMs)
concept
multimodal
vlm
instruction-tuning
vision-language
Apr 16, 2026
LLaVA: Visual Instruction Tuning
source
multimodal
vlm
instruction-tuning
vision-language
Apr 05, 2026
Early Fusion
concept
vision-language
architecture
multimodal
Apr 05, 2026
Open-Vocabulary Segmentation
concept
vision-language
perception
segmentation
Apr 05, 2026
Visual Grounding
concept
vision-language
perception
grounding
Apr 05, 2026
Falcon Perception: Early-Fusion Transformer for Open-Vocabulary Grounding and Segmentation
source
vision-language
segmentation
grounding
ocr
distillation
benchmark