ML Wiki

Tag: vision-language

10 items with this tag.

  • May 09, 2026

    LLaVA-1.5: Improved Baselines with Visual Instruction Tuning

    • source
    • multimodal
    • vlm
    • instruction-tuning
    • vision-language
  • May 08, 2026

    Perceiver Resampler

    • concept
    • multimodal
    • vision-language
    • architecture
  • May 08, 2026

    Flamingo: A Visual Language Model for Few-Shot Learning

    • source
    • vision-language
    • multimodal
    • few-shot-learning
    • cross-attention
    • in-context-learning
  • Apr 30, 2026

    BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models

    • source
    • vision-language
    • multimodal
    • pre-training
    • contrastive-learning
  • Apr 16, 2026

    Multimodal Instruction Tuning (Visual Instruction Tuning / VLMs)

    • concept
    • multimodal
    • vlm
    • instruction-tuning
    • vision-language
  • Apr 16, 2026

    LLaVA: Visual Instruction Tuning

    • source
    • multimodal
    • vlm
    • instruction-tuning
    • vision-language
  • Apr 05, 2026

    Early Fusion

    • concept
    • vision-language
    • architecture
    • multimodal
  • Apr 05, 2026

    Open-Vocabulary Segmentation

    • concept
    • vision-language
    • perception
    • segmentation
  • Apr 05, 2026

    Visual Grounding

    • concept
    • vision-language
    • perception
    • grounding
  • Apr 05, 2026

    Falcon Perception: Early-Fusion Transformer for Open-Vocabulary Grounding and Segmentation

    • source
    • vision-language
    • segmentation
    • grounding
    • ocr
    • distillation
    • benchmark