ML Wiki

Tag: transformers

11 items with this tag.

  • May 02, 2026

    Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation

    • source
    • architecture
    • positional-encoding
    • attention
    • transformers
  • Apr 20, 2026

    GQA: Grouped-Query Attention — How Modern LLMs Got 5x Faster Without Losing Quality

    • source
    • attention
    • inference-efficiency
    • kv-cache
    • transformers
  • Apr 13, 2026

    Bidirectional Context

    • concept
    • transformers
    • attention
    • nlp
  • Apr 13, 2026

    Fine-tuning

    • concept
    • transformers
    • training
    • adaptation
  • Apr 13, 2026

    Masked Language Model

    • concept
    • transformers
    • pre-training
    • nlp
  • Apr 13, 2026

    Pre-training

    • concept
    • transformers
    • training
    • nlp
  • Apr 13, 2026

    BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

    • source
    • transformers
    • attention
    • pre-training
    • nlp
  • Apr 11, 2026

    Classification Token (CLS Token)

    • concept
    • architecture
    • transformers
  • Apr 11, 2026

    Patch Embeddings

    • concept
    • vision
    • architecture
    • transformers
  • Apr 11, 2026

    An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

    • source
    • vision
    • transformers
    • architecture
    • transfer-learning
  • Apr 06, 2026

    RoPE: Enhanced Transformer with Rotary Position Embedding

    • source
    • transformers
    • attention
    • positional-encoding