ML Wiki
Tag: transformers
11 items with this tag.
May 02, 2026
Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation
source
architecture
positional-encoding
attention
transformers
Apr 20, 2026
GQA: Grouped-Query Attention — How Modern LLMs Got 5x Faster Without Losing Quality
source
attention
inference-efficiency
kv-cache
transformers
Apr 13, 2026
Bidirectional Context
concept
transformers
attention
nlp
Apr 13, 2026
Fine-tuning
concept
transformers
training
adaptation
Apr 13, 2026
Masked Language Model
concept
transformers
pre-training
nlp
Apr 13, 2026
Pre-training
concept
transformers
training
nlp
Apr 13, 2026
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
source
transformers
attention
pre-training
nlp
Apr 11, 2026
Classification Token (CLS Token)
concept
architecture
transformers
Apr 11, 2026
Patch Embeddings
concept
vision
architecture
transformers
Apr 11, 2026
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
source
vision
transformers
architecture
transfer-learning
Apr 06, 2026
RoPE: Enhanced Transformer with Rotary Position Embedding
source
transformers
attention
positional-encoding