ML Wiki

Tag: distributed-training

6 items with this tag.

  • May 09, 2026

    Data Parallel

    • concept
    • distributed-training
    • parallelism
  • May 09, 2026

    Model Parallel

    • concept
    • distributed-training
    • parallelism
    • memory
  • May 09, 2026

    Tensor Parallel

    • concept
    • distributed-training
    • model-parallel
    • megatron
  • May 09, 2026

    Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism

    • source
    • distributed-training
    • model-parallel
    • tensor-parallel
    • pre-training
    • systems
  • May 09, 2026

    PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel

    • source
    • distributed-training
    • fsdp
    • data-parallel
    • pytorch
    • memory-efficiency
    • systems
  • May 09, 2026

    ZeRO: Memory Optimizations Toward Training Trillion Parameter Models

    • source
    • distributed-training
    • memory-efficiency
    • data-parallel
    • model-parallel
    • deepspeed