ML Wiki

Tag: continuous-batching

1 item with this tag.

  • May 09, 2026

    Orca: A Distributed Serving System for Transformer-Based Generative Models

    • source
    • inference-serving
    • continuous-batching
    • iteration-level-scheduling
    • selective-batching
    • systems