SGLang v0.5.10+
Primary EngineRadixAttention shares prefix KV across requests (40–60% TTFT reduction for agents and fixed system prompts). 29% higher throughput than vLLM on MoE architectures in independent benchmarks.
- • Native OpenAI-compatible HTTP — no proxy needed
- • Built-in EAGLE v2 speculative decoding
- • Modal recommends for online workloads