
Shangming Cai

Core Developer of the SGLang Community

Researcher and technical expert at the Alibaba Cloud Apsara Lab, with a Ph.D. from the Department of Computer Science and Technology at Tsinghua University. His research focuses on high-performance inference systems, large language models, and distributed machine learning training. In the SGLang community, he is a core contributor and maintainer of features such as prefill-decode (PD) disaggregation and pipeline parallelism. He is also a core member and maintainer of the Mooncake community.

Topic

SGLang: Panorama of High-Performance Inference Today and Its Future Path

SGLang is an open-source, high-performance LLM/VLM inference engine that provides day-0 support for open-source models such as DeepSeek, Qwen, and Kimi, driving architectural and technical advances in inference systems. It has been adopted by many leading companies in China and abroad as a production inference engine, running on over 300,000 GPUs worldwide. This talk surveys SGLang's key technical achievements in 2025 and recent developments, including large-scale prefill-decode (PD) disaggregated deployment, hierarchical KV cache, reinforcement learning integration, speculative decoding ecosystem support, pipeline-parallel (PP) acceleration for ultra-long contexts, Encoder-Prefill-Decode disaggregation, Mini-SGLang, and more. It will also share SGLang's roadmap for Q1 2026.

Outline: Attendees will learn about SGLang's core features and get up to date on its latest developments and future roadmap, helping users run LLM/VLM inference on SGLang with maximum performance at reduced cost, and helping interested developers understand the project's progress and join its development.

© boolan.com Boolan. All rights reserved.
