
Bin He

Head of Model Performance Optimization at OmnInfer, Huawei R&D Engineer

Bin He is Head of Model Performance Optimization at OmnInfer and a committer of the MTP SIG. He graduated from the University of Science and Technology of China and the University of Chinese Academy of Sciences. Since joining Huawei, he has spent over a decade working on computer networking and AI infrastructure, accumulating extensive engineering experience in large-model inference optimization. He has been deeply involved in performance optimization for multiple open-source models, as well as the Pangu large model, on the Ascend platform, supporting high-performance inference services and RL rollout.

Topic

Extreme Performance Optimization with Omni-Infer

Abstract: Omni-Infer is a powerful inference acceleration toolkit tailored for Ascend hardware platforms. This talk presents practical explorations of achieving extreme performance optimization for both large language models (LLMs) and multimodal models, focusing on high throughput and low latency. It will cover key techniques such as operator fusion, multi-stream parallelism, advanced scheduling strategies, and speculative execution, along with real-world optimization case studies.

Outline:

Background

Case Studies: High Throughput & Low Latency Optimization

Future Directions

© boolan.com 博览 All rights reserved
