Bin He
Head of Model Performance Optimization at Omni-Infer, Huawei R&D Engineer
He is a committer of the MTP SIG. He graduated from the University of Science and Technology of China and the University of Chinese Academy of Sciences. Since joining Huawei, he has worked for over a decade in computer networking and AI infrastructure, gaining extensive engineering experience in large-model inference optimization. He has been deeply involved in performance optimization for multiple open-source models, as well as the Pangu large model on the Ascend platform, supporting high-performance inference services and RL rollout.
Topic
Extreme Performance Optimization with Omni-Infer
Abstract: Omni-Infer is a powerful inference acceleration toolkit tailored for Ascend hardware platforms. This talk presents practical explorations of extreme performance optimization for both large language models (LLMs) and multimodal models, targeting high throughput and low latency. It covers key techniques such as operator fusion, multi-stream parallelism, advanced scheduling strategies, and speculative execution, along with real-world optimization case studies.
Outline:
Background
Case Studies: High Throughput & Low Latency Optimization
Future Directions