Linggong Meng

Senior Machine Learning Specialist at DEVO

Senior Machine Learning Specialist in the Algorithm Engineering direction at DEVO, mainly responsible for R&D on DEVO's algorithm platform. He built DEVO's universal large model training and inference platform from 0 to 1. He previously worked at Tencent, Alibaba, and other major internet companies, and joined DEVO in 2022, focusing on large model technology, including inference acceleration and application scenarios. He has published many high-quality large model articles on DEVO's technical WeChat public account, such as "Using Multi-LoRA to Reduce the Cost of Deploying Large Models", "KubeAI Large Model Inference Acceleration Practice", and "Best Practices for Integrating with DEVO's Large Model Platform".

Topic

Practices for Improving Large Model Inference Performance

We have deployed a dedicated large model inference cluster at scale in our production environment. To optimise inference speed and reduce the cost of large models, we have drawn on the latest techniques in the industry and made multi-faceted attempts to improve the performance of the inference engine. In this talk, we will introduce effective methods for improving large model inference performance, as a reference for other teams and developers seeking to optimise large model inference.

Outline:

I. Large Model Inference Performance Improvement: Business Scenarios and Challenges
II. Large Model Inference Performance Improvement: Technical Directions
III. Scheduler Optimisation
IV. Attention Mechanism Optimisation
V. Other Optimisation Directions
VI. Practical Inference Frameworks and Tips
VII. Summary and Outlook