Jun Zhang

Ascend Ecosystem Technical Expert

He graduated from Xiamen University with a master's degree in Communication and Information Systems and has worked at Huawei ever since. He has published several papers in MR and Neurocomputing. As a core developer of the AI framework MindSpore, he was responsible for automatic differentiation of dynamic graphs and for the module that unifies dynamic and static graphs. He currently focuses on developing and optimising large model inference acceleration on Ascend hardware. His aim is to further improve large model inference performance by optimising the inference framework, the model algorithms, and the operator acceleration libraries.

Topic

Optimisation Practices for Large Model Inference Acceleration

This talk introduces the techniques commonly used to accelerate large model inference at the framework, algorithm, and operator levels, together with their application scenarios and constraints. Drawing on our own development projects, we present the optimisations made to MindIE-LLM in its framework and code implementation. We also discuss ATB, a domain-specific acceleration library, as a source of optimisation ideas.

Outline:
1. Common techniques for large model inference acceleration: application scenarios and constraints of data and model parallelism, framework implementation, and operator acceleration libraries.
2. Our optimisation practice for the inference framework, based on MindIE-LLM.
3. ATB, the Ascend acceleration library for Transformer models: its principles, how to use it, and discussion.
4. Next optimisation directions.
5. Outlook.