Jibing Xie

Software Development Engineer, eBay Machine Learning Platform

He holds a master's degree from Shanghai Jiao Tong University, where he specialized in AI. He previously worked at Tencent, focusing on the development and production deployment of customized AI inference engines. He now works in eBay's Machine Learning Platform department, where he focuses on building eBay's cloud-native AI inference platform. As lead developer, he led the development of a zero-code LLM deployment solution based on Triton Inference Server, an LLM autoscaling solution based on Kubernetes, and an automated LLM benchmarking tool. He is committed to promoting the practical application of AI technology and strives to create greater value for enterprises through technological innovation.

Topic

Engineering LLMs on eBay's Cloud-Native Model Inference Platform

Compared with traditional models, LLMs are characterized by large model size, heavy dependence on GPU resources, and rapid iteration. To enable fast, fully self-service LLM deployment and efficient LLM inference, we developed a unified inference solution based on Triton Inference Server and vLLM. To meet the multi-GPU deployment requirements of LLMs, we optimized the platform's GPU resource management and scheduling capabilities. To improve GPU utilization, we built an LLM autoscaling solution on top of the autoscaling capability provided by Kubernetes, including model file download acceleration and Docker image caching. In addition, we built an automated LLM benchmarking tool that helps users evaluate model inference performance and calculate the GPU resources they need.

Outline:
1. Business background and platform challenges
2. LLM inference service and inference engine
3. Autoscaling of LLM services
4. Self-service LLM benchmarking tool
5. Future outlook
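As a rough illustration of how a benchmark result could feed a GPU resource estimate, the sketch below turns a measured per-replica throughput into a replica count and a GPU count. It is not the eBay tool's method: the function name, the parameters, and the headroom factor are all hypothetical, and it assumes throughput scales linearly with the number of replicas.

```python
import math


def estimate_gpus(target_qps, per_replica_qps, gpus_per_replica, headroom=0.2):
    """Estimate replicas and GPUs needed for a target request rate.

    All names and the headroom factor are illustrative assumptions,
    not part of any eBay or Triton API.
    """
    if per_replica_qps <= 0:
        raise ValueError("per_replica_qps must be positive")
    # Reserve spare capacity on top of the target load.
    required_qps = target_qps * (1 + headroom)
    # Round up: a fractional replica cannot be deployed.
    replicas = math.ceil(required_qps / per_replica_qps)
    return replicas, replicas * gpus_per_replica
```

For example, with a benchmark showing 12.5 requests per second for one replica that spans 4 GPUs, `estimate_gpus(100, 12.5, 4)` would suggest 10 replicas and 40 GPUs under the default 20% headroom.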
