Guanghua Yu

Head of Large Model Compression Algorithms, Tencent Hunyuan

He is responsible for the implementation and innovation of compression algorithms for Tencent Hunyuan large models, including quantization, sparsification, and speculative sampling. With nearly 10 years of experience in artificial intelligence and deep expertise in model compression and optimization, he has filed or published more than 10 patents and papers. He led the team that built the AngelSlim large-model compression toolkit from scratch, which now supports both internal deployment within Hunyuan and open-source model compression applications. He also developed proprietary large-model compression algorithms that serve more than 60% of the company's large-model use cases, reflecting a deep understanding of both the business and the technology.

Topic

Bridging the Last Mile for Large Model Deployment: Practical Implementation of Large Model Compression

Large model acceleration and compression are essential for reducing costs and improving efficiency. This talk will share cutting-edge compression algorithms for large-scale industrial deployment, practical implementation approaches, and the open-source large model compression toolkit AngelSlim. On the algorithm side, the presentation will cover effective and general methods from both business and technical perspectives, including quantization, speculative sampling, and sparsification, with insights spanning multiple modalities: text-to-text as well as multimodal understanding and generation tasks across different domains. On the tooling side, the talk will introduce Tencent's self-developed AngelSlim open-source compression toolkit, covering tool packaging, performance optimization, innovative algorithms, and one-click deployment practices that enable low-cost, efficient large model deployment.

Outline:

- Introduction to large model compression algorithms
- Practical implementation of compression algorithms
- Open-source large model compression tools
- Future prospects for large model acceleration and optimization
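
To make the quantization topic above concrete, below is a minimal, illustrative NumPy sketch of symmetric per-tensor INT8 weight quantization, one of the simplest forms of the methods the talk covers. This is not AngelSlim's actual API; the function names and the per-tensor scheme are assumptions for illustration only.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor INT8 quantization: map floats to [-127, 127].

    Illustrative sketch only; real toolkits typically use per-channel or
    per-group scales and calibration data to reduce quantization error.
    """
    scale = np.abs(weights).max() / 127.0  # one scale for the whole tensor
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover a float approximation of the original weights."""
    return q.astype(np.float32) * scale

# Example: quantize a random weight matrix and measure the round-trip error.
w = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize_int8(q, scale)
print("max abs error:", np.abs(w - w_hat).max())
```

The storage saving is the point: INT8 weights take 4x less memory than FP32 (2x less than FP16), at the cost of the small reconstruction error printed above, which production methods work to minimize.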
