Xin Pan

Co-Founder of 01.AI

Co-Founder of 01.AI. Mainly engaged in the development and application of diffusion models and multimodal LLMs (MLLM). Ten years of experience in AI engineering and algorithms. At Google Brain, participated in TensorFlow and TPU development and in deep learning research on CV, NLP, and speech. Led the rebuild of Baidu PaddlePaddle from 0 to 1. Developed Tencent's Infinity recommender system, which supports content recommendation for hundreds of millions of daily active users at Tencent. Was responsible for ByteDance's AIGC and visual large-model AI platform, supporting products such as Douyin, TikTok, and Jianying (CapCut).

Topic

Multimodal techniques and applications

1. Historical Review
   CV, NLP, and Speech: from weak to strong, from multi-stage to end-to-end, from fragmentation to convergence
2. Introduction to Diffusion and Multimodal LLM
   2.1 Evolution of Diffusion
   2.2 Evolution of MLLM
   2.3 Relationship between MLLM and Diffusion
3. Technical Challenges of Multimodal in Products
   3.1 Limitations and analysis of current MLLMs: Reasoning, Charts & Multilingual, Hallucination
   3.2 Some directions for improvement
       3.2.1 Train multimodal from scratch
       3.2.2 Better and modular encoder
       3.2.3 Vision replaces text
4. Applying Multimodal to Documents and Social Products
   4.1 Multimodal RAG, multimodal-conditioned generation
   4.2 MLLM and Diffusion co-design
5. Outlook
   5.1 Multimodal Agent
   5.2 Co-evolution of Human and AI

© boolan.com Boolan. All rights reserved.