
Xin Pan

CTO of SHARGE

Specializes in R&D and applications of diffusion models and multimodal LLMs (MLLMs). 10+ years in AI engineering and algorithms: contributed to TensorFlow/TPU at Google Brain (CV/NLP/speech research), led the foundational overhaul of Baidu PaddlePaddle, built Tencent’s Wuliang Recommendation System (serving 100M+ DAU), and spearheaded ByteDance’s AIGC/vision foundation model platform (powering Douyin/TikTok/CapCut).

Topic

Multimodal techniques and applications

1. Historical Review
   CV, NLP, and speech: from weak to strong, from multi-stage to end-to-end, from fragmentation to convergence
2. Introduction to Diffusion and Multimodal LLM
   2.1 Evolution of Diffusion
   2.2 Evolution of MLLM
   2.3 Relationship between MLLM and Diffusion
3. Technical Challenges of Multimodal in Products
   3.1 Limitations and analysis of current MLLMs: reasoning, charts and multilingual content, hallucination
   3.2 Some directions for improvement
       3.2.1 Training multimodal models from scratch
       3.2.2 Better and more modular encoders
       3.2.3 Vision replacing text
4. Applying Multimodal to Documents and Social Products
   4.1 Multimodal RAG and multimodal-conditioned generation
   4.2 MLLM and Diffusion co-design
5. Outlook
   5.1 Multimodal agents
   5.2 Co-evolution of humans and AI

© boolan.com Boolan (博览). All rights reserved.
