Xin Pan
CTO of SHARGE
Specializes in R&D and applications of diffusion models and multimodal LLMs (MLLMs). 10+ years in AI engineering and algorithms: contributed to TensorFlow/TPU at Google Brain (CV/NLP/speech research), led the foundational overhaul of Baidu's PaddlePaddle, built Tencent's Wuliang recommendation system (serving 100M+ DAU), and spearheaded ByteDance's AIGC/vision foundation model platform (powering Douyin/TikTok/CapCut).
Topic
Multimodal techniques and applications
1. Historical Review: CV, NLP, Speech — from weak to strong, from multi-stage to end-to-end, from fragmentation to convergence
2. Introduction to Diffusion and Multimodal LLMs
   2.1 Evolution of Diffusion
   2.2 Evolution of MLLM
   2.3 Relationship between MLLM and Diffusion
3. Technical Challenges of Multimodal in Products
   3.1 Limitations and analyses of current MLLMs: reasoning, charts & multilingual, hallucination
   3.2 Directions for improvement
       3.2.1 Training multimodal models from scratch
       3.2.2 Better and modular encoders
       3.2.3 Vision replacing text
4. Applying Multimodal to Documents and Social Products
   4.1 Multimodal RAG; multimodal-conditioned generation
   4.2 MLLM and Diffusion co-design
5. Outlook
   5.1 Multimodal agents
   5.2 Co-evolution of humans and AI