Huang Haoyang

Director of the JD Exploration Research Institute and Head of the Multimodal Foundation Model R&D Team at JD.com

Huang Haoyang is Director of the Exploration Research Institute at JD.com and Head of its Multimodal Foundation Model Team. He has published more than 40 papers in top AI journals and conferences. Previously, at Microsoft Research Asia, he led the development of multilingual and multimodal foundation models used in Microsoft Bing and Microsoft Translator, including Unicoder, which covers 100 languages, and M3P, the world’s first multilingual multimodal pre-training model. His team won first place in the WMT21 Large-Scale Multilingual Machine Translation competition. In 2024, he led the development and open-sourcing of the 30B StepVideo series of video generation models (Step-Video-T2V and Step-Video-TI2V).

Topic

JoyAI-Image-Edit: Break the 2D Dimension and Reshape the Spatial Editing Paradigm

In recent years, generative AI has developed rapidly, evolving from early text generation to multimodal content generation that includes images, video, and audio. As model scale, data scale, and training methodologies continue to advance, multimodal generative models are becoming a key research direction in AI.

This talk reviews the evolution of multimodal generative models from an industry perspective and focuses on the key technical challenges in video generation and multimodal generation, including data construction, model training, and capability evaluation. Drawing on experience from related projects, it also summarizes practical methodologies for the engineering deployment of multimodal generative models and analyzes the application value and future trends of generative AI in areas such as digital content production.

Outline:
The Evolution of Generative AI: From Text Generation to Multimodal Generation
Key Technical Challenges in Multimodal Generative Models: Data Scale, Training Methods, and Model Capabilities
Application Scenarios of Multimodal Generative Models: Digital Content Production and Intelligent Interaction

© boolan.com (博览). All rights reserved.
