Huang Haoyang | 2026 Singularity Intelligent Technology Summit

Huang Haoyang

Director of the JD Exploration Research Institute and Head of the Multimodal Foundation Model R&D Team at JD.com

Haoyang Huang is Director of the JD Exploration Research Institute and Head of the Multimodal Foundation Model Team at JD.com. He has published more than 40 papers in top AI journals and conferences. Previously, at Microsoft Research Asia, he led the development of multilingual and multimodal foundation models used in Microsoft Bing and Microsoft Translator, including Unicoder, which covers 100 languages, and M3P, the world's first multilingual multimodal pre-training model. His team won first place in the WMT21 Large-Scale Multilingual Machine Translation competition. In 2024, he led the development and open-sourcing of the 30B-parameter Step-Video series of video generation models (Step-Video-T2V and Step-Video-TI2V).

Topic

JoyAI-Image-Edit: Break the 2D Dimension and Reshape the Spatial Editing Paradigm

In recent years, generative AI has developed rapidly, evolving from early text generation to multimodal content generation spanning images, video, and audio. As model scale, data scale, and training methods continue to advance, multimodal generative models have become a key research direction in AI. This talk will review the evolution of multimodal generative models from an industry perspective and focus on the key technical challenges in video generation and multimodal generation, including data construction, model training, and capability evaluation. Drawing on experience from related projects, the talk will also summarize practical methods for deploying multimodal generative models in production and analyze the application value and future trends of generative AI in areas such as digital content production.

Outline:

1. The Evolution of Generative AI: From Text Generation to Multimodal Generation
2. Key Technical Challenges in Multimodal Generative Models: Data Scale, Training Methods, and Model Capabilities
3. Application Scenarios of Multimodal Generative Models: Digital Content Production and Intelligent Interaction

Boolan is a leading IT education and consulting company in China. Our core competence is our team of experts around the world and the cutting-edge technical experience they have accumulated over decades. Adhering to the tenet of "Global Experts, Global Wisdom", we bring together the world's top IT experts to provide customers with in-house training, technical conferences, software consulting, expert lectures, seminars, talent evaluation and certification, and other services. www.boolan.com

沪ICP备15014563号-6