Yang Li

Tencent (Hunyuan 3D) Researcher

Yang Li completed a Ph.D. at the University of Tokyo under the supervision of Professor Tatsuya Harada. Research interests lie at the intersection of 3D computer vision and artificial intelligence, with particular emphasis on 3D object and scene generation, multimodal large models, and world models.

Topic

Towards 3D World Models: Tencent Hunyuan 3D-DiT Architectural Evolution and Automated Native Pipeline Engineering Practice

This talk will provide an in-depth deconstruction of Tencent's Hunyuan 3D series models, tracing the technical pathway from single-asset generation to a full-fledged "3D World Model." We will address the common algorithmic and engineering challenges in traditional 3D generation (such as unstructured mesh topology and the difficulty of integrating implicit representations into industrial pipelines) by presenting the industry-first 3D-DiT hierarchical sculpting architecture and its performance breakthroughs, supporting 1536³ voxel resolution and 1.5 million polygon geometric fidelity. In addition, we will reveal, for the first time, the native quad-mesh reconstruction technology (PolyGen) based on large models, as well as the engineering implementation of an end-to-end 3D AI automated pipeline. Finally, we will demonstrate Hunyuan 3D World Model 1.0's advancements in generalizable 360° immersive space generation, spatial navigation, and physical interaction, offering a systematic approach toward Spatial Intelligence.

Outline

1. Breaking Through and Vision: From 3D Asset Generation to 3D World Models
   - Algorithmic Bottlenecks: Existing 3D generation models (e.g., NeRF/3DGS-based or traditional mesh extraction) are limited in geometric continuity, high-frequency texture detail, and compatibility with industrial pipelines.
   - Hunyuan's Technical Evolution: The three-stage leap from "3D Object Base" → "3D Automated Pipeline" → "3D World Model."

2. Foundational Architecture Breakthroughs: Hunyuan 3D-DiT High-Dimensional Representation
   - 3D-DiT Architecture Analysis: The advantages of Transformers in 3D latent-space representation and network design.
   - Hierarchical Sculpting Mechanism: How the algorithm balances computational cost with extreme precision, achieving 3.6 billion voxels and 1.5 million polygons at SOTA geometric fidelity.
   - Quantitative and Qualitative Analysis: Core metric ablation studies, and the 30–60% performance improvement of version 3.0 over 2.5 in complex architectural and character topologies.

3. Crossing Industrial Boundaries: Native Quad Mesh and End-to-End Pipeline
   - Intelligent Topology 1.5 (PolyGen Model): Breaking the industry "holy grail," analyzing the first large model capable of native quad-mesh generation, ensuring edge flows meet professional artistic standards.
   - Component Semantic Decomposition (Component Split 1.5): Lossless automatic part segmentation and local reconstruction based on high-dimensional semantic understanding.
   - Engineering Implementation: A 10-minute end-to-end AI game pipeline overview, from algorithm inference to automatic export of full PBR material sets.

4. Early Stage Spatial Intelligence: Hunyuan 3D World Model 1.0
   - Panoramic Spatial Representation: Overcoming single-object limitations to achieve high-quality, large-scale 360° immersive space generation.
   - Navigation and Physical Interaction: Exploring spatial consistency and physics simulation potential (e.g., collision detection) in scene-level mesh exports and integration with VR/game engines.

5. Summary and Frontier Outlook
   - Next-Generation Challenges in Generative 3D: 4D dynamic rigging and cross-modal spatial intelligence (Spatial AI).

Audience Takeaways

- Insights into Cutting-Edge Architecture: Gain a deep understanding of the 3D-DiT design philosophy and its underlying algorithms for high-resolution 3D geometry and texture representation.
- Master Engineering Solutions: Learn how to build end-to-end automated 3D AI industrial pipelines and understand the system engineering behind large-scale cost reduction and efficiency gains.
- Overcome Topology Challenges: Explore key techniques for native quad-mesh topology and automatic part generation, solving the "last-mile" compatibility problem of AI assets entering traditional rendering engines.
- Preview Spatial Intelligence: Be among the first to see research progress and emerging paradigms in full-scene generation, real-time interaction, and physics simulation enabled by 3D World Models.
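As a back-of-the-envelope check on the resolution figures quoted in the abstract, the short Python sketch below confirms that a 1536³ grid contains roughly 3.6 billion voxels and estimates what a fully dense grid of that size would occupy in memory. The function name `dense_grid_gib` and the per-voxel byte sizes are illustrative assumptions, not figures from the talk. The sketch says nothing about how Hunyuan 3D actually stores geometry; it only illustrates why balancing computational cost against precision is a real concern at this resolution.

```python
# Back-of-the-envelope arithmetic only. The bytes-per-voxel values are
# illustrative assumptions, not details of the Hunyuan 3D implementation.

SIDE = 1536  # grid side length quoted in the abstract


def voxel_count(side: int) -> int:
    """Number of cells in a cubic grid with the given side length."""
    return side ** 3


def dense_grid_gib(side: int, bytes_per_voxel: int) -> float:
    """Memory in GiB for a fully dense cubic grid (hypothetical storage)."""
    return voxel_count(side) * bytes_per_voxel / 2 ** 30


total = voxel_count(SIDE)
print(f"{total:,} voxels")  # 3,623,878,656, i.e. about 3.6 billion

# One byte per voxel (e.g. an occupancy flag) vs. four bytes (e.g. a float32).
print(dense_grid_gib(SIDE, 1))  # 3.375
print(dense_grid_gib(SIDE, 4))  # 13.5
```

Under these assumptions, even a single dense scalar field at this resolution would run to several gibibytes per asset, which is consistent with the abstract's framing of a trade-off between cost and precision.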

© boolan.com Boolan. All rights reserved.
