Tang Feihu
Senior R&D Engineer and Developer Relations Lead at Moonshot AI
Former Google engineer, ACM/ICPC Asia Regional Gold Medalist, champion of Microsoft's "Beauty of Programming" Challenge, and winner of the first Wanxiang Lab Token Economy Design Competition.
Topic
Linear Attention: Past, Present, and Future
The quadratic complexity of standard self-attention has become a fundamental bottleneck for long-context AI agents and edge-side model deployment. Linear attention, which reduces attention computation from O(n²) to O(n), has emerged as one of the most significant algorithmic breakthroughs in efficient sequence modeling. This talk will present the full technical evolution of linear attention. We will start from theoretical foundations (Performer, Linear Transformer, RNN reformulations), analyze state-of-the-art architectures that have gained mainstream adoption (Mamba, RetNet, GLA, and hardware-aware designs), and look ahead to key research directions shaping 2026: hybrid attention strategies for intelligent agent workflows, hardware-software co-design for edge deployment, and the convergence of linear attention with state-space models.

Drawing on production experience from training large-scale foundation models at Kimi, this lecture bridges algorithmic innovation with infrastructure realities. We will examine real-world trade-offs between memory efficiency, training stability, and downstream task performance, critical considerations often overlooked in academic papers.

**Outline:**

I. The Quadratic Crisis (5 min)
II. Past: Theoretical Foundations (8 min)
III. Present: Modern Landscape (12 min)
IV. Future: Towards Intelligent Agent AI (10 min)
V. Open Source & Community (3 min)
VI. Q&A (5 min)

**Audience Takeaways:**

* Gain a full-stack understanding of linear attention, from mathematical principles (kernel methods, RNN duality, and state-space formulations) to engineering deployment.
* Deeply understand the real-world trade-offs in training stability, memory efficiency, and hardware utilization for mainstream architectures like Performer, Mamba, and RetNet.
* Obtain a practical decision framework for selecting optimal attention strategies in edge deployment and long-context AI agent scenarios.
* Gain insight into the 2026 frontier trends in hybrid architectures, adaptive sparsity, and hardware-software co-optimization.
* Acquire end-to-end capabilities to transform theoretical breakthroughs into production-grade, high-efficiency model infrastructure.
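To make the O(n²) → O(n) reduction and the RNN duality mentioned above concrete, here is a minimal NumPy sketch. It is illustrative only, not the implementation of any specific architecture covered in the talk: function names are invented, and `elu1` is just one common choice of positive feature map from the linear-attention literature.

```python
import numpy as np

def elu1(x):
    # elu(x) + 1: a common positive feature map in the linear-attention literature.
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention_parallel(Q, K, V, phi=elu1):
    """Kernelized (non-causal) form: reordering the matmuls as phi(Q) (phi(K)^T V)
    avoids materializing the (n x n) score matrix, so cost is linear in length n."""
    Qp, Kp = phi(Q), phi(K)
    S = Kp.T @ V                        # (d x d_v) summary, size independent of n
    z = Kp.sum(axis=0)                  # (d,) normalizer accumulator
    return (Qp @ S) / (Qp @ z)[:, None]

def linear_attention_recurrent(Q, K, V, phi=elu1):
    """Causal RNN form of the same computation: a constant-size state (S, z) is
    updated once per token, which is what enables O(1)-memory decoding."""
    n, d = Q.shape
    S = np.zeros((d, V.shape[1]))
    z = np.zeros(d)
    out = np.empty_like(V, dtype=float)
    for t in range(n):
        k, q = phi(K[t]), phi(Q[t])
        S += np.outer(k, V[t])          # accumulate key-value outer products
        z += k
        out[t] = (q @ S) / (q @ z)
    return out
```

At the final position the recurrent state has absorbed every key-value pair, so the two forms agree there; at earlier positions they differ because the recurrent form is causal.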