Tang Feihu
Senior R&D Engineer and Developer Relations Lead at Moonshot AI
Former Google engineer, ACM/ICPC Asia Regional Gold Medalist, champion of Microsoft's "Beauty of Programming" Challenge, and winner of the first Wanxiang Lab Token Economy Design Competition.
Topic
Linear Attention: Past, Present, and Future
The quadratic complexity of standard self-attention has become a fundamental bottleneck for long-context AI agents and edge-side model deployment. Linear attention, which reduces attention computation from O(n²) to O(n), has emerged as one of the most significant algorithmic breakthroughs in efficient sequence modeling. This talk will present the full technical evolution of linear attention. We will start from theoretical foundations (Performer, Linear Transformer, RNN reformulations), analyze state-of-the-art architectures that have gained mainstream adoption (Mamba, RetNet, GLA, and hardware-aware designs), and look ahead to key research directions shaping 2026: hybrid attention strategies for intelligent agent workflows, hardware-software co-design for edge deployment, and the convergence of linear attention with state-space models.

Drawing on production experience from training large-scale foundation models at Kimi, this lecture bridges algorithmic innovation with infrastructure realities. We will examine real-world trade-offs between memory efficiency, training stability, and downstream task performance, critical considerations often overlooked in academic papers.

**Outline:**

I. The Quadratic Crisis (5 min)
II. Past: Theoretical Foundations (8 min)
III. Present: Modern Landscape (12 min)
IV. Future: Towards Intelligent Agent AI (10 min)
V. Open Source & Community (3 min)
VI. Q&A (5 min)

**Audience Takeaways:**

* Gain a full-stack understanding of linear attention, from mathematical principles (kernel methods, RNN duality, and state-space formulations) to engineering deployment.
* Deeply understand the real-world trade-offs in training stability, memory efficiency, and hardware utilization for mainstream architectures like Performer, Mamba, and RetNet.
* Obtain a practical decision framework for selecting optimal attention strategies in edge deployment and long-context AI agent scenarios.
* Gain insight into the 2026 frontier trends in hybrid architectures, adaptive sparsity, and hardware-software co-optimization.
* Acquire end-to-end capabilities to transform theoretical breakthroughs into production-grade, high-efficiency model infrastructure.
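To make the O(n²) → O(n) reduction and the RNN duality mentioned above concrete, here is a minimal NumPy sketch. It is illustrative only, not the implementation of any specific architecture covered in the talk: function names are invented, and `elu1` is just one common choice of positive feature map from the linear-attention literature.

```python
import numpy as np

def elu1(x):
    # elu(x) + 1: a common positive feature map in the linear-attention literature.
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention_parallel(Q, K, V, phi=elu1):
    """Kernelized (non-causal) form: reordering the matmuls as phi(Q) (phi(K)^T V)
    avoids materializing the (n x n) score matrix, so cost is linear in length n."""
    Qp, Kp = phi(Q), phi(K)
    S = Kp.T @ V                        # (d x d_v) summary, size independent of n
    z = Kp.sum(axis=0)                  # (d,) normalizer accumulator
    return (Qp @ S) / (Qp @ z)[:, None]

def linear_attention_recurrent(Q, K, V, phi=elu1):
    """Causal RNN form of the same computation: a constant-size state (S, z) is
    updated once per token, which is what enables O(1)-memory decoding."""
    n, d = Q.shape
    S = np.zeros((d, V.shape[1]))
    z = np.zeros(d)
    out = np.empty_like(V, dtype=float)
    for t in range(n):
        k, q = phi(K[t]), phi(Q[t])
        S += np.outer(k, V[t])          # accumulate key-value outer products
        z += k
        out[t] = (q @ S) / (q @ z)
    return out
```

At the final position the recurrent state has absorbed every key-value pair, so the two forms agree there; at earlier positions they differ because the recurrent form is causal.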