Chengqiang Lu
Head of AI Search Generation Algorithms at Xiaohongshu
Chengqiang Lu is Head of AI Search Generation Algorithms at Xiaohongshu. He received his Master’s degree from the University of Science and Technology of China. His main research areas are large-model pre-training and post-training, agents, RAG, and AI4Sci. He previously worked at Tencent QLab, on the Alibaba Qwen Team, and on Xiaohongshu Search, focusing on algorithm research and real-world applications. He has published dozens of papers at top conferences and in journals, including NeurIPS, KDD, AAAI, and ACL, with around 6,000 citations on Google Scholar.
Topic
Aligning AI Search Agents at Xiaohongshu with Adaptive Reinforcement Learning
AI search agents must dynamically balance multiple objectives, including factual accuracy, safety, information richness, and user experience, which poses new challenges for alignment based on reinforcement learning (RL). This talk presents an end-to-end RL alignment approach for AI search agents built on adaptive curriculum learning. The framework monitors the learning curve of each reward dimension and the marginal contribution of the training data, then automatically adjusts optimization priorities and data weighting during training, so that multi-objective optimization tracks the model's learning progress. Based on this methodology, we built the Xiaohongshu SearchLLM Agent and designed a hierarchical, multi-dimensional reward system that combines rule-based checks, LLM-based evaluation, and gated aggregation. Optimized with GRPO together with adaptive curriculum learning, the system achieves significant improvements on both offline and online metrics.
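To make the ingredients named in the abstract concrete, here is a minimal Python sketch of gated reward aggregation, learning-curve-based reweighting, and GRPO-style group-relative advantages. It is an illustration under stated assumptions, not the speaker's implementation: the function names, the dimension names, the gate threshold, and the rule "give more weight to dimensions whose scores are still improving" are all hypothetical, since the abstract does not specify them. The only part taken from the published GRPO formulation is normalizing each reward by the mean and standard deviation of its sampling group.

```python
from statistics import mean, pstdev


def gated_reward(scores, weights, gate_keys=("safety",), gate_threshold=0.5):
    """Weighted average of per-dimension scores in [0, 1].

    Assumption: a score below the threshold on any gate dimension
    zeroes the whole reward (one plausible reading of "gated aggregation").
    """
    if any(scores[k] < gate_threshold for k in gate_keys):
        return 0.0
    total = sum(weights[k] for k in scores)
    return sum(weights[k] * scores[k] for k in scores) / total


def adapt_weights(history, floor=0.05):
    """Re-derive dimension weights from recent learning curves.

    history maps a dimension name to its recent mean scores, oldest first.
    Assumption: weight is proportional to recent improvement plus a small
    floor, so plateaued dimensions keep a nonzero share.
    """
    gains = {d: max(curve[-1] - curve[0], 0.0) + floor
             for d, curve in history.items()}
    total = sum(gains.values())
    return {d: g / total for d, g in gains.items()}


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantage: standardize rewards within one sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Hypothetical learning curves for two reward dimensions.
    weights = adapt_weights({"accuracy": [0.2, 0.5], "richness": [0.6, 0.6]})
    weights["safety"] = 1.0  # gate dimension; fixed weight for illustration

    # Hypothetical scores for a group of three sampled responses.
    group = [
        {"safety": 1.0, "accuracy": 0.9, "richness": 0.4},
        {"safety": 1.0, "accuracy": 0.3, "richness": 0.8},
        {"safety": 0.1, "accuracy": 1.0, "richness": 1.0},  # fails the gate
    ]
    rewards = [gated_reward(s, weights) for s in group]
    print(group_relative_advantages(rewards))
```

In this sketch the third response receives a reward of zero despite high accuracy and richness scores, because the gate is checked before any weighted averaging; whether the production system gates this strictly is not stated in the abstract.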