Haosheng Zou
Senior Algorithm Expert, 360 Smart Brain
Dr. Haosheng Zou is a senior algorithm expert at 360 Smart Brain, where he leads the open-source projects Light-R1 and 360-LLaMA-Factory. He received his PhD from Prof. Zhu Jun's group at TSAIL, Tsinghua University, and his bachelor's degree from the Department of Electronics at Tsinghua University. Before the large-model era, he was a reinforcement learning researcher at miHoYo and 4Paradigm. He is also the author of the TensorFlow version of Tianshou.
Topic
Curriculum Learning and GRPO for Open-Source Reasoning Models: Data Insights and Training Strategies
In early March, 360 Smart Brain open-sourced the Light-R1 series of reasoning models in multiple sizes, together with the training data and code. Light-R1 was the first open-source work to surpass DeepSeek-R1-Distill-32B in the math domain starting from zero. It was also the first to significantly improve a 14B model by applying GRPO reinforcement learning after long-reasoning SFT. Zhou Hongyi introduced the full model series in a short video. This talk presents the data insights and training strategies behind Light-R1's curriculum-learning SFT, DPO, and GRPO stages. It also compares Light-R1 with mainstream industry reasoning models, zero-RL approaches, and other related work in terms of training resources and methods. Although Light-R1 was trained for long reasoning using only math data, it also generalizes effectively to non-math tasks. As training and inference techniques continue to advance, long-reasoning models will become increasingly common. Light-R1 offers an important reference for quickly training a domain-specific reasoning model at low cost.