Siyuan Jiang

Code large model algorithm expert, aiXcoder

Responsible for the full development pipeline of the aiXcoder large model, including data collection and cleaning, model construction and training, inference optimisation and serving, and model evaluation. Current interests: using large-scale distributed training to bring the base code model closer to real software engineering development scenarios; building human-aligned training methods that match software development processes, tools, and behaviours; and performing domain-specific incremental training on private code while minimising catastrophic forgetting and related problems.

Topic

Improving the Quality of Code Generation: Practical Experience with Large Code Models

Topic Focus

Improving the quality of code generated by large models requires algorithmic and engineering innovation at every stage, from pre-training data processing, through instruction fine-tuning with positive feedback, to a range of post-processing strategies. This talk begins with work on combining deep learning with tools from the field of software engineering, and details how various software engineering tools can be combined to improve the quality of code generation. It describes the importance of pre-training data processing, including steps such as data filtering, syntactic analysis, and static analysis, which ensure high-quality training data. Engineering tools such as syntactic analysis and type inference also matter for the generated results, and the talk covers how an iterative process can optimise the quality of code generation. Finally, the talk highlights strategies for adapting code generation to industry-specific requirements through domain-specific training, which is important for real-world use of the model.