Fenglong Xie
Head of Audio Technology, rednote
He received his Ph.D. degree from Harbin Institute of Technology-Microsoft Research Asia, was awarded the title of "Microsoft Scholar" in 2015, and was a single-track champion of the Blizzard Challenge 2023 international speech synthesis competition. He is now responsible for the research, development, and deployment of speech recognition and synthesis, voice interaction, and music technology across all rednote scenarios, including FireRed, a comprehensive solution for speech/music technology based on large models. He has published dozens of papers in speech conferences and journals such as ICASSP, INTERSPEECH, IEEE TASLP, and Speech Communication.
Topic
FireRed: Integrated Practice in Speech/Music Technology Based on Large Models
Introduction: In 2024, speech/music solutions based on large models sprang up in rapid succession. This talk covers the technical details and real-world deployment of a series of large speech/music models developed by the rednote technical team.
FireRedASR: a new open-source SOTA for Chinese speech recognition. It provides two models, FireRed-LLM and FireRed-AED, built for extreme accuracy and for efficient inference respectively. Compared with the previous SOTA, Seed-ASR, it reduces the Chinese character error rate by 8.4%.
FireRedTTS2.0: a new open-source SOTA for Chinese speech synthesis that supports zero-shot voice cloning and human-like natural speech generation, with control over paralinguistic cues and emotion.
FireRedChat: a large-model-based real-time voice conversation solution with ultra-low latency and human-like interaction.
FireRedMusic: a rednote-style music generation solution.
Outline:
FireRedASR: a SOTA large-model Chinese speech recognition system and its deployment in applications
FireRedTTS2.0: a SOTA large-model Chinese and English speech synthesis system and its deployment in applications
FireRedChat: a low-latency, ultra-natural voice interaction solution
FireRedMusic: a rednote-style music generation solution