Fan Bao

CTO of ShengShu Technology

Ph.D. from the Department of Computer Science and Technology at Tsinghua University. Selected for the 2023 MIT TR35 (China) list and recipient of multiple honors, including the National Scholarship and the Zhong Shimo Scholarship, the highest honor in Tsinghua's Department of Computer Science. Has published more than ten papers at top-tier conferences such as ICML, NeurIPS, ICLR, and CVPR. Among these, the paper "Analytic-DPM: an Analytic Estimate of the Optimal Reverse Variance in Diffusion Probabilistic Models" received the ICLR 2022 Outstanding Paper Award. It was the first paper authored solely by a mainland China-based institution to win an award at this top-tier machine learning conference.

Topic

ShengShu Technology's Exploration and Practice in Large Multimodal Models

ShengShu Technology has long been committed to the research and application of large multimodal models, and we have formed our own views on the industry's bottlenecks and on where the technology is ultimately headed. In the field of large multimodal models, how to unify multimodal input and output remains a key open problem. Through our research and practice on unified multimodal representation, we have observed some potential solutions that can unify video, audio, text, and other modalities. We have also thought a great deal about the ultimate form of large multimodal models. If today's video generation mostly plays a “rendering” role, the ultimate form of a large multimodal model will act as an intelligent brain: something like an o1 for the field of video generation, with strong reasoning ability, able to understand the user's intent and intelligently guide their work and life.

Outline:

1. ShengShu Technology's practice and technical exploration in the field of large multimodal models
2. Some thoughts on unified representation in large multimodal models
3. The ultimate form of large multimodal models
