Junlin Zhang
Head of New Technology R&D at Sina Weibo
He is a director of the Chinese Information Processing Society of China and holds a PhD from the Institute of Software, Chinese Academy of Sciences. He previously worked at Alibaba as a senior technical expert, where he led the new technology team. He is the author of "This is the Search Engine: Core Technology Explained" and "Big Data Journal: Architecture and Algorithm".
Topic
Methods and issues in native multimodal large models: the case of Gemini
Introduction: Multimodal large models are the main battleground in the current international competition over frontier large models, exemplified by the contest between OpenAI's GPT-4V and Google's Gemini. There are currently two mainstream approaches to building multimodal large models: spliced multimodal and native multimodal. Most publicly available multimodal large models take the spliced route, mainly because spliced models are cheaper to build than native multimodal ones. Gemini is a typical native multimodal large model. Drawing on an in-depth analysis of the Gemini technical report released by Google, combined with mainstream multimodal large model techniques, this talk will analyse how Gemini may have been built, introduce the key points of the native multimodal approach, and discuss the technical problems that urgently need to be solved.