Tutorial
Multimodal Large Language Models


Artificial intelligence (AI) is undergoing a paradigm shift, from understanding text alone to integrating multimodal information such as visual and auditory signals. Multimodal large models, at the core of this trend, are rapidly becoming a key driver of progress toward general artificial intelligence. These models not only understand and generate content across modalities but also show enormous potential in fields such as scientific research, healthcare, education, and human-computer interaction. However, their rapidly evolving technical landscape encompasses complex architectural designs, training paradigms, and ethical challenges, creating a body of knowledge that urgently needs to be clarified and disseminated.

To this end, we are organizing this "Multimodal Large Language Models" tutorial. It aims to systematically analyze this cutting-edge technology, delving into its core principles, key challenges, and recent developments. Through this event, we hope not only to strengthen participants' theoretical foundations and inspire innovative ideas, but also to jointly examine the ethical and societal impacts of this technology, thereby promoting knowledge dissemination, responsible innovation, and future development in the field of multimodal AI.


Schedule

Oct. 31st, 13:30 - 15:30


Organizer


Wei-Shi Zheng

Sun Yat-sen University, Professor

Biography:  
Wei-Shi Zheng, a Cheung Kong Professor and a Newton Senior Scholar of the UK Royal Society, is currently the Director of the Ministry of Education's Key Laboratory of Machine Intelligence and Advanced Computing. He has long conducted research on theories and methods of collaborative and interactive analysis, addressing visual computing problems in human modeling and robot behavior. He has published over 150 papers in CCF-A venues, CAS Rank 1 journals, and Nature journals. He serves on the editorial boards of top international AI journals such as IEEE T-PAMI and the Artificial Intelligence Journal. He has presided over five national key and talent projects, as well as a Guangdong Provincial Natural Science Foundation Outstanding Young Team project (as Principal Investigator). He has won the First Prize of the Natural Science Award of the Chinese Society of Image and Graphics, the First Prize of the Guangdong Natural Science Award, and the Second Prize of the National Teaching Achievement Award.

 


Qing Zhang

Sun Yat-sen University, Associate Professor

Biography:  
Qing Zhang is an Associate Professor and doctoral supervisor at the School of Computer Science, Sun Yat-sen University. His research focuses on visual content generation and editing, image-based 3D modeling and rendering, and embodied intelligence. He has published over 70 papers, including over 50 in CCF-A venues and IEEE Transactions, and his citations on Google Scholar exceed 4,800. He is ranked among the top 2% of scientists worldwide, and received the Second Prize in Natural Sciences at the 2019 Hubei Provincial Research Awards and the Outstanding Young Scientist Award at the 2022 World Artificial Intelligence Conference. He has led numerous research projects, including those funded by the National Natural Science Foundation of China (NSFC), such as its Youth Program.


Presenters


Mang Ye

Wuhan University, Professor

Biography:  
Mang Ye is a Professor at the School of Computer Science, Wuhan University, where he also serves as the Chair of the Department of Intelligent Science. He is a recipient of the National High-level Young Talents Program. His research focuses on multimodal computing, federated learning, and medical artificial intelligence. He has published over 100 papers as the first or corresponding author in top-tier CCF-A conferences and journals, with more than 14,000 citations on Google Scholar. He serves on the editorial boards of IEEE TIP and IEEE TIFS, and as an area chair for prestigious conferences such as CVPR, ICLR, NeurIPS, ICML, and AAAI. He has led over ten research projects, including the NSFC-Hong Kong Joint Research Fund and the Ministry of Science and Technology Key R&D Program. His honors include being named a Highly Cited Researcher by Clarivate, one of Stanford's "Top 2% Scientists Worldwide", and a Baidu AI Chinese Young Scholar.

Speech Title: Parameter-Efficient Fine-Tuning and Security of Multimodal Large Language Models

Abstract: Multimodal large language models, with their powerful cross-modal understanding and generation capabilities, have emerged as a core research direction in artificial intelligence. However, their application in vertical domains faces two major challenges: how to balance the retention of general knowledge with the injection of domain-specific expertise during fine-tuning, and how to address potential risks related to data security and model security. This tutorial will present recent progress from our team on preserving general capabilities and enhancing model security in the fine-tuning of multimodal large language models.
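As context for the "retention vs. injection" balance the abstract describes, the sketch below shows one common parameter-efficient recipe, a LoRA-style low-rank adapter. This is a minimal, generic illustration, not the speaker's actual method; the `LoRALinear` class, rank, and dimensions are illustrative assumptions. The pretrained weights stay frozen to preserve general knowledge, while only a small low-rank branch is trained to absorb domain expertise.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen pretrained linear layer plus a small trainable low-rank update."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # freeze pretrained weights: general knowledge is retained
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)  # update starts at zero, so initial behavior is unchanged
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Domain-specific adaptation flows only through the low-rank branch.
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))

# Toy usage: wrap one projection layer of a (hypothetical) multimodal model.
layer = LoRALinear(nn.Linear(768, 768))
out = layer(torch.randn(2, 768))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # only the two low-rank matrices are trainable
```

Because the low-rank update is additive and starts at zero, the adapted model can deviate from, and be cleanly reverted to, the base model, which is one reason such methods are attractive for vertical-domain fine-tuning.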

 


Qibin Hou

Nankai University, Associate Professor

Biography: 
Qibin Hou's research primarily focuses on multimodal foundation models and visual perception. He has presided over projects such as the Young Scientists Fund (Category B) and the General Program of the National Natural Science Foundation of China. To date, he has published over 50 papers in top-tier CCF Rank A international journals and conferences in computer vision, including IEEE TPAMI, CVPR, ICCV, and NeurIPS, with over 23,000 citations on Google Scholar. He has received awards including the First Prize of the Natural Science Award from the Ministry of Education and the Wu Wenjun Artificial Intelligence Science and Technology Award (Natural Science, Second Prize).

Speech Title: Reinforcement Learning-Based Fine-Tuning of Multimodal Large Language Models

Abstract: To enhance the precision and reliability of multimodal large models in following complex instructions, this talk explores a reinforcement learning fine-tuning strategy based on Group Relative Policy Optimization (GRPO). This approach innovatively leverages relative comparisons within groups of sampled responses to effectively overcome reward-design challenges in multimodal coordination, guiding the model to generate high-quality outputs that better align with human values. The talk will systematically introduce the method's theoretical framework, implementation pathway, and its strong performance on tasks such as video understanding.
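For readers unfamiliar with GRPO, the minimal PyTorch sketch below shows its core idea: each sampled response is scored relative to the other responses in its group, yielding a critic-free advantage estimate that then weights the policy-gradient update. The function name and toy rewards are illustrative assumptions, not the speaker's implementation.

```python
import torch

def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # rewards: (num_prompts, group_size) -- scalar rewards for a group of
    # responses sampled from the current policy for each prompt.
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    # Each response is judged against its own group's statistics, so no
    # separately learned value (critic) model is needed as a baseline.
    return (rewards - mean) / (std + eps)

# Toy example: 2 prompts, 4 sampled responses each, scored by some
# hypothetical reward function for instruction following.
rewards = torch.tensor([[1.0, 0.0, 0.5, 0.5],
                        [0.2, 0.8, 0.2, 0.8]])
advantages = group_relative_advantages(rewards)
print(advantages)  # positive for above-group-average responses, negative otherwise
```

Comparing within a group sidesteps the need to calibrate absolute reward scales across heterogeneous multimodal tasks, which is one reading of the "reward-design challenges" the abstract mentions.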

 


Registration
