- Frontiers of Machine Learning
- Multimodal Large Language Model and Generative AI
- Smart Earth Observation and Remote Sensing Analysis: From Perception to Interpretation
- 3D Imaging and Display
- Forum on Multimodal Sensing for Spatial Intelligence
- Brain-Computer Interface: Frontiers of Imaging, Graphics and Interaction
- Foundation Models for Embodied Intelligence
- Workshop on Machine Intelligence Frontiers: Advances in Multimodal Perception and Representation Learning
- Human-centered Visual Generation and Understanding
- Spatial Intelligence and World Model for the Autonomous Driving and Robotics
- Seminar on the Growth of Women Scientists
- Video and Image Security in the Era of Large Models Forum
With the rapid development of large model technology, the ability to generate, edit, and disseminate images and videos has taken an unprecedented leap. At the same time, security risks such as content forgery, privacy leakage, and deepfakes are becoming increasingly prominent, posing severe challenges to social trust, information dissemination, and individual rights and interests. Centered on the theme of "Video and Image Security in the Era of Large Models", this forum addresses the key issues and technological frontiers at the intersection of artificial intelligence and video security, and aims to explore how to identify, defend against, and regulate the new visual security threats brought by large models. The forum will cover directions including big data security, AI security (interpretability), video and image ethics security, and image and video content understanding security, and will invite experts and scholars from artificial intelligence, security research, and other fields to exchange ideas. By building an interdisciplinary dialogue platform, we aim to promote consensus and cooperation between academia and industry on visual security, jointly address the security challenges brought about by technological development, and ensure the healthy application of image and video content in a trusted environment.
Schedule
Nov. 2nd 15:10-17:10
Organizer
Kaiqi Huang
Institute of Automation, Chinese Academy of Sciences, Full Professor
Biography:
Dr. Kaiqi Huang serves as Director of the Center for Intelligent Systems and Engineering (CRISE) at the Institute of Automation, Chinese Academy of Sciences, Fellow of the International Association for Pattern Recognition (IAPR), and Distinguished Professor at the University of Chinese Academy of Sciences. His work on image understanding and cognitive decision-making in AI has led to over 200 publications, multiple Best Paper Awards, and 100+ patents applied in key national sectors. He developed the GOT-10k video tracking and TuringAI human–machine confrontation platforms, and has received major honors including the National Science and Technology Progress Award and recognition as a National Leading Talent. He also serves as Associate Editor for several IEEE Transactions journals and has chaired multiple major international conferences.
Shengjin Wang
Tsinghua University, Full Professor
Biography:
Shengjin Wang is a Full Professor at Tsinghua University and Director of the Research Center of Intelligent Computing and Autonomous Systems in the Department of Electronics at Tsinghua University. His team belongs to the Embodied Intelligence Innovation Group of the Beijing National Research Center for Information Science and Technology. His main research fields include computer vision, machine learning, intelligent video analysis and semantic description, pedestrian re-identification and human posture computing, and embodied intelligence and multimodal collaborative robotics. One of his papers has been cited over 5,000 times on Google Scholar. As one of the drafters, he has helped formulate 10 industry and national standards, including Public Security Industry Standards of the People's Republic of China and National Standards. He also serves as Deputy Director of the Video Surveillance and Security Special Committee of the China Society of Image and Graphics, a member of the Application Sub-Technical Committee for Human Biometric Feature Recognition of the National Security Standardization Committee, Deputy Director of the Defense Big Data Branch of the Chinese Association of Automation, Deputy Director of the National Engineering Research Center for Hazardous Materials Detection Technology, a member of the Defense Science Expert Group of the Joint Staff Command, and Chairperson of the Academic Activities Committee of the IEEE Region 10 Beijing Branch. He has received support from multiple national projects, including the 973 Program, the 863 Program, the National Natural Science Foundation of China, the Ministry of Education Doctoral Program Foundation, the National Science and Technology Support Program, and the National Key Research and Development Program, and has achieved outstanding results.
He has published over 150 papers in leading international journals and conferences, including top journals such as Nature and the IEEE Transactions, and the three major computer vision conferences CVPR/ICCV/ECCV, with oral papers at venues such as ICCV 2023 and AAAI 2018. In March 2023, his laboratory's collaborative robotics team won first prize in the Recognition Track of the 2023 Intel Indoor Robot Learning Global Challenge. He has applied for 15 invention patents. His honors include a National Science and Technology Progress Award (2008), a Beijing Science and Technology Award (2006), a Wu Wenjun Artificial Intelligence Science and Technology Award (2019), a Public Security Science and Technology Award of the Ministry of Public Security (2019), a Beijing Science and Technology Award for Technological Invention (2021), and a Tianjin Science and Technology Progress Award (2023).
Presenters
Kaiqi Huang
Institute of Automation, Chinese Academy of Sciences, Full Professor
Biography:
Dr. Kaiqi Huang serves as Director of the Center for Intelligent Systems and Engineering (CRISE) at the Institute of Automation, Chinese Academy of Sciences, Fellow of the International Association for Pattern Recognition (IAPR), and Distinguished Professor at the University of Chinese Academy of Sciences. His work on image understanding and cognitive decision-making in AI has led to over 200 publications, multiple Best Paper Awards, and 100+ patents applied in key national sectors. He developed the GOT-10k video tracking and TuringAI human–machine confrontation platforms, and has received major honors including the National Science and Technology Progress Award and recognition as a National Leading Talent. He also serves as Associate Editor for several IEEE Transactions journals and has chaired multiple major international conferences.
Speech Title: Visual Intelligence Thinking in the Era of Large Models
Abstract: Visual intelligence promotes breakthroughs in autonomous driving, public safety, and other fields by simulating human visual perception and cognitive capabilities, and is a key technical pillar of an intelligent society. With the advent of the era of large models, visual intelligence has developed rapidly. Starting from the visual Turing test, this talk will introduce the concepts, current status, and challenges of the different stages of visual intelligence development, and reflect on the development path of visual intelligence in the era of large models.
Nong Sang
Huazhong University of Science and Technology, Professor
Biography:
Nong Sang received his BE degree in computer science and engineering in 1990, his MS degree in pattern recognition and intelligent control in 1993, and his PhD degree in pattern recognition and intelligent systems in 2000. He is currently a professor at the School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, China. His research interests include low-quality image enhancement, object detection and recognition, object tracking, image/video semantic segmentation, and action detection and recognition.
Speech Title: From Video Anomaly Detection to Video Anomaly Understanding
Abstract: In fields like intelligent surveillance, the sheer volume of video data demands automated technologies for analyzing abnormal behavior. Yet current technologies are commonly hindered by two key problems: a poor trade-off between annotation cost and model performance, and a failure to deeply understand or explain anomalies. The central challenge, therefore, is to build detection models that are both efficient and accurate, and that can also offer a deep understanding of abnormal behavior. This talk focuses on these two core challenges in video anomaly analysis, moving from "efficient detection" to "deep understanding", and introduces our solutions to them. To address the high cost of annotation, we present GlanceVAD, a method based on a "glance-style supervision" paradigm that dramatically reduces annotation costs while striking an outstanding balance between detection accuracy and annotation efficiency. To bridge the gap from simply detecting anomalies to truly understanding them, we developed Holmes-VAU, which uses Multimodal Large Language Models (MLLMs) to adaptively focus on critical anomalous clips, allowing for a multi-level, in-depth understanding and explanation of the events.
Zhijian Ou
Department of Electronic Engineering, Tsinghua University, Professor
Biography:
Zhijian Ou is currently a Professor with the Department of Electronic Engineering, Tsinghua University, and a Co-founder of TasiTech. His research interests include speech and language processing and machine intelligence. He is a Senior Area Editor of the IEEE/ACM Transactions on Audio, Speech and Language Processing and IEEE Signal Processing Letters, an Editorial Board Member of Computer Speech and Language, a Member of the IEEE Speech and Language Processing Technical Committee, a Distinguished Member of the China Computer Federation (CCF), a Standing Committee Member of the CCF Speech Conversation and Auditory Technical Committee, and a Committee Member of the Speech Acoustics and Hearing Branch of the Acoustical Society of China (ASC). He was the General Chair of SLT 2021 and the EMNLP 2022 SereTOD Workshop. He has led multiple research projects funded by the National Natural Science Foundation of China, the Ministry of Science and Technology, and the Ministry of Education, and has received three provincial or ministerial-level science and technology awards as well as multiple Best Paper Awards at domestic and international academic conferences. His recent work focuses on probabilistic energy models and their applications, as well as reliable and efficient human–machine dialogue technologies with large models.
Speech Title: Are Large Models the Watt Steam Engine of the New Industrial Revolution?
Abstract: Artificial intelligence is an important driving force of the Fourth Industrial Revolution. Large-model AI technologies, represented by ChatGPT, have attracted great attention and are developing rapidly. The hallmark of the First Industrial Revolution was the invention and application of Watt's steam engine, which raises an interesting question for today: are large models the Watt steam engine of the new industrial revolution? With reference to the development of large-model technologies represented by ChatGPT, this talk will reflect on the challenges they face in delivering the reliability and efficiency required for large-scale applications, and share some of our recent progress. Specifically, the talk will cover: 1) the success of large models and principled semi-supervised learning; 2) efficient multilingual speech recognition large models, realizing efficient collaboration between the machine's ear and brain; and 3) reliable and efficient large-model human–machine dialogue technologies for professional scenarios.
Zhaofeng He
Beijing University of Posts and Telecommunications, Professor
Biography:
Zhaofeng He is a Professor and Ph.D. Supervisor at the School of Artificial Intelligence, Beijing University of Posts and Telecommunications. He focuses on technological innovation and industrial application in fields including biometric recognition, large model security and governance, and intelligent game decision-making. He has led more than 20 national and provincial-level research projects, including projects under the National Key R&D Program, the National Natural Science Foundation of China, Beijing Central Government-guided Local Special Projects, and Young Top-notch Innovation Teams. He has over 100 publications in journals and conferences including IEEE Trans. PAMI and CVPR, with more than 90 invention patents applied for and over 40 national and industrial standards formulated. He has been selected for various talent programs, including the Young Top-notch Individual of the Excellent Talents Training Program of the Organization Department of the Beijing Municipal Party Committee, the Young Top-notch Team (as team leader), and Beijing's "High-level Innovation and Entrepreneurship Talent Support Program". He has also received awards including the Second Prize for Technological Invention of the China Society of Image and Graphics, the First Prize for Innovation Achievement of the China Association for Promotion of Industry-University-Research Cooperation, the Excellent Doctoral Dissertation Award of the China Computer Federation, the Excellent Doctoral Dissertation Award of the Chinese Academy of Sciences, and a Best Paper Nomination Award at the International Conference on Biometrics.
Speech Title: Content Security and Ethical Risk Detection of Generative Artificial Intelligence
Abstract: In recent years, generative artificial intelligence has made breakthrough progress. While bringing convenience to human production and life, it has also raised severe security and ethical challenges: illegal and improper uses of AIGC (Artificial Intelligence Generated Content), such as identity forgery and inappropriate remarks, have occurred frequently. The Cyberspace Administration of China and six other departments jointly issued the Interim Measures for the Administration of Generative Artificial Intelligence Services, which clearly stipulate that content generated by generative artificial intelligence should be true and accurate and embody the core socialist values. In response to the ethical and security challenges that generative artificial intelligence faces in applications, this talk will introduce the research team's work on ethical risk detection and governance for large models and on multimodal risk detection, aiming to provide technical support for the safe and controllable development of AIGC technology.