Call for Papers
Multimodal human understanding and analysis is an emerging research area that cuts across several disciplines, including Computer Vision, Natural Language Processing (NLP), Speech Processing, Human-Computer Interaction, and Multimedia. Several multimodal learning techniques have recently shown the benefit of combining multiple modalities in image-text, audio-visual and video representation learning and in various downstream multimodal tasks. At their core, these methods focus on modelling the modalities and their complex interactions using large amounts of data, different loss functions and deep neural network architectures. However, many Web and social media applications also need to model the human, including the understanding of human behaviour and perception. For this, it becomes important to consider interdisciplinary approaches that draw on the social sciences, semiotics and psychology.
The workshop will gather novel and unpublished work on representation learning for cross-cultural, human-created multimodal data, spanning vision and language, cross-modal learning for NLP, IR methods for multimedia data, HCI, bias estimation, and other related topics. We especially welcome work that applies interdisciplinary ideas and theories, such as semiotics and Gestalt theory, to multimodal computational research. The workshop will give equal consideration to novel scientific methods and techniques for the analysis, extraction and enrichment of multimodal data and to application perspectives, such as the innovative use of tools and methods for providing rich interaction with multimodal data. We will organize the workshop under two sub-tracks:
Track 1: Human-Centred Multimodal Understanding
The goal of this track is to attract researchers working on multimodal understanding topics (in NLP, CV, Digital Humanities, and other related fields) with a focus on human-centred aspects. We seek original application-oriented and theoretical papers, as well as position papers, that bridge text and multimedia data. This track will cover novel research that targets (but is not limited to) the following topics of interest:
- Multimodal modelling of human impressions in the context of the Web and social media
- Incorporating multi-disciplinary theories such as semiotics or Gestalt theory into multimodal approaches and analyses
- Human-centred aspects in Vision and Language models
- Measuring and analysing cultural, social and multilingual biases in the context of the Web and social media
- Cross-modal and semantic relations in multimodal web data
- Multimodal human perception understanding
- Multimodal sentiment/emotion/sarcasm recognition
- Multimodal hate speech detection
- Multimodal misinformation detection
- Multimodal content understanding and analysis
- Multimodal rhetoric in online media
Track 2: Multimodal Understanding Through Impactful World Events
The goal of this track is to provide a dataset that facilitates the development of AI solutions for relevant and impactful research questions, bringing together researchers working on related topics such as multimedia and multimodal AI. For this purpose, we release news and social media data with both image and text related to events that had global impact, e.g., the 2024 United States presidential election, the 2025 German federal election, or the DeepSeek.AI R1 model and stock market crash. The datasets cover multimodal content in various languages published in different regions of the world. This allows the study of how the same event is portrayed in countries with different cultural, economic, and regional backgrounds. To foster research, we will provide various research questions along with the dataset. These include, but are not limited to:
- Geographical Proximity: How can news values in multimodal news, such as the location of an event, affect human perception?
- Multimodal Cultural Bias: How are world-wide events perceived across different cultures or languages?
- Framing of Elites: How do multimodal framing techniques employed by news outlets differ in their portrayal of elite figures, e.g., politicians during major electoral events?
- Sentiment across Cultures: How does the sentiment expressed in news articles and social media posts, across textual and visual content, vary between different countries covering the same event?
- Societal Impact: How do world-wide events affect the masses with regard to potential perceived consequences, and what is the role of each data modality in that perception?
The goal for participants is to develop novel research ideas based on the dataset, without being required to compete against each other. Each submitted work is expected to target some of the research questions while studying a unique aspect of the problem.
Important dates (Anywhere on Earth)
- Submission deadline: July 11th, 2025
- Paper notification: July 24th, 2025
- Camera ready: August 3rd, 2025
- Workshop date: October 27/28, 2025 in Dublin, Ireland
Submission Page
- Submission Page: link
Contact
If you have any problems or questions, please contact us via e-mail at: mailing list