MUWS 2025 - The 4th International Workshop on Multimodal Human Understanding for the Web and Social Media
The 4th International Workshop on Multimodal Human Understanding for the Web and Social Media (MUWS), co-located with ACM Multimedia (ACM MM 2025).
Program
Date: October 28, 2025, in Dublin, Ireland
Timezone: Irish Standard Time
Room: Hyatt, Dean Swift 2
09:00 – 09:30 - Chairs’ Welcome
09:30 – 10:30 - Keynote Talk by Beate Schirrmacher “Factuality, Storytelling and the Quest for Authenticity in Journalism”
Abstract: In discussions of mis- and disinformation, facts are often pitted against narratives. However, even reliable and relevant information relies on narration. For instance, journalism does not simply report the facts; it constructs multimodal, transmedial, and serial narratives to explain recent events and engage audiences. How does the increased focus on emotionally charged storytelling impact the facts and information conveyed? This presentation examines the interplay between factual narration and storytelling in journalism, exploring how facts and stories inform one another. Drawing on intermedial theory, this kind of analysis explores truthfulness in communication as an interplay of perceived traces of events and communicational coherence (Elleström 2018). The analysis reveals how a set of shared basic features shapes journalistic news narration, but it also matters how these features are assembled, as the specific use of narrative and rhetorical patterns can either support or undermine factuality. For instance, in the fabricated feature stories of the former German reporter Claas Relotius, narrative coherence and claims of authenticity replace specificity. Furthermore, examples of audiovisual journalism highlight how the increased focus on letting images tell the stories, and on claims to authenticity, transforms journalistic narration. Rather than opposing facts and fiction, this approach offers a more nuanced understanding of the narrative challenges of news in a rapidly changing and diverse media landscape.
10:30 – 11:00 - Coffee Break
Session 1: Multimodal Understanding Through Impactful World Events
11:00 – 11:20 - “Analyzing Emotional Discourse in Multilingual Social Media: A Case Study of the Russo-Ukrainian Conflict” by Ahmed Taiye Mohammed et al.
11:20 – 11:35 - “Multilingual Evaluation of Image-Text Retrieval in Vision-Language Models: A Metric-Based Perspective” by Bodhisatta Maiti
11:35 – 12:30 - Keynote Talk by Ralph Ewerth “The Challenge of Multimodality in Analyzing Social Media News and Videos”
Abstract: Harmful content on the Web is frequently conveyed in a multimodal manner, i.e., by combining different “forms of expression” (modalities). This poses a major challenge for computational analysis methods, as the modalities can refer to each other and modify the overall meaning of the multimodal information, sometimes even changing the meaning completely (also called “meaning multiplication” in the literature). However, the complex interplay of information from different modalities is typically not explicitly modelled in state-of-the-art approaches for multimodal information analysis. In this talk, we present our approach to overcoming this drawback by introducing a computational model for the complex interplay of modalities (here text and image) and several approaches for multimodal news analysis. Finally, we outline challenges in the quite different domain of educational videos, in particular with regard to video-based learning.

12:30 – 13:30 - Lunch Break
Session 2: Human-Centred Multimodal Understanding
13:30 – 13:50 - “Analyzing the Visual Variety of Adjectives based on Clustering of Visual Features” by Yui Tanaka et al.
13:50 – 14:10 - “Revealing Label Noise in Multimodal Hateful Video Classification” by Shuonan Yang et al.
14:10 – 14:25 - “Video Analysis of Confusion and Understanding in Dyadic Explanations” by Jonas Paletschek et al.
14:25 – 14:45 - “Landmark Guided Visual Feature Extractor for Visual Speech Recognition with Limited Resources” by Lei Yang et al.
14:45 - Closing ceremony
Organizing Committee
- Sherzod Hakimov
- Marc A. Kastner
- David Semedo
- Eric Müller-Budack
- Takahiro Komamizu