Publications

MuSeD: A Multimodal Spanish Dataset for Sexism Detection in Social Media Videos, 2025.
Laura De Grazia, Pol Pastells, Mauro Vázquez Chas, Desmond Elliott, Danae Sánchez Villegas, Mireia Farrús, Mariona Taulé.
Paper

In this study, (1) we introduce MuSeD, a Multimodal Spanish dataset for Sexism Detection consisting of ≈11 hours of videos extracted from TikTok and BitChute; (2) we propose an annotation framework for analyzing the contribution of textual and multimodal labels in the classification of sexist and non-sexist content; and (3) we evaluate a range of LLMs and multimodal LLMs on the task of sexism detection.

Evaluating Multimodal Language Models as Visual Assistants for Visually Impaired Users, In ACL 2025.
Antonia Karamolegkou, Malvina Nikandrou, Georgios Pantazopoulos, Danae Sánchez Villegas, Phillip Rust, Ruchira Dhar, Daniel Hershcovich, Anders Søgaard.
Paper

This paper explores the effectiveness of Multimodal Large Language Models (MLLMs) as assistive technologies for visually impaired individuals.

ImageChain: Advancing Sequential Image-to-Text Reasoning in Multimodal Large Language Models, 2025.
Danae Sánchez Villegas, Ingo Ziegler, Desmond Elliott.
Paper Code Data

We introduce ImageChain, a framework that enhances multimodal LLMs with sequential image reasoning.

Political advertising on Facebook: Campaign strategies deployed by major political parties in the UK. ECPR 2024 Panel Digital campaigning: Empirical research and normative implications.
Junyan Zhu, Andrew Barclay, Danae Sánchez Villegas.
Panel Code

In this paper, we aim to advance our understanding of the role online political advertising plays in campaign activities by addressing three questions: 1) What are the goals of political parties’ online advertising activity? 2) Which policy issues do party accounts address most often in their paid advertising? 3) Does negative campaigning persist over time?

Improving Multimodal Classification of Social Media Posts by Leveraging Image-Text Auxiliary Tasks. In EACL 2024 Findings.
Danae Sánchez Villegas, Daniel Preotiuc-Pietro, Nikolaos Aletras.
Paper Code

We use two auxiliary losses, Image-Text Contrastive (ITC) and Image-Text Matching (ITM), jointly with the main task when fine-tuning any pre-trained multimodal model for social media post classification.
We combine these objectives with five multimodal models, demonstrating consistent improvements across four popular social media datasets.
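As a rough illustration of the joint objective described above, the sketch below adds an ITC term and an ITM term to a main-task loss. It is a minimal pure-Python sketch and not the paper's implementation: the function names, the symmetric form of the contrastive loss, and the weights w_itc and w_itm are all assumptions made for this example.

```python
import math


def softmax_cross_entropy(logits, target):
    """Cross-entropy of one row of logits against an integer target index."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[target]


def itc_loss(sim):
    """Symmetric contrastive loss over a square image-text similarity matrix.

    sim[i][j] is the similarity of image i and text j; matched pairs are
    assumed to lie on the diagonal.
    """
    n = len(sim)
    image_to_text = sum(softmax_cross_entropy(sim[i], i) for i in range(n)) / n
    columns = [[sim[i][j] for i in range(n)] for j in range(n)]
    text_to_image = sum(softmax_cross_entropy(columns[j], j) for j in range(n)) / n
    return 0.5 * (image_to_text + text_to_image)


def itm_loss(match_logits, is_match):
    """Binary cross-entropy on logits: does each image-text pair belong together?"""
    total = 0.0
    for z, y in zip(match_logits, is_match):
        # Numerically stable binary cross-entropy with logits.
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(match_logits)


def joint_loss(task_loss, sim, match_logits, is_match, w_itc=1.0, w_itm=1.0):
    """Main-task loss plus weighted auxiliary losses (weights are hypothetical)."""
    return task_loss + w_itc * itc_loss(sim) + w_itm * itm_loss(match_logits, is_match)
```

In a real fine-tuning setup the similarity matrix and matching logits would come from the multimodal model's image and text representations; here they are plain lists so the arithmetic is easy to follow.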

A Multimodal Analysis of Influencer Content on Twitter 🏆 Best Area Chair Award – Society & NLP. In AACL 2023.
Danae Sánchez Villegas, Catalina Goanta, Nikolaos Aletras.
Paper Data Slides

Our research explores the challenges in automatically detecting regulatory compliance breaches in influencer advertising.
We introduce a new dataset and experiments to improve the detection of commercial influencer content.

Sheffield’s Submission to the AmericasNLP Shared Task on Machine Translation into Indigenous Languages. 🥇 Best Submission. In Workshop on Natural Language Processing for Indigenous Languages of the Americas 2023.
Edward Gow-Smith, Danae Sánchez Villegas.
Paper Code

We describe our submission to the AmericasNLP 2023 Shared Task on Machine Translation into Indigenous Languages.
Our approach consists of extending, training, and ensembling different variations of NLLB-200.
We achieve the highest average chrF of all the submissions.

Combining Humor and Sarcasm for Improving Political Parody Detection. In NAACL 2022.
Xiao Ao, Danae Sánchez Villegas, Daniel Preotiuc-Pietro, Nikolaos Aletras.
Paper Code Tech At Bloomberg

We propose a method that combines parallel encoders to capture parody, humor, and sarcasm-specific representations from input sequences, which outperforms previous state-of-the-art models for parody detection.

Point-of-Interest Type Prediction using Text and Images. In EMNLP 2021.
Danae Sánchez Villegas, Nikolaos Aletras.
Paper Data Poster

We propose a model for POI type prediction that combines text and images, using a modality gate to control how much information is drawn from each modality and a cross-attention mechanism to learn cross-modal interactions.
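To make the idea of a modality gate concrete, here is a minimal sketch of one common gating form: a scalar gate computed from the concatenated text and image vectors, which then mixes the two. This is an assumption for illustration and may differ from the formulation in the paper; the function name, the scalar gate, and the weights argument are hypothetical, and the cross-attention component is omitted.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def modality_gate(text_vec, image_vec, weights, bias=0.0):
    """Fuse two same-sized vectors with a scalar gate.

    The gate g lies in (0, 1) and is computed from the concatenation of
    both vectors. The fused vector is g * text + (1 - g) * image, so g
    controls how much each modality contributes.
    """
    if len(text_vec) != len(image_vec):
        raise ValueError("text and image vectors must have the same length")
    concat = list(text_vec) + list(image_vec)
    if len(weights) != len(concat):
        raise ValueError("weights must match the concatenated vector length")
    g = sigmoid(sum(w * x for w, x in zip(weights, concat)) + bias)
    fused = [g * t + (1.0 - g) * v for t, v in zip(text_vec, image_vec)]
    return g, fused
```

With all-zero weights and bias the gate is 0.5 and the fused vector is the plain average of the two inputs; a large positive bias pushes the output toward the text vector, and a large negative bias toward the image vector.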

Analyzing Online Political Advertisements. In ACL Findings 2021.
Danae Sánchez Villegas, Saeid Mokaram, Nikolaos Aletras.
Paper Data

We present work on inferring ideology and sponsor type from political ads in the US.
We make available two new datasets for political ad analysis, evaluate multimodal models, and provide an in-depth analysis of the limitations of our models.

Point-of-Interest Type Inference from Social Media Text. In AACL 2020.
Danae Sánchez Villegas, Daniel Preotiuc-Pietro, Nikolaos Aletras.
Paper Data Tech At Bloomberg

We introduce a dataset of tweets mapped to Foursquare POIs (locations), evaluate several text classification models, and provide a temporal analysis.

Analyzing Political Parody in Social Media. In ACL 2020.
Antonis Maronikolakis, Danae Sánchez Villegas, Daniel Preotiuc-Pietro, Nikolaos Aletras.
Paper Data Full Dataset Request Slides Blog Post

We present a first study of parody using methods from computational linguistics and machine learning.
We introduce a freely available large-scale dataset containing a total of 131,666 English tweets from 184 real and corresponding parody accounts, and evaluate a range of neural models that achieve high predictive accuracy.

Beyond Words: Analyzing Social Media with Text and Images. PhD thesis, University of Sheffield.
Danae Sánchez Villegas.
eThesis

My Ph.D. thesis focuses on introducing challenging tasks and novel methods to gain a better understanding of multimodal content and its underlying dynamics in the context of social media.