Supervised Fine-Tuning of Foundation Models for Robust Empathy Detection in Text and Video Interactions

Md Rakibul Hasan
Email: rakibul.hasan@curtin.edu.au
Website: https://hasan-rakibul.github.io

This PhD research advances both the understanding of empathy and the deep learning algorithms for detecting it in various human interactions. I developed novel algorithms for both textual and visual modalities. Apart from empathy detection itself, I addressed three specific challenges – (1) robust learning with noisy labels, (2) cross-subject generalisation and privacy-aware methods for video-based detection, and (3) uncertainty quantification – that are also applicable to general data analytics tasks. My proposed methods have yielded improved accuracy in empathy detection on five public benchmarks.

Publications are listed below. Some papers are highlighted.

UPLME: Uncertainty-Aware Probabilistic Language Modelling for Robust Empathy Regression
MR Hasan, MZ Hossain, A Krishna, S Rahman and T Gedeon
[Code] [arXiv] [PDF] [Demo]

This paper proposes UPLME, an uncertainty-aware probabilistic language modelling framework to capture label noise in empathy regression. UPLME is trained using Bayesian concepts with variational model ensembling. We further introduce two novel loss components: one penalises degenerate uncertainty quantification (UQ), and the other enforces similarity between the input pairs on which we predict empathy.
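As a rough illustration of what uncertainty-aware regression means in general, the sketch below computes a Gaussian negative log-likelihood in which the model predicts both a mean and a log-variance for each sample. This is a generic textbook formulation, not UPLME's actual loss; the function name `gaussian_nll` and its parameters are assumptions made for this example.

```python
import math


def gaussian_nll(target, mean, log_var):
    """Negative log-likelihood of `target` under N(mean, exp(log_var)).

    Generic heteroscedastic regression loss (illustrative only, not the
    UPLME objective). Predicting a larger variance reduces the penalty
    on the squared error, at the cost of the log-variance term.
    """
    variance = math.exp(log_var)
    squared_error = (target - mean) ** 2
    return 0.5 * (log_var + squared_error / variance) + 0.5 * math.log(2 * math.pi)


# A label far from the prediction (possibly noisy) is penalised less
# when the model also reports high uncertainty for that sample.
confident = gaussian_nll(target=5.0, mean=0.0, log_var=0.0)
uncertain = gaussian_nll(target=5.0, mean=0.0, log_var=3.0)
```

In this toy comparison, `uncertain` is smaller than `confident`, which is the mechanism that lets such a loss absorb some label noise.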

Are You Really Empathic? Evidence from Trait, State and Speaker-Perceived Empathy, and Physiological Signals
MR Hasan, MZ Hossain, A Krishna, S Rahman and T Gedeon
[arXiv] [PDF]

This paper analyses listeners' trait empathy, state empathy and physiological signals, together with speaker-perceived empathy, from an experiment we conducted at Curtin University. In the experiment, speakers described a personal incident and one or more listeners responded naturally, as in everyday conversation. Afterwards, speakers reported perceived empathy, and listeners reported their trait and state empathy.

TFMPathy: Tabular Foundation Model for Privacy-Aware, Generalisable Empathy Detection from Videos
MR Hasan, MZ Hossain, A Krishna, S Rahman and T Gedeon
[arXiv] [PDF]

This paper investigates the potential of tabular foundation models (TFMs) for empathy detection from video-derived tabular data. On a public human-robot interaction benchmark, TFMPathy significantly improves on the empathy detection accuracy reported in the literature. We show that TFMPathy under a fine-tuning setup generalises better across subjects than baseline methods.

Labels Generated by Large Language Models Help Measure People's Empathy in Vitro
MR Hasan, Y Yao, MZ Hossain, A Krishna, I Rudas, S Rahman and T Gedeon
[Code] [arXiv] [PDF]

This paper explores LLMs' potential for in-vitro applications: using LLM-generated labels to improve supervised training of mainstream models. We examine two strategies – (1) noisy label correction and (2) training data augmentation – for empathy prediction from text. A RoBERTa pre-trained language model (PLM) trained with noise-reduced labels yields a state-of-the-art Pearson correlation coefficient of 0.648 on the public NewsEmp benchmarks.
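The Pearson correlation coefficient reported above measures the linear agreement between predicted and annotated empathy scores. A minimal sketch of the standard formula follows; the function name `pearson_r` and the toy inputs are assumptions for illustration and are unrelated to the paper's data or evaluation code.

```python
import math


def pearson_r(predictions, targets):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(predictions)
    if n != len(targets) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")

    mean_p = sum(predictions) / n
    mean_t = sum(targets) / n

    # Sums of centred products and centred squares.
    covariance = sum((p - mean_p) * (t - mean_t) for p, t in zip(predictions, targets))
    var_p = sum((p - mean_p) ** 2 for p in predictions)
    var_t = sum((t - mean_t) ** 2 for t in targets)

    if var_p == 0 or var_t == 0:
        raise ValueError("correlation is undefined for a constant sequence")

    return covariance / math.sqrt(var_p * var_t)


# Toy example: predictions that are a perfect linear function of the targets.
r = pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

The value ranges from -1 to 1, with 1 indicating a perfect positive linear relationship.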

Empathy Detection from Text, Audiovisual, Audio or Physiological Signals: A Systematic Review of Task Formulations and Machine Learning Methods
MR Hasan, MZ Hossain, S Ghosh, A Krishna and T Gedeon
IEEE Transactions on Affective Computing, 2025 (SJR: Q1, IF: 9.8)
[IEEE Xplore] [arXiv] [PDF]

This paper systematically screened 849 papers from 10 well-known academic databases and analysed the final 82 papers. Our analyses reveal several prominent task formulations – including empathy on localised utterances or overall expressions, unidirectional or parallel empathy, and emotional contagion – in monadic, dyadic and group interactions. Empathy detection methods are summarised based on four input modalities – text, audiovisual, audio and physiological signals.

LLM-GEm: Large Language Model-Guided Prediction of People's Empathy Levels towards Newspaper Article
MR Hasan, MZ Hossain, T Gedeon and S Rahman
EACL 2024 Findings
[Code] [ACL Anthology] [PDF] [Presentation Video]

This paper proposes the Large Language Model-Guided Empathy (LLM-GEm) prediction system, which rectifies annotation errors based on our defined annotation selection threshold and makes the annotations reliable for conventional empathy prediction models, e.g., BERT-based pretrained language models (PLMs). We experiment with three NewsEmpathy datasets involving people’s empathy levels towards newspaper articles and achieve state-of-the-art test performance using a RoBERTa-based PLM.
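One way to picture threshold-based annotation correction is sketched below. It assumes, purely for illustration, that a crowdsourced label is replaced by the LLM-generated label whenever the two differ by more than a chosen threshold; the actual selection rule in LLM-GEm may differ, and the function name `correct_labels` and the sample values are hypothetical.

```python
def correct_labels(crowd_labels, llm_labels, threshold):
    """Replace a crowd label with the LLM label when they disagree strongly.

    Assumed rule for illustration: if |crowd - llm| > threshold, trust the
    LLM label; otherwise keep the original crowd label.
    """
    if len(crowd_labels) != len(llm_labels):
        raise ValueError("label lists must have the same length")

    corrected = []
    for crowd, llm in zip(crowd_labels, llm_labels):
        if abs(crowd - llm) > threshold:
            corrected.append(llm)
        else:
            corrected.append(crowd)
    return corrected


# Hypothetical empathy scores on a 1-7 scale.
crowd = [1.0, 6.0, 4.0]
llm = [1.5, 2.0, 4.0]
cleaned = correct_labels(crowd, llm, threshold=2.0)
```

Only the second sample changes here, because it is the only one whose crowd and LLM labels differ by more than the threshold.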

Thesis Proposal: Detecting Empathy Using Multimodal Language Model
MR Hasan, MZ Hossain, A Krishna, S Rahman and T Gedeon
EACL SRW 2024
[ACL Anthology] [PDF] [Presentation Video]

This proposal outlines a research plan for developing multimodal language models to detect empathy from text and video. In addition to leveraging existing datasets, the proposed study involves collecting real-life interaction video and audio.

Curtin OCAI at WASSA 2023 Empathy, Emotion and Personality Shared Task: Demographic-Aware Prediction Using Multiple Transformers
MR Hasan, MZ Hossain, T Gedeon, S Soon and S Rahman
ACL 2023 WASSA
[Code] [ACL Anthology] [PDF] [Presentation Video]

To address the WASSA 2023 shared task on predicting empathy, emotion and personality traits, our contributions include (1) converting numerical information into meaningful text using appropriate templates, (2) summarising lengthy articles, and (3) augmenting training data by paraphrasing.
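The first contribution, turning numerical information into text through templates, can be sketched as follows. The field names, template wording and the function `demographics_to_text` are hypothetical and chosen only for this example; they are not the templates used in the shared-task system.

```python
# Hypothetical templates: one sentence pattern per numerical field.
TEMPLATES = {
    "age": "I am {value} years old.",
    "income": "My annual income is {value} USD.",
}


def demographics_to_text(record):
    """Render the known numerical fields of `record` as plain sentences.

    Fields without a template are skipped, so the output contains only
    text that a language model can read alongside the essay.
    """
    sentences = []
    for field, template in TEMPLATES.items():
        if field in record:
            sentences.append(template.format(value=record[field]))
    return " ".join(sentences)


text = demographics_to_text({"age": 32, "income": 50000})
```

The resulting string can then be concatenated with the essay text before tokenisation, so that a text-only transformer can make use of the demographic attributes.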

