Department of Psychology

Detecting local audio-visual synchrony in monologues utilizing vocal pitch and facial landmark trajectories

Steven Cadavid, University of Miami
Mohamed Abdel-Mottaleb, University of Miami
Daniel S. Messinger, University of Miami
Mohammad H. Mahoor, University of Denver
Lorraine E. Bahrick, Florida International University

Date of this Version

1-1-2009

Document Type

Conference Proceeding

Abstract

We describe a novel approach for determining the audio-visual synchrony of a monologue video sequence, using vocal pitch and facial landmark trajectories as descriptors of the audio and visual modalities, respectively. The visual component is represented by the horizontal and vertical displacement of corresponding facial landmarks between consecutive frames. These facial landmarks are acquired using a statistical modeling technique known as the Active Shape Model (ASM). The audio component is represented by the fundamental frequency, or pitch, obtained using the subharmonic-to-harmonic ratio (SHR). The synchrony between the audio and visual feature vectors is computed using Gaussian mutual information. Because this measure is over-sensitive, the raw synchrony estimates may contain spurious values, so a filtering method is employed to discard synchrony values that occur during non-associated audio/visual events. The human visual system is capable of distinguishing rigid from non-rigid motion of an articulator during speech. In an attempt to emulate this process, we separate rigid and non-rigid motion and compute the synchrony attributed to each. Experiments are conducted on a dataset of monologue video clip pairs. Each pair is composed of an asynchronous and a synchronous version of the same video clip. For the asynchronous video clips, the audio signal is displaced with respect to the visual signal. Experimental results indicate that the proposed approach is successful in detecting facial regions that demonstrate synchrony, and in distinguishing between synchronous and asynchronous sequences. © 2009. The copyright of this document resides with its authors.
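The synchrony measure named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the audio and visual features are jointly Gaussian, in which case the mutual information reduces to the standard closed form I(A;V) = 0.5 * log(det(C_A) * det(C_V) / det(C_AV)). The function name `gaussian_mutual_information` and the synthetic pitch and landmark-displacement data are hypothetical. SHR pitch extraction, ASM landmark tracking, and the filtering step are not reproduced.

```python
import numpy as np


def gaussian_mutual_information(audio, visual):
    """Estimate I(A;V) in nats, assuming A and V are jointly Gaussian.

    audio:  shape (n_frames,) or (n_frames, d_a), e.g. pitch per frame
    visual: shape (n_frames, d_v), e.g. (dx, dy) displacement of one landmark
    """
    a = np.asarray(audio, dtype=float)
    v = np.asarray(visual, dtype=float)
    a = a.reshape(a.shape[0], -1)
    v = v.reshape(v.shape[0], -1)

    # Joint covariance of the stacked feature vector [A, V].
    cov = np.atleast_2d(np.cov(np.hstack([a, v]), rowvar=False))
    d_a = a.shape[1]
    cov_a = cov[:d_a, :d_a]
    cov_v = cov[d_a:, d_a:]

    # Log-determinants are numerically safer than raw determinants.
    _, logdet_a = np.linalg.slogdet(cov_a)
    _, logdet_v = np.linalg.slogdet(cov_v)
    _, logdet_joint = np.linalg.slogdet(cov)
    return 0.5 * (logdet_a + logdet_v - logdet_joint)


# Synthetic stand-ins (assumptions, not data from the paper):
# a pitch track and the per-frame (dx, dy) displacement of one landmark
# whose horizontal motion partly follows the pitch.
rng = np.random.default_rng(0)
n_frames = 2000
pitch = rng.normal(size=n_frames)
dx = 0.8 * pitch + 0.6 * rng.normal(size=n_frames)
dy = rng.normal(size=n_frames)
visual = np.column_stack([dx, dy])

# Synchronous pair: audio and video aligned frame by frame.
mi_sync = gaussian_mutual_information(pitch, visual)

# Asynchronous pair: audio displaced by 25 frames relative to the video.
mi_async = gaussian_mutual_information(np.roll(pitch, 25), visual)

print(f"synchronous MI:  {mi_sync:.3f} nats")
print(f"asynchronous MI: {mi_async:.3f} nats")
```

In this toy setup the aligned pair yields a clearly higher estimate than the displaced pair, which is the kind of contrast the abstract relies on to separate synchronous from asynchronous clips. Real pitch and landmark signals are temporally correlated, so a displaced clip would not necessarily drop as close to zero as it does with this white-noise data.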

DOI

10.5244/C.23.10

Recommended Citation

Cadavid, Steven; Abdel-Mottaleb, Mohamed; Messinger, Daniel S.; Mahoor, Mohammad H.; and Bahrick, Lorraine E., "Detecting local audio-visual synchrony in monologues utilizing vocal pitch and facial landmark trajectories" (2009). Department of Psychology. 79.
https://digitalcommons.fiu.edu/psychology_fac/79

This document is currently not available here.
