Speaker turn segmentation based on between-channel differences

Ellis, Daniel P. W.; Liu, Jerry C.

Multichannel recordings of meetings carry information about speaker locations in the timing and level differences between microphones. We have been experimenting with cross-correlation and energy differences as features to identify and segment speaker turns. In particular, we have used LPC whitening, spectral-domain cross-correlation, and dynamic programming to sharpen and disambiguate timing differences between mic channels that may be dominated by noise and reverberation. These cues are classified into individual speakers using spectral clustering (i.e., clustering defined by the top eigenvectors of a similarity matrix). We show that this technique is largely robust to the precise details of microphone positioning and similar setup variations, and can be used with some success on data collected from a number of different setups, as provided by the NIST 2004 Meetings evaluation.
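
As an illustration of the timing cue described above, the following is a minimal sketch of estimating the delay between two microphone channels by cross-correlation computed in the spectral domain. It is not the authors' implementation: the function name estimate_delay and the max_lag parameter are hypothetical, and a simple phase-transform (PHAT) magnitude normalization stands in for the LPC whitening named in the abstract. The dynamic-programming and spectral-clustering stages are not shown.

```python
import numpy as np

def estimate_delay(x, y, max_lag):
    """Estimate how many samples y lags behind x (negative if y leads).

    Hypothetical helper for illustration; assumes max_lag >= 1.
    """
    n = len(x) + len(y)                 # zero-pad to avoid circular wrap-around
    X = np.fft.rfft(x, n)
    Y = np.fft.rfft(y, n)
    cross = Y * np.conj(X)              # cross-power spectrum
    cross /= np.abs(cross) + 1e-12      # PHAT whitening (assumption, not LPC)
    cc = np.fft.irfft(cross, n)         # cross-correlation over all lags
    # Reorder so the array covers lags -max_lag .. +max_lag.
    cc = np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))
    return int(np.argmax(cc)) - max_lag

# Synthetic check: channel y is channel x delayed by 5 samples.
rng = np.random.default_rng(0)
x = rng.standard_normal(1024)
y = np.concatenate((np.zeros(5), x[:-5]))
delay = estimate_delay(x, y, max_lag=20)
```

Whitening the cross-power spectrum discards magnitude and keeps only phase, which tends to sharpen the correlation peak when the channels are reverberant. In a segmentation setting, delays like this would be computed per short frame and per channel pair before any smoothing or clustering.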


Also Published In

NIST ICASSP 2004 Meeting Recognition Workshop, Montreal
National Institute of Standards and Technology

More About This Work

Academic Units
Electrical Engineering
Published Here
June 29, 2012