Academic Commons


Speaker turn segmentation based on between-channel differences

Ellis, Daniel P. W.; Liu, Jerry C.

Multichannel recordings of meetings carry information about speaker locations in the form of timing and level differences between microphones. We have been experimenting with cross-correlation and energy differences as features for identifying and segmenting speaker turns. In particular, we have used LPC whitening, spectral-domain cross-correlation, and dynamic programming to sharpen and disambiguate timing differences between microphone channels that may be dominated by noise and reverberation. These cues are classified into individual speakers using spectral clustering (i.e., clustering defined by the top eigenvectors of a similarity matrix). We show that this technique is largely robust to the precise details of microphone positioning, and can be used with some success on data collected from a number of different setups, as provided by the NIST 2004 Meetings evaluation.
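The following minimal Python/NumPy sketch illustrates the pipeline under several stated assumptions. It is not the authors' implementation. PHAT weighting (normalizing the cross-spectrum to unit magnitude) stands in for LPC whitening. The dynamic-programming step and the energy-difference features are omitted. The function names (`gcc_phat_lag`, `frame_lags`, `spectral_cluster`, `delay`), the similarity width `sigma`, and the synthetic two-microphone data are all hypothetical. The sketch estimates one inter-channel lag per frame from a frequency-domain cross-correlation. It then groups frames by speaker using the top eigenvectors of a normalized Gaussian similarity matrix, followed by a small k-means step.

```python
import numpy as np


def gcc_phat_lag(x, y, max_lag):
    """Delay of y relative to x in samples (positive: y lags x), from a
    PHAT-weighted cross-correlation computed in the frequency domain."""
    n = len(x) + len(y)  # zero-pad so the correlation is linear, not circular
    cross = np.fft.rfft(y, n=n) * np.conj(np.fft.rfft(x, n=n))
    # PHAT weighting keeps only the phase (assumed stand-in for LPC whitening)
    cross /= np.maximum(np.abs(cross), 1e-12)
    cc = np.fft.irfft(cross, n=n)
    # Reorder so that index i corresponds to lag (i - max_lag)
    cc = np.concatenate((cc[n - max_lag:], cc[:max_lag + 1]))
    return int(np.argmax(cc)) - max_lag


def frame_lags(x, y, frame_len, hop, max_lag):
    """One lag estimate per frame (hypothetical framing; no DP smoothing)."""
    last = min(len(x), len(y)) - frame_len
    return np.array([gcc_phat_lag(x[s:s + frame_len], y[s:s + frame_len], max_lag)
                     for s in range(0, last + 1, hop)])


def spectral_cluster(features, n_clusters, sigma=1.0, n_iter=20):
    """Cluster rows of `features` via the top eigenvectors of a normalized
    Gaussian similarity matrix, then k-means on the embedded rows."""
    feats = np.asarray(features, dtype=float).reshape(len(features), -1)
    d2 = ((feats[:, None, :] - feats[None, :, :]) ** 2).sum(axis=-1)
    affinity = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(affinity, 0.0)
    inv_sqrt_deg = 1.0 / np.sqrt(np.maximum(affinity.sum(axis=1), 1e-12))
    norm_aff = inv_sqrt_deg[:, None] * affinity * inv_sqrt_deg[None, :]
    _, vecs = np.linalg.eigh(norm_aff)  # eigenvalues in ascending order
    emb = vecs[:, -n_clusters:]          # top n_clusters eigenvectors
    emb = emb / np.maximum(np.linalg.norm(emb, axis=1, keepdims=True), 1e-12)
    # k-means with deterministic farthest-point initialization
    centers = [emb[0]]
    for _ in range(1, n_clusters):
        dist = np.min([((emb - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(emb[int(np.argmax(dist))])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.argmin(((emb[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        for k in range(n_clusters):
            if np.any(labels == k):
                centers[k] = emb[labels == k].mean(axis=0)
    return labels


def delay(sig, d):
    """Shift sig by d samples (negative d advances it), zero-filling the gap."""
    out = np.zeros_like(sig)
    if d >= 0:
        out[d:] = sig[:len(sig) - d]
    else:
        out[:d] = sig[-d:]
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a, b = rng.standard_normal(8000), rng.standard_normal(8000)
    # Synthetic two-mic "meeting": speaker A talks, then speaker B
    mic1 = np.concatenate([a, b])
    mic2 = np.concatenate([delay(a, 5), delay(b, -7)])
    lags = frame_lags(mic1, mic2, frame_len=1000, hop=1000, max_lag=20)
    print(lags)
    print(spectral_cluster(lags, n_clusters=2, sigma=2.0))
```

In this synthetic setup, speaker A reaches the second microphone 5 samples late and speaker B reaches it 7 samples early. Each frame's lag therefore points to the active talker, and clustering the lags separates the two turns. Real meeting audio would likely need overlapping windowed frames, several microphone pairs, and some form of cross-frame smoothing. The abstract attributes the disambiguation of timing differences to dynamic programming.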


Also Published In

Title
NIST ICASSP 2004 Meeting Recognition Workshop, Montreal

More About This Work

Academic Units
Electrical Engineering
Publisher
National Institute of Standards and Technology
Published Here
June 29, 2012