2004 Articles
Speaker turn segmentation based on between-channel differences
Multichannel recordings of meetings carry information about speaker locations in the timing and level differences between microphones. We have been experimenting with cross-correlation and energy differences as features for identifying and segmenting speaker turns. In particular, we have used LPC whitening, spectral-domain cross-correlation, and dynamic programming to sharpen and disambiguate timing differences between mic channels that may be dominated by noise and reverberation. These cues are classified into individual speakers using spectral clustering (i.e., clustering defined by the top eigenvectors of a similarity matrix). We show that this technique is largely robust to the precise details of the recording setup, such as mic positioning, and that it can be used with some success on data collected from a number of different setups, as provided by the NIST 2004 Meetings evaluation.
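The two between-channel cues named in the abstract, a timing difference and a level difference, can be illustrated with a minimal sketch. It is not the paper's method: it uses plain time-domain cross-correlation on a synthetic two-channel frame and omits the LPC whitening, spectral-domain correlation, dynamic programming, and spectral clustering stages. The function names `best_lag` and `level_difference_db`, the 3-sample delay, and the 0.5 attenuation are all assumptions made for illustration.

```python
import math
import random


def best_lag(x, y, max_lag):
    """Return the lag (in samples) that maximizes the cross-correlation of x and y.

    A positive result means y looks like a delayed copy of x.
    """
    best, best_score = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n in range(len(x)):
            m = n + lag
            if 0 <= m < len(y):
                score += x[n] * y[m]
        if score > best_score:
            best, best_score = lag, score
    return best


def level_difference_db(x, y):
    """Energy of channel x relative to channel y, in decibels."""
    energy_x = sum(v * v for v in x)
    energy_y = sum(v * v for v in y)
    return 10.0 * math.log10(energy_x / energy_y)


# Synthetic frame (assumption): channel B is channel A delayed by
# 3 samples and attenuated by half, as if the talker sat nearer mic A.
rng = random.Random(0)
source = [rng.gauss(0.0, 1.0) for _ in range(400)]
delay = 3
chan_a = source
chan_b = [0.0] * delay + [0.5 * v for v in source[:-delay]]

lag = best_lag(chan_a, chan_b, max_lag=10)
ild = level_difference_db(chan_a, chan_b)
print(lag, round(ild, 1))
```

In a real meeting recording, a per-frame pair such as `(lag, ild)` would be noisy and ambiguous, which is the problem the whitening and dynamic-programming steps described above are meant to address before the cues are clustered into speakers.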
Files
- nist04-turnid.pdf (application/pdf, 235 KB)
Also Published In
- Title: NIST ICASSP 2004 Meeting Recognition Workshop, Montreal
- Publisher: National Institute of Standards and Technology
More About This Work
- Academic Units: Electrical Engineering
- Published Here: June 29, 2012