Multi-channel Source Separation by Beamforming Trained with Factorial HMMs

Reyes-Gomez, Manuel; Raj, Bhiksha; Ellis, Daniel P. W.

Speaker separation has conventionally been treated as a problem of blind source separation (BSS). This approach does not use any knowledge of the statistical characteristics of the signals to be separated; it relies mainly on the statistical independence of the signals to separate them. Maximum-likelihood techniques, on the other hand, use knowledge of the a priori probability distributions of the speakers' signals to effect separation. Previously (Reyes-Gomez, M.J. et al., Proc. ICASSP, 2003), we presented a maximum-likelihood speaker separation technique that uses detailed statistical information about the signals to be separated, represented in the form of hidden Markov models (HMMs), to estimate the parameters of a filter-and-sum processor for signal separation. Here we show that filters estimated for one utterance by a speaker generalize well to other utterances by the same speaker, provided the locations of the speakers remain constant. Thus, filters estimated from a "training" utterance with a known transcript can subsequently be used to separate that speaker's future signals from mixtures of speech in an unsupervised manner. However, the filters are ineffective for other speakers, even at the same locations, which indicates that they capture the spatio-frequency characteristics of the specific speaker.
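A filter-and-sum processor, in its generic form, passes each microphone channel x_j through its own FIR filter h_j and adds the filtered channels: y[n] = Σ_j Σ_k h_j[k] x_j[n − k]. The sketch below shows only this application step, i.e. how a fixed set of previously estimated filters would be reused on a new mixture. It assumes time-domain FIR filters and does not implement the HMM-based likelihood estimation of the filters described above. The function name `filter_and_sum` and the filter values in the usage example are hypothetical illustrations, not the authors' estimated filters.

```python
from typing import List, Sequence


def filter_and_sum(channels: Sequence[Sequence[float]],
                   filters: Sequence[Sequence[float]]) -> List[float]:
    """Generic filter-and-sum: y[n] = sum_j sum_k h_j[k] * x_j[n - k].

    channels: one equal-length sample sequence per microphone (x_j).
    filters:  one FIR coefficient sequence per microphone (h_j); in the
              approach above these would come from an earlier training
              utterance (here they are simply given, hypothetical values).
    The output is truncated to the length of the input channels.
    """
    if len(channels) != len(filters):
        raise ValueError("need exactly one filter per channel")
    n_samples = len(channels[0])
    if any(len(x) != n_samples for x in channels):
        raise ValueError("all channels must have the same length")

    y = [0.0] * n_samples
    for x, h in zip(channels, filters):
        # Causal FIR filtering of channel x with filter h.
        for n in range(n_samples):
            acc = 0.0
            for k, coeff in enumerate(h):
                if n - k < 0:
                    break  # samples before the start of the signal are zero
                acc += coeff * x[n - k]
            y[n] += acc  # sum the filtered channels
    return y


# Usage: the second microphone hears the source one sample later than the
# first. Delaying channel 1 by one sample and weighting both channels by 0.5
# aligns them (a delay-and-sum special case of filter-and-sum).
mic1 = [1.0, 2.0, 3.0, 4.0]
mic2 = [0.0, 1.0, 2.0, 3.0]
out = filter_and_sum([mic1, mic2], [[0.0, 0.5], [0.5]])
print(out)  # [0.0, 1.0, 2.0, 3.0]
```

Delay-and-sum is used in the example only because its result is easy to verify by hand; the filters in the work described above are estimated per speaker and per location, so in practice they would be arbitrary FIR responses rather than pure delays.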


Also Published In

2003 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics: October 19-22, 2003, Mohonk Mountain House, New Paltz, NY, USA


Academic Units: Electrical Engineering
Published Here: June 29, 2012