Academic Commons Search Results
http://academiccommons.columbia.edu/catalog.rss?f%5Bauthor_facet%5D%5B%5D=Athineos%2C+Marios&f%5Bdepartment_facet%5D%5B%5D=Electrical+Engineering&q=&rows=500&sort=record_creation_date+desc

Sound Texture Modelling with Linear Prediction in both Time and Frequency Domains
http://academiccommons.columbia.edu/catalog/ac:148745
Authors: Athineos, Marios; Ellis, Daniel P. W.
Handle: http://hdl.handle.net/10022/AC:P:13735
Date: Fri, 29 Jun 2012 00:00:00 +0000
Abstract: Sound textures - for instance, a crackling fire, running water, or applause - constitute a large and largely neglected class of audio signals. Whereas tonal sounds have been effectively and flexibly modelled with sinusoids, aperiodic energy is usually modelled as white noise filtered to match the approximate spectrum of the original over 10-30 ms windows, which fails to provide a perceptually satisfying reproduction of many real-world noisy sound textures. We attribute this failure to the loss of short-term temporal structure, and we introduce a second modelling stage in which the time envelope of the residual from conventional linear predictive modelling is itself modelled with linear prediction in the spectral domain. This cascade time- and frequency-domain linear prediction (CTFLP) leads to noise-excited resyntheses that have high perceptual fidelity. We perform a novel quantitative error analysis by measuring the proportional error within time-frequency cells across a range of timescales.
Subjects: Electrical engineering
Department: Electrical Engineering
Type: Articles

Frequency-domain linear prediction for temporal features
http://academiccommons.columbia.edu/catalog/ac:148726
Authors: Athineos, Marios; Ellis, Daniel P. W.
Handle: http://hdl.handle.net/10022/AC:P:13729
Date: Fri, 29 Jun 2012 00:00:00 +0000
Abstract: Current speech recognition systems uniformly employ short-time spectral analysis, usually over windows of 10-30 ms, as the basis for their acoustic representations. Any detail below this timescale is lost, and even temporal structures above this level are usually only weakly represented, in the form of deltas etc. We address this limitation by proposing a novel representation of the temporal envelope in different frequency bands, obtained by exploring the dual of conventional linear prediction (LPC) when applied in the transform domain. With this technique of frequency-domain linear prediction (FDLP), the 'poles' of the model describe temporal, rather than spectral, peaks. By using analysis windows on the order of hundreds of milliseconds, the procedure automatically decides how to distribute poles to best model the temporal structure within the window. While this approach offers many possibilities for novel speech features, we experiment with one particular form, an index describing the 'sharpness' of individual poles within a window, and show a relatively large word error rate improvement from 4.97% to 3.81% in a recognizer trained on general conversational telephone speech and tested on a small-vocabulary spontaneous numbers task. We analyze this improvement in terms of the confusion matrices and suggest how the newly modeled fine temporal structure may be helping.
Subjects: Electrical engineering, Artificial intelligence
Department: Electrical Engineering
Type: Articles

LP-TRAP: Linear predictive temporal patterns
http://academiccommons.columbia.edu/catalog/ac:148642
Authors: Athineos, Marios; Hermansky, Hynek; Ellis, Daniel P. W.
Handle: http://hdl.handle.net/10022/AC:P:13711
Date: Thu, 28 Jun 2012 00:00:00 +0000
Abstract: Autoregressive modeling is applied for approximating the temporal evolution of spectral density in critical-band-sized subbands of a segment of speech signal. The generalized autocorrelation linear predictive technique allows for a compromise between fitting the peaks and the troughs of the Hilbert envelope of the signal in the sub-band. The cosine transform coefficients of the approximated sub-band envelopes, computed recursively from the all-pole polynomials, are used as inputs to a TRAP-based speech recognition system and are shown to improve recognition accuracy.
Subjects: Electrical engineering, Applied mathematics
Department: Electrical Engineering
Type: Articles

PLP2: Autoregressive modeling of auditory-like 2-D spectro-temporal patterns
http://academiccommons.columbia.edu/catalog/ac:148646
Authors: Athineos, Marios; Hermansky, Hynek; Ellis, Daniel P. W.
Handle: http://hdl.handle.net/10022/AC:P:13714
Date: Thu, 28 Jun 2012 00:00:00 +0000
Abstract: The temporal trajectories of the spectral energy in auditory critical bands over 250 ms segments are approximated by an all-pole model, the time-domain dual of conventional linear prediction. This quarter-second auditory spectro-temporal pattern is further smoothed by iterative alternation of spectral and temporal all-pole modeling. Just as Perceptual Linear Prediction (PLP) uses an autoregressive model in the frequency domain to estimate peaks in an auditory-like short-term spectral slice, PLP2 uses all-pole modeling in both time and frequency domains to estimate peaks of a two-dimensional spectrotemporal pattern, motivated by considerations of the auditory system.
Subjects: Electrical engineering
Department: Electrical Engineering
Type: Articles

Pushing the Envelope—Aside
http://academiccommons.columbia.edu/catalog/ac:144529
Authors: Morgan, Nelson; Zhu, Qifeng; Stolcke, Andreas; Sönmez, Kemal; Sivadas, Sunil; Shinozaki, Takahiro; Ostendorf, Mari; Jain, Pratibha; Hermansky, Hynek; Ellis, Daniel P. W.; Doddington, George; Chen, Barry; Çetin, Özgür; Bourlard, Hervé; Athineos, Marios
Date: Wed, 15 Feb 2012 00:00:00 +0000
Abstract: Despite successes, there are still significant limitations to speech recognition performance, particularly for conversational speech and/or for speech with significant acoustic degradations from noise or reverberation. For this reason, the authors have proposed methods that incorporate different (and larger) analysis windows, which are described in this article. Note in passing that we and many others have already taken advantage of processing techniques that incorporate information over long time ranges, for instance for normalization (by cepstral mean subtraction, as in B. Atal (1974), or relative spectral analysis (RASTA), as in H. Hermansky and N. Morgan (1994)). The authors have also proposed features that are based on speech sound class posterior probabilities, which have good properties for both classification and stream combination.
Subjects: Artificial intelligence, Communication
Department: Electrical Engineering
Type: Articles

Autoregressive Modeling of Temporal Envelopes
http://academiccommons.columbia.edu/catalog/ac:141840
Authors: Athineos, Marios; Ellis, Daniel P. W.
Handle: http://hdl.handle.net/10022/AC:P:11819
Date: Fri, 18 Nov 2011 00:00:00 +0000
Abstract: Autoregressive (AR) models are commonly computed from the linear autocorrelation of a discrete-time signal to obtain an all-pole estimate of the signal's power spectrum. We are concerned with the dual, frequency-domain problem. We derive the relationship between the discrete-frequency linear autocorrelation of a spectrum and the temporal envelope of a signal. In particular, we focus on the real spectrum obtained by a type-I odd-length discrete cosine transform (DCT-Io), which leads to the all-pole envelope of the corresponding symmetric squared Hilbert temporal envelope. A compact linear algebra notation for the familiar concepts of AR modeling clearly reveals the dual symmetries between modeling in time and frequency domains. By using AR models in both domains in cascade, we can jointly estimate the temporal and spectral envelopes of a signal. We model the temporal envelope of the residual of regular AR modeling to efficiently capture signal structure in the most appropriate domain.
Subjects: Electrical engineering, Music
Department: Electrical Engineering
Type: Articles
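
Illustrative sketch (not taken from the listed papers): the frequency-domain dual of linear prediction described in the abstract above can be roughly illustrated in Python. The sketch takes a real cosine transform of a whole segment, runs ordinary autocorrelation-method linear prediction along the frequency axis, and reads the resulting all-pole "spectrum" as an estimate of the temporal envelope. The function name `fdlp_envelope`, the default model order, and the use of SciPy's type-2 DCT in place of the DCT-Io named in the abstract are all assumptions made for this example.

```python
import numpy as np
from scipy.fft import dct
from scipy.linalg import solve_toeplitz
from scipy.signal import freqz


def fdlp_envelope(x, order=20):
    """All-pole estimate of the temporal envelope of x (illustrative only)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Real "spectrum" of the whole segment. Assumption: a type-2 DCT stands
    # in for the type-I odd-length DCT (DCT-Io) named in the abstract.
    c = dct(x, type=2, norm="ortho")
    # Linear autocorrelation of the DCT sequence for lags 0..order.
    r = np.correlate(c, c, mode="full")[n - 1 : n + order]
    # Normal (Yule-Walker) equations for A(z) = 1 + a1 z^-1 + ... + ap z^-p.
    a = np.concatenate(([1.0], solve_toeplitz(r[:order], -r[1 : order + 1])))
    # Prediction-error power, used as the gain of the all-pole model.
    gain = np.dot(a, r)
    # In the dual domain the "frequency" axis of the model is time: sample t
    # of a type-2 DCT corresponds to the angle pi * (t + 0.5) / n.
    w = np.pi * (np.arange(n) + 0.5) / n
    _, h = freqz([1.0], a, worN=w)
    return gain * np.abs(h) ** 2


if __name__ == "__main__":
    # Hypothetical test signal: a click at sample 300 in low-level noise.
    rng = np.random.default_rng(0)
    x = 1e-3 * rng.standard_normal(1000)
    x[300] += 1.0
    env = fdlp_envelope(x, order=10)
    print("envelope peaks near sample", int(np.argmax(env)))
```

Because the poles are fitted along the frequency axis, a sharp peak in the returned envelope marks a transient in time, in the same way that a pole of conventional linear prediction marks a spectral peak.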