Time-Frequency Representations for Speech Signals


dc.creator	Riley, Michael D.
dc.date	2004-10-20T20:00:21Z
dc.date	1987-05-01
dc.date.accessioned	2013-10-09T02:47:07Z
dc.date.available	2013-10-09T02:47:07Z
dc.date.issued	2013-10-09
dc.identifier	AITR-974
dc.identifier	http://hdl.handle.net/1721.1/6827
dc.identifier.uri	http://koha.mediu.edu.my:8181/xmlui/handle/1721
dc.description	This work addresses two related questions. The first question is what joint time-frequency energy representations are most appropriate for auditory signals, in particular for speech signals in sonorant regions. The quadratic transforms of the signal are examined, a large class that includes, for example, spectrograms and the Wigner distribution. Quasi-stationarity is not assumed, since this would neglect dynamic regions. A set of desired properties is proposed for the representation: (1) shift-invariance, (2) positivity, (3) superposition, (4) locality, and (5) smoothness. Several relations among these properties are proved: shift-invariance and positivity imply that the transform is a superposition of spectrograms; positivity and superposition are equivalent conditions when the transform is real; positivity limits the simultaneous time and frequency resolution (locality) possible for the transform, defining an uncertainty relation for joint time-frequency energy representations; and locality and smoothness trade off according to the 2-D generalization of the classical uncertainty relation. The transform that best meets these criteria is derived; it consists of two-dimensionally smoothed Wigner distributions with (possibly oriented) 2-D Gaussian kernels. These transforms are then related to time-frequency filtering, a method for estimating the time-varying 'transfer function' of the vocal tract, which is somewhat analogous to cepstral filtering generalized to the time-varying case. Natural speech examples are provided. The second question addressed is how to obtain a rich, symbolic description of the phonetically relevant features in these time-frequency energy surfaces, the so-called schematic spectrogram. Time-frequency ridges, the 2-D analog of spectral peaks, are one feature that is proposed.
If non-oriented kernels are used for the energy representation, then the ridge tops can be identified by zero-crossings in the inner product of the gradient vector and the direction of greatest downward curvature. If oriented kernels are used, the method can be generalized to give better orientation selectivity (e.g., at intersecting ridges) at the cost of poorer time-frequency locality. Many speech examples are given showing the performance for some traditionally difficult cases: semi-vowels and glides, nasalized vowels, consonant-vowel transitions, female speech, and imperfect transmission channels.
dc.format	10873603 bytes
dc.format	7562496 bytes
dc.format	application/postscript
dc.format	application/pdf
dc.language	en_US
dc.relation	AITR-974
dc.title	Time-Frequency Representations for Speech Signals
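
The ridge criterion stated in the description (zero-crossings in the inner product of the gradient vector and the direction of greatest downward curvature) can be illustrated with a short NumPy sketch. This is not the report's implementation. It assumes the energy surface has already been smoothed and is sampled on a regular grid indexed as surface[time, freq]. The function name ridge_mask, the min_curvature threshold, the finite-difference derivatives, and the choice to look for sign changes only along the frequency axis are all assumptions made for illustration.

```python
import numpy as np

def ridge_mask(surface, min_curvature=1e-9):
    """Mark candidate ridge tops on a smooth 2-D array surface[time, freq].

    min_curvature is a hypothetical tuning parameter (not from the report)
    that rejects points whose downward curvature is negligible.
    """
    # First derivatives along time (axis 0) and frequency (axis 1).
    g_t, g_f = np.gradient(surface)
    # Second derivatives, assembled into a symmetric 2x2 Hessian per sample.
    h_tt, h_tf = np.gradient(g_t)
    _, h_ff = np.gradient(g_f)
    hess = np.stack([np.stack([h_tt, h_tf], axis=-1),
                     np.stack([h_tf, h_ff], axis=-1)], axis=-2)
    # eigh returns eigenvalues in ascending order, so index 0 is the most
    # negative one: the direction of greatest downward curvature.
    evals, evecs = np.linalg.eigh(hess)
    lam = evals[..., 0]
    v = evecs[..., :, 0]
    # Eigenvectors have arbitrary sign; orient them so the frequency
    # component is non-negative (a simplifying assumption).
    v = v * np.where(v[..., 1] < 0, -1.0, 1.0)[..., None]
    # Inner product of the gradient with the curvature direction.
    d = g_t * v[..., 0] + g_f * v[..., 1]
    # Positive-to-non-positive sign change along frequency marks a ridge top.
    crossing = np.zeros(surface.shape, dtype=bool)
    crossing[:, 1:] = (d[:, :-1] > 0) & (d[:, 1:] <= 0)
    return crossing & (lam < -min_curvature)

# Synthetic check: a ridge that is flat in time and Gaussian in frequency.
freq = np.arange(64)
profile = np.exp(-((freq - 30.0) ** 2) / (2 * 4.0 ** 2))
demo = np.tile(profile, (20, 1))
ridge = ridge_mask(demo)
```

Because the sign changes are searched along the frequency axis only, this sketch suits ridges that run roughly along time, such as steady formants. Steeply sloped ridges would also need a search along the time axis.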
