Towards spike-based speech processing: A biologically plausible approach to simple acoustic classification

Ismail Uysal; Harsha Sathyendra; John G. Harris

International Journal of Applied Mathematics and Computer Science, Volume 18 (2008), pp. 129-137 / Harvested from The Polish Digital Mathematics Library

Abstract

Shortcomings of automatic speech recognition (ASR) applications are becoming more evident as they are more widely used in real life. The inherent non-stationarity associated with the timing of speech signals, as well as the dynamical changes in the environment, makes the ensuing analysis and recognition extremely difficult. Researchers often turn to biology seeking clues to make better engineered systems, and ASR is no exception, with the usage of feature sets such as Mel frequency cepstral coefficients, which employ filter banks similar to cochlear filter banks in frequency distribution and bandwidth. In this paper, we delve deeper into the mechanics of the human auditory system to take this biological inspiration to the next level. The main goal of this research is to investigate the computational potential of spike trains produced at the early stages of the auditory system for a simple acoustic classification task. First, spike coding schemes ranging from temporal to rate coding are explored, together with spike-based encoders of varying complexity, such as rank order coding and the liquid state machine. Based on these findings, a biologically plausible system architecture is proposed for the recognition of phonetically simple acoustic signals which makes exclusive use of spikes for computation. In performance tests on a noisy vowel data set, the proposed system outperforms a conventional ASR system.

Published: 2008-01-01
EUDML-ID : urn:eudml:doc:207871

@article{bwmeta1.element.bwnjournal-article-amcv18i2p129bwm,
     author = {Ismail Uysal and Harsha Sathyendra and John G. Harris},
     title = {Towards spike-based speech processing: A biologically plausible approach to simple acoustic classification},
     journal = {International Journal of Applied Mathematics and Computer Science},
     volume = {18},
     year = {2008},
     pages = {129-137},
     language = {en},
     url = {http://dml.mathdoc.fr/item/bwmeta1.element.bwnjournal-article-amcv18i2p129bwm}
}

Ismail Uysal; Harsha Sathyendra; John G. Harris. Towards spike-based speech processing: A biologically plausible approach to simple acoustic classification. International Journal of Applied Mathematics and Computer Science, Volume 18 (2008) pp. 129-137. http://gdmltest.u-ga.fr/item/bwmeta1.element.bwnjournal-article-amcv18i2p129bwm/

References

[000] Atal B. S. and Hanauer S. L. (1971). Speech analysis and synthesis by linear prediction, Journal of the Acoustical Society of America 50(2B): 637-655.

[001] Davis S. B. and Mermelstein P. (1980). Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences, IEEE Transactions on Acoustics, Speech, and Signal Processing 28(4): 357-366.

[002] Dayan P. and Abbott L. F. (2001). Theoretical Neuroscience: Computational and Mathematical Modeling of Neural Systems, MIT Press, Cambridge, MA. | Zbl 1051.92010

[003] Delorme A. and Thorpe S. J. (2001). Face identification using one spike per neuron: resistance to image degradations, Neural Networks 14(7): 795-803.

[004] Hopfield J. J. and Brody C. D. (2001). What is a moment? Transient synchrony as a collective mechanism for spatiotemporal integration, Proceedings of the National Academy of Sciences USA 98(3): 1282-1287.

[005] Jaeger H. (2001). The “echo state” approach to analysing and training recurrent neural networks, Technical Report GMD Report 148, German National Research Center for Information Technology.

[006] Maass W., Natschlager T. and Markram H. (2002). Real-time computing without stable states: A new framework for neural computation based on perturbations, Neural Computation 14(11): 2531-2560. | Zbl 1057.68618

[007] Markram H., Lubke J., Frotscher M. and Sakmann B. (1997). Regulation of synaptic efficacy by coincidence of postsynaptic APs and EPSPs, Science 275(5297): 213-215.

[008] Meddis R. (1986). Simulation of mechanical to neural transduction in the auditory receptor, Journal of the Acoustical Society of America 79(3): 702-711.

[009] Moissl U. and Meyer-Base U. (2000). A comparison of different methods to assess phase-locking in auditory neurons, International Conference of IEEE-EMBS on Information Technology Applications in Biomedicine, Vol. 2, Arlington, USA, pp. 840-843.

[010] Rieke F., Warland D., de Ruyter van Steveninck R. and Bialek W. (1999). Spikes - Exploring the Neural Code, MIT Press, Cambridge, MA. | Zbl 0912.92004

[011] Rullen R. V., Gautrais J., Delorme A. and Thorpe S. J. (1998). Face processing using one spike per neuron, Biosystems 48(1-3): 229-239.

[013] Sachs M. B. (1984). Neural coding of complex sounds: Speech, Annual Review of Physiology 46: 261-273.

[014] Skowronski M. D. and Harris J. G. (2004). Exploiting independent filter bandwidth of human factor cepstral coefficients in automatic speech recognition, Journal of the Acoustical Society of America 116(3): 1774-1780.

[015] Sumner C. J. and Lopez-Poveda E. A. (2002). A revised model of the inner-hair cell and auditory-nerve complex, Journal of the Acoustical Society of America 111(5): 2178-2188.

[016] Sumner C. J., Lopez-Poveda E. A., O'Mard L. P. and Meddis R. (2003). Adaptation in a revised inner-hair cell model, Journal of the Acoustical Society of America 113(2): 893-901.

[017] Terman D. and Wang D. (1995). Global competition and local cooperation in a network of neural oscillators, Physica D 81(1-2): 148-176. | Zbl 0882.68153

[018] Thorpe S. J. and Gautrais J. (1998). Rank order coding, in J. Bower (ed.), Computational Neuroscience: Trends in Research, New York: Plenum Press, pp. 113-119.

[019] Uysal I., Sathyendra H. and Harris J. G. (2006). A biologically plausible system approach for noise robust vowel recognition, Proceedings of the IEEE Midwest Symposium on Circuits and Systems, Vol. 1, San Juan, Puerto Rico, pp. 245-249.

[020] Uysal I., Sathyendra H. and Harris J. G. (2007a). A duplex theory of spike coding in the early stages of the auditory system, Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, Vol. 4, Honolulu, USA, pp. 733-736.

[021] Uysal I., Sathyendra H. and Harris J. G. (2007b). Spike-based feature extraction for noise robust speech recognition using phase synchrony coding, Proceedings of the IEEE International Symposium on Circuits and Systems, New Orleans, USA, pp. 1529-1532.

[022] VanRullen R., Guyonneau R. and Thorpe S. J. (2005). Spike times make sense, Trends in Neurosciences 28(1): 1-4.

[023] Verstraeten D., Schrauwen B., Stroobandt D. and Campenhout J. V. (2005). Isolated word recognition with the liquid state machine: A case study, Information Processing Letters 95(6): 521-528. | Zbl 1184.68257