Automatic detection of voice pathologies enables non-invasive, low-cost and objective assessment of the presence of disorders, and it accelerates and improves the diagnosis and clinical treatment of patients. In this work, a vector of 28 acoustic parameters is evaluated using principal component analysis (PCA), kernel principal component analysis (kPCA) and an auto-associative neural network (NLPCA) in the detection of four kinds of pathology (hyperfunctional dysphonia, functional dysphonia, laryngitis, vocal cord paralysis), using the vowels a, i and u spoken at high, low and normal pitch. The results indicate that the kPCA and NLPCA methods can be considered a step towards detecting pathologies of the vocal folds. Such an approach provides acceptable results for this purpose, with the best efficiency levels close to 100%. The study brings together the most commonly used approaches to speech signal processing and compares machine learning methods for determining the health status of the patient.
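As a rough illustration of the linear PCA step mentioned in the abstract, the sketch below reduces 28-dimensional feature vectors to a single principal component using power iteration. It is not the authors' pipeline: the data are synthetic, the two groups and their offsets are invented for the example, and all function names (`center`, `covariance`, `leading_component`, `project`, `synthetic_group`) are hypothetical. The kPCA and NLPCA variants are not shown.

```python
import math
import random

def center(rows):
    """Subtract the per-feature mean from every row."""
    n, dim = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dim)]
    return [[r[j] - means[j] for j in range(dim)] for r in rows]

def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, dim = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1)
             for j in range(dim)] for i in range(dim)]

def leading_component(cov, iterations=500):
    """Approximate the top eigenvector and eigenvalue by power iteration."""
    dim = len(cov)
    v = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v^T C v gives the variance along v (v has unit length).
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(dim))
                     for i in range(dim))
    return v, eigenvalue

def project(centered, v):
    """Score of each centered row along direction v."""
    return [sum(a * b for a, b in zip(row, v)) for row in centered]

# Synthetic stand-in for 28 acoustic parameters (assumption, not real data):
# the two groups differ only in the first feature.
random.seed(0)
DIM = 28

def synthetic_group(offset, count):
    return [[random.gauss(offset if j == 0 else 0.0, 1.0) for j in range(DIM)]
            for _ in range(count)]

healthy = synthetic_group(-5.0, 40)
pathological = synthetic_group(5.0, 40)

centered = center(healthy + pathological)
component, variance = leading_component(covariance(centered))
scores = project(centered, component)
```

In this toy setting the first component aligns with the one feature that separates the groups, so the projected scores of the two groups fall on opposite sides of zero. Real acoustic parameters would normally be standardized first, and more than one component would usually be retained.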
@article{bwmeta1.element.bwnjournal-article-amcv25i3p631bwm,
  author = {Daria Panek and Andrzej Skalski and Janusz Gajda and Ryszard Tadeusiewicz},
  title = {Acoustic analysis assessment in speech pathology detection},
  journal = {International Journal of Applied Mathematics and Computer Science},
  volume = {25},
  year = {2015},
  pages = {631-643},
  language = {en},
  url = {http://dml.mathdoc.fr/item/bwmeta1.element.bwnjournal-article-amcv25i3p631bwm}
}
Daria Panek; Andrzej Skalski; Janusz Gajda; Ryszard Tadeusiewicz. Acoustic analysis assessment in speech pathology detection. International Journal of Applied Mathematics and Computer Science, Vol. 25 (2015), pp. 631-643. http://gdmltest.u-ga.fr/item/bwmeta1.element.bwnjournal-article-amcv25i3p631bwm/