Medical Engineering & Physics
Volume 28, Issue 8, Pages 741-748, October 2006

Investigation of an HMM/ANN hybrid structure in pattern recognition application using cepstral analysis of dysarthric (distorted) speech signals

  • Prasad D. Polur
    Corresponding author. Present address: 520 West Franklin Street, #1901, Richmond, VA 23220, USA. Tel.: +1 804 852 4624; fax: +1 804 828 4454.
  • Gerald E. Miller
    Tel.: +1 804 828 7263; fax: +1 804 827 0290.

Department of Biomedical Engineering, Virginia Commonwealth University, 1112 East Clay Street, Room 220, P.O. Box 980694, Richmond, VA 23298, USA

Received 30 September 2004; received in revised form 8 November 2005; accepted 9 November 2005; published online 15 December 2005.

Abstract 

Computer speech recognition for individuals with dysarthria, such as cerebral palsy patients, requires a robust technique that can handle very high variability and limited training data. This study investigated a 10-state ergodic hidden Markov model (HMM)/artificial neural network (ANN) hybrid structure for a dysarthric speech (isolated word) recognition system intended to act as an assistive tool. A small vocabulary spoken by three cerebral palsy subjects was chosen. The effect of the hybrid structure on the recognition rate was assessed by comparing it with an ergodic hidden Markov model used as a control, in order to determine whether the modified technique enhanced recognition of dysarthric speech. The speech was sampled at 11 kHz. Mel frequency cepstral coefficients were extracted from the speech signals using 15 ms frames and served as training input to the hybrid model. The results demonstrated that the hybrid model structure was quite robust in its ability to handle the large variability and non-conformity of dysarthric speech. The level of variability in input dysarthric speech patterns sometimes limits the reliability of the system. However, its application as a rehabilitation/control tool to assist dysarthric, motor-impaired individuals holds sufficient promise.

Keywords: Artificial neural network, Cerebral palsy, Dysarthric speech, Ergodic model, Hidden Markov model, Mel frequency cepstral coefficients, Speech recognition
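
The framing and model sizes quoted in the abstract can be made concrete with a small sketch. This is not the authors' implementation: it only works out how many samples a 15 ms frame holds at 11 kHz and what a 10-state ergodic (fully connected) transition matrix looks like. The exact sample rate (11,000 Hz rather than, say, 11,025 Hz), the non-overlapping frames, the uniform starting probabilities, and all function names are assumptions made for illustration.

```python
# Illustrative sketch only; values marked "assumed" are not stated in the abstract.

SAMPLE_RATE_HZ = 11_000   # abstract says "11 kHz"; the exact rate is assumed
FRAME_MS = 15             # frame length given in the abstract
N_STATES = 10             # number of HMM states given in the abstract


def frame_length(sample_rate_hz, frame_ms):
    """Number of samples in one analysis frame."""
    return sample_rate_hz * frame_ms // 1000


def split_into_frames(samples, frame_len):
    """Split a signal into non-overlapping frames, dropping a trailing partial frame.

    Non-overlapping framing is assumed; the abstract does not give a frame shift.
    """
    n_frames = len(samples) // frame_len
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]


def uniform_ergodic_transitions(n_states):
    """Transition matrix of an ergodic HMM: every state can reach every state.

    Uniform probabilities are an assumed placeholder that training would re-estimate.
    """
    p = 1.0 / n_states
    return [[p] * n_states for _ in range(n_states)]


frame_len = frame_length(SAMPLE_RATE_HZ, FRAME_MS)   # 165 samples per frame
one_second = [0.0] * SAMPLE_RATE_HZ                  # placeholder signal, 1 s of silence
frames = split_into_frames(one_second, frame_len)
A = uniform_ergodic_transitions(N_STATES)
```

Under these assumptions each frame holds 165 samples, and each frame would yield one vector of Mel frequency cepstral coefficients. The ergodic topology differs from the left-to-right topology often used for speech in that no transition is forced to zero.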

PII: S1350-4533(05)00244-4

doi:10.1016/j.medengphy.2005.11.002
