2025 Theses Doctoral
Validity of Automatic Speech Recognition for Intelligibility Assessment in Children with Dysarthria
Purpose: Accurate assessment of speech intelligibility is critical for children with dysarthria secondary to cerebral palsy (CP). Traditional human assessment methods, such as orthographic transcription and perceptual ratings (e.g., ease of understanding; EoU), can be highly time-consuming or subjective in clinical practice and research. Automatic speech recognition (ASR) may offer a more efficient and objective alternative, but its validity for intelligibility assessment in this population remains unexamined.
This study evaluated the validity of ASR as a tool for intelligibility assessment in children with dysarthria and identified the ASR systems best suited to approximating human intelligibility assessment.
Methods: Five ASR systems transcribed speech samples produced by 20 children with dysarthria. In addition, 168 adult listeners provided orthographic transcriptions and EoU ratings of the same samples. Word recognition rate (WRR) was calculated for both the ASR and the human listener transcriptions. Pearson correlations were used to assess the relationship between ASR-generated WRR and human WRR, and between ASR-generated WRR and human EoU ratings.
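The abstract does not specify how WRR or the correlations were computed. The sketch below shows one plausible approach under stated assumptions. WRR is taken here as the number of target words reproduced in order in a transcription (the longest common subsequence of words) divided by the number of target words, after lowercasing and stripping punctuation. This definition is an assumption, and the thesis's actual scoring rules (for example, for homophones or morphological errors) may differ. The function names `normalize`, `lcs_length`, `word_recognition_rate`, and `pearson_r` are hypothetical. Pearson's r follows its textbook definition.

```python
import math
import re


def normalize(text: str) -> list[str]:
    """Lowercase and keep only runs of letters/apostrophes as words (assumed scoring rule)."""
    return re.findall(r"[a-z']+", text.lower())


def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence of two word lists (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for word_a in a:
        curr = [0]
        for j, word_b in enumerate(b, start=1):
            if word_a == word_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def word_recognition_rate(target: str, transcript: str) -> float:
    """Hypothetical WRR: target words recovered in order / total target words."""
    ref = normalize(target)
    if not ref:
        raise ValueError("target must contain at least one word")
    return lcs_length(ref, normalize(transcript)) / len(ref)


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Unnormalized sums of products and squares; the 1/n factors cancel in r.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical target sentence and ASR output, for illustration only.
    target = "The boy is riding a red bike."
    asr_output = "the boy riding a bed bike"
    print(round(word_recognition_rate(target, asr_output), 3))  # 0.714
```

Under these assumptions, per-child ASR WRR values could be correlated with per-child human WRR or with mean EoU ratings by passing the two lists to `pearson_r`. Here, per-child human WRR is assumed to be averaged across listeners. For real analyses, `scipy.stats.pearsonr` computes the same coefficient and also returns a p-value.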
Results: Four ASR systems (WhisperX-small, WhisperX-medium, WhisperX-large, and Google Cloud) showed strong correlations with human WRR, with WhisperX-medium demonstrating the strongest correlation. These four systems also correlated strongly with EoU ratings, with Google Cloud ASR showing the strongest correlation. In contrast, Wav2Vec2 correlated only weakly with both human WRR and EoU ratings.
Conclusions: ASR shows promise as an adjunct tool for intelligibility assessment in children with dysarthria. With further development, ASR could also provide real-time feedback on intelligibility, allowing children to practice their speech skills independently. Of the ASR systems tested, WhisperX-medium appears most promising for approximating human transcription accuracy, whereas Google Cloud ASR is best suited to approximating perceptual ratings. However, differences in performance across ASR systems highlight the need for careful system selection in clinical applications for this population.
Files
- Choi_columbia_0054D_19227.pdf (PDF, 760 KB)
More About This Work
- Academic Units: Biobehavioral Sciences
- Thesis Advisors: Levy, Erika Shield
- Degree: Ph.D., Columbia University
- Published Here: June 11, 2025