Speech Recognition
CSE Seminar Talk, Dr. Rudzicz, "First, we shape our tools: How to build a better speech recognizer", Concordia University, Montréal
Submitted by rene on Mon, 2011-11-21 09:42

1. Abstract
In this talk I briefly survey some of my previous research and then, even more briefly, consider future extensions of this work.

I will talk about improving Automatic Speech Recognition (ASR) for speakers with speech disabilities by incorporating knowledge of their speech production. This involves the acquisition of the TORGO database of disabled articulation, which demonstrates several consistent behaviours among speakers, including predictable pronunciation errors. Articulatory data are then used to train augmented ASR systems that model the statistical relationships between the vocal tract and its acoustic effluence. I show that dynamic Bayesian networks augmented with instantaneous articulatory variables outperform even discriminative alternatives. This leads to work that incorporates a more rigid theory of speech production, i.e., task-dynamics, which models the high-level and long-term aspects of speech production. For this task, I devised an algorithm for estimating articulatory positions given only acoustics that significantly outperforms the previous state of the art.
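As a purely illustrative aside (not the algorithm referred to above), acoustic-to-articulatory inversion can be framed as supervised regression from acoustic feature frames to articulator positions. The Python sketch below uses synthetic stand-in data and an off-the-shelf scikit-learn regressor; the feature dimensions and variable names are assumptions for demonstration only:

    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPRegressor
    from sklearn.metrics import mean_squared_error

    # Synthetic stand-ins for paired data: acoustic feature frames (e.g. MFCCs)
    # and articulator positions (e.g. electromagnetic articulograph coordinates).
    rng = np.random.default_rng(0)
    n_frames, n_acoustic, n_artic = 2000, 13, 6
    mixing = rng.normal(size=(n_acoustic, n_artic))
    acoustics = rng.normal(size=(n_frames, n_acoustic))
    articulators = (np.tanh(acoustics @ mixing)
                    + 0.1 * rng.normal(size=(n_frames, n_artic)))

    X_train, X_test, y_train, y_test = train_test_split(
        acoustics, articulators, test_size=0.2, random_state=0)

    # Any regressor can serve as the inversion model; a small feed-forward
    # network is used here for concreteness.
    inverter = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500,
                            random_state=0)
    inverter.fit(X_train, y_train)

    rmse = mean_squared_error(y_test, inverter.predict(X_test)) ** 0.5
    print(f"RMSE of estimated articulator positions: {rmse:.3f}")

A real inversion system would also exploit temporal context across frames, which is precisely the long-term structure that the task-dynamic modelling mentioned above is meant to capture.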
Finally, I present ongoing work on the transformation of disabled speech signals in order to make them more intelligible to human listeners, and I conclude with some thoughts on possible paths we may now take.
