Semantic Software Lab
Concordia University
Montréal, Canada

CSE Seminar Talk, Dr. Rudzicz, "First, we shape our tools: How to build a better speech recognizer", Concordia University, Montréal

2011-12-02, 14:00–15:00


In this talk I briefly survey some of my previous research and then even more briefly extrapolate as to future extensions of this work.

I will talk about improving Automatic Speech Recognition (ASR) for speakers with speech disabilities by incorporating knowledge of their speech production. This involves the acquisition of the TORGO database of disabled articulation which demonstrates several consistent behaviours among speakers, including predictable pronunciation errors. Articulatory data are then used to train augmented ASR systems that model the statistical relationships between the vocal tract and its acoustic effluence. I show that dynamic Bayesian networks augmented with instantaneous articulatory variables outperform even discriminative alternatives. This leads to work that incorporates a more rigid theory of speech production, i.e., task-dynamics, that models the high-level and long-term aspects of speech production. For this task, I devised an algorithm for estimating articulatory positions given only acoustics that significantly outperforms the former state-of-the-art.

Finally, I present ongoing work on transforming disabled speech signals to make them more intelligible to human listeners, and I conclude with some thoughts on possible paths we may now take.


Frank Rudzicz received his PhD in Computer Science from the University of Toronto in 2011, his Master's degree in Electrical and Computer Engineering from McGill University in 2006, and his Bachelor's in Computer Science from Concordia University in 2004. He is the recipient of a MITACS Accelerate Canada award, a MITACS Industrial Elevate award, and an NSERC Canada Graduate Scholarship. His expertise includes parsing in natural language processing, acoustic modelling, multimodal interaction, and speech production.


Concordia University, Department of Computer Science and Software Engineering, EV3.309.