Introduction to Automatic Speech Recognition

Zahra Ahmad
May 16, 2022

In this article, I will present an introduction to speech recognition systems and describe a general pipeline that converts speech into text.

Photo by Sebastian Pandelache on Unsplash

What is Automatic Speech Recognition?

Speech recognition, often referred to as automatic speech recognition (ASR), is the ability of a machine to transform natural spoken language into a machine-readable format, typically text.

ASR algorithms typically combine three types of modeling: acoustic modeling, language modeling, and pronunciation modeling.
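One common way to see how these three models fit together is the classical statistical formulation of ASR. It is shown here as a sketch in standard textbook notation; the symbols are not taken from this article. Given a sequence of acoustic observations O, the recognizer searches for the word sequence W that is most probable, and Bayes' rule splits that search into the three models. Here Q ranges over phoneme sequences: P(O | Q) is the acoustic model, P(Q | W) is the pronunciation model, and P(W) is the language model. The second line assumes that the audio depends on the words only through their phonemes. P(O) can be dropped because it does not depend on W.

```latex
\begin{aligned}
\hat{W} &= \arg\max_{W} P(W \mid O)
         = \arg\max_{W} \frac{P(O \mid W)\,P(W)}{P(O)}
         = \arg\max_{W} P(O \mid W)\,P(W), \\
P(O \mid W) &= \sum_{Q} P(O \mid Q)\,P(Q \mid W).
\end{aligned}
```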

Acoustic modeling in ASR deals with the relationship between linguistic units of speech (e.g., phonemes) and audio signals.
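To make this concrete, here is a minimal Python sketch of one classic style of acoustic model. Each phoneme gets a Gaussian distribution over acoustic feature vectors, and every audio frame is scored against every phoneme. Everything in it is an illustrative assumption:

- The phone set, the 2-dimensional features, and all means and variances are invented. Real systems use richer features such as MFCCs and much larger models, such as GMM-HMMs or neural networks.
- The names `PHONE_MODELS`, `log_gaussian`, and `best_phone_per_frame` are hypothetical.
- Picking the best phone frame by frame ignores the temporal modeling a real recognizer adds.

```python
import math

# Hypothetical per-phone diagonal Gaussians over made-up 2-D feature vectors.
# Real acoustic models use higher-dimensional features (e.g. MFCCs) and many
# more parameters; these numbers exist only for illustration.
PHONE_MODELS = {
    "s":   {"mean": [2.0, -1.0], "var": [0.5, 0.5]},
    "iy":  {"mean": [-1.0, 1.5], "var": [0.4, 0.6]},
    "sil": {"mean": [0.0, 0.0], "var": [0.2, 0.2]},
}


def log_gaussian(x, mean, var):
    """Log density of vector x under a Gaussian with diagonal covariance."""
    total = 0.0
    for xi, mu, v in zip(x, mean, var):
        total += -0.5 * (math.log(2 * math.pi * v) + (xi - mu) ** 2 / v)
    return total


def frame_log_likelihoods(frame):
    """Score one feature frame against every phone: log p(frame | phone)."""
    return {
        phone: log_gaussian(frame, params["mean"], params["var"])
        for phone, params in PHONE_MODELS.items()
    }


def best_phone_per_frame(frames):
    """Pick the highest-scoring phone for each frame independently."""
    labels = []
    for frame in frames:
        scores = frame_log_likelihoods(frame)
        labels.append(max(scores, key=scores.get))
    return labels


frames = [[1.9, -0.8], [0.1, 0.1], [-1.2, 1.4]]
print(best_phone_per_frame(frames))  # ['s', 'sil', 'iy']
```

A full recognizer would not commit to one phone per frame. It would pass these per-frame scores to a search that also respects phone durations and the pronunciation and language models.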

Language modeling in ASR captures patterns in sequences of words, which helps the recognizer distinguish between different words that sound the same (for example, "to", "too", and "two").
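The sketch below shows how a simple language model can separate two hypotheses that sound identical. It is one possible approach, not the method used by any particular ASR system. The model is a bigram model with add-one (Laplace) smoothing, trained on a tiny made-up corpus. The corpus, the helper names `train_bigram` and `log_prob`, and the choice of smoothing are all assumptions for illustration. Real language models are trained on vastly more text and use better smoothing or neural networks.

```python
import math
from collections import Counter

# Tiny made-up training corpus, for illustration only.
CORPUS = [
    "i want to go home",
    "we want to go out",
    "i have two cats",
    "they have two dogs",
]


def train_bigram(sentences):
    """Count bigrams and their histories, padding sentences with <s> and </s>."""
    history_counts, bigram_counts = Counter(), Counter()
    vocab = {"</s>"}  # every token that can follow a history
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens[1:-1])
        history_counts.update(tokens[:-1])
        bigram_counts.update(zip(tokens[:-1], tokens[1:]))
    return history_counts, bigram_counts, vocab


def log_prob(sentence, model):
    """Add-one smoothed bigram log probability of a word sequence."""
    history_counts, bigram_counts, vocab = model
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    total = 0.0
    for prev, word in zip(tokens[:-1], tokens[1:]):
        numerator = bigram_counts[(prev, word)] + 1
        denominator = history_counts[prev] + len(vocab)
        total += math.log(numerator / denominator)
    return total


model = train_bigram(CORPUS)
# "to" and "two" sound the same, so the acoustic evidence cannot decide.
hypotheses = ["i want to go home", "i want two go home"]
best = max(hypotheses, key=lambda h: log_prob(h, model))
print(best)  # i want to go home
```

Only the factors around "to"/"two" differ between the two hypotheses. Because "want to" and "to go" appear in the corpus, the first hypothesis ends up nine times more probable under this toy model.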

Pronunciation modeling in ASR provides a mapping between the conventional written transcript of speech, whose spelling can relate to sound more or less arbitrarily, and an acoustically or phonetically motivated representation, such as a sequence of phonemes.
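In its simplest form, a pronunciation model is a lexicon that lists one or more phoneme sequences for each word. The Python sketch below assumes a small, hand-written lexicon using ARPAbet-style symbols with stress marks omitted. The entries are written for illustration, and the names `LEXICON`, `pronunciations`, and `words_for` are hypothetical. Large systems instead rely on full pronunciation dictionaries, often combined with a grapheme-to-phoneme model for unknown words. The example shows both directions of the "arbitrariness" mentioned above:

- one spelling with two pronunciations ("read");
- two spellings with the same pronunciation ("read" and "red").

```python
from itertools import product

# Hand-written toy lexicon (ARPAbet-style symbols, no stress marks).
LEXICON = {
    "speech": [["S", "P", "IY", "CH"]],
    "recognition": [["R", "EH", "K", "AH", "G", "N", "IH", "SH", "AH", "N"]],
    # One spelling, two pronunciations (present vs. past tense of "read").
    "read": [["R", "IY", "D"], ["R", "EH", "D"]],
    # A different spelling that shares one of those pronunciations.
    "red": [["R", "EH", "D"]],
}


def pronunciations(words):
    """Expand a word sequence into every phoneme sequence the lexicon allows."""
    options = [LEXICON[w] for w in words]  # raises KeyError for unknown words
    return [
        [phone for pron in choice for phone in pron]
        for choice in product(*options)
    ]


def words_for(phones):
    """Return the words whose lexicon entries include this exact pronunciation."""
    return sorted(w for w, prons in LEXICON.items() if phones in prons)


print(pronunciations(["read", "speech"]))
# [['R', 'IY', 'D', 'S', 'P', 'IY', 'CH'], ['R', 'EH', 'D', 'S', 'P', 'IY', 'CH']]
print(words_for(["R", "EH", "D"]))  # ['read', 'red']
```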

Applications of Speech Recognition

ASR has many uses and applications, ranging from self-service call centers and self-ordering machines to voice-operated mobile devices…

