Acoustic model

From lingwiki

Revision as of 15:56, 5 September 2009 by Abney

An acoustic model is a statistical representation of the sounds that make up words. Speech recognizers use these models to recognize speech.

Use of Acoustic Modeling in Speech Recognition

An acoustic model describes a speaker's interaction with a speech recognizer: it models the way a speaker pronounces the words in a word sequence. It is used to compute P(A | W), the probability of the acoustic evidence A (what was actually said, as seen by the recognizer) given a hypothesized word string W (what the recognizer predicts was said).
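As a rough illustration of how such a probability can be computed, the sketch below runs the standard forward algorithm on a toy discrete-output HMM standing in for one hypothesized word. The two-state topology, the symbol indices, and every number are made-up assumptions for illustration, not values from any real recognizer.

```python
def forward_likelihood(observations, initial, transition, emission):
    """Return P(observations | HMM) using the forward algorithm.

    initial[s]        -- probability of starting in state s
    transition[p][s]  -- probability of moving from state p to state s
    emission[s][o]    -- probability that state s outputs symbol o
    """
    n_states = len(initial)
    # alpha[s] = P(observations seen so far, current state is s)
    alpha = [initial[s] * emission[s][observations[0]] for s in range(n_states)]
    for symbol in observations[1:]:
        alpha = [
            sum(alpha[prev] * transition[prev][s] for prev in range(n_states))
            * emission[s][symbol]
            for s in range(n_states)
        ]
    # Sum over all states the model could be in after the last symbol.
    return sum(alpha)


# Hypothetical two-state, left-to-right model; all numbers are invented.
initial = [1.0, 0.0]
transition = [[0.6, 0.4], [0.0, 1.0]]
emission = [[0.9, 0.1], [0.2, 0.8]]  # emission[state][symbol]

likelihood = forward_likelihood([0, 1, 1], initial, transition, emission)
print(likelihood)
```

For a model with a distinguished end state, one would typically read off only the end state's entry of `alpha` instead of summing over all states. A recognizer would compare such likelihoods across competing word hypotheses.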

Modern acoustic models are built from hidden Markov models (HMMs) that model individual words. Two types of acoustic model are in use, and they differ in the nature of their building blocks:

  • Phonetic Acoustic Models
  • Fenonic Acoustic Models

Phonetic Acoustic Models

Phonetic acoustic models are built from basic linguistic units: the symbols of a phonetic alphabet. They are the more widely used of the two types.

Construction

  1. A phonetic dictionary is created for the vocabulary being used. If a word has multiple pronunciations, all of them must be entered in the dictionary.
  2. Each symbol of the chosen phonetic alphabet is given an associated HMM with distinguished start and end states.
  3. An HMM for each word is constructed by concatenating the HMMs of the symbols in its dictionary pronunciation.
  4. A composite model for a word sequence is created by concatenating the word HMMs and inserting a silence model between them.
  5. Training: the HMM parameters are estimated from the output of an acoustic processor. A prepared text is read aloud, and the composite model for that text is used as a model of the mechanism that produced the output.
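Steps 1 to 4 above can be sketched in a few lines. This is a minimal sketch under stated assumptions: the dictionary entries, the phone symbols, the three-states-per-phone topology, and the `SIL` silence symbol are all hypothetical, and an HMM is reduced to a left-to-right list of state names with no probabilities. Step 5 (parameter estimation) is not shown.

```python
# Step 1: hypothetical phonetic dictionary, word -> list of pronunciations.
DICTIONARY = {
    "the": [["DH", "AH"], ["DH", "IY"]],
    "cat": [["K", "AE", "T"]],
}

STATES_PER_PHONE = 3  # assumed topology, chosen only for illustration


def phone_hmm(phone):
    """Step 2: a left-to-right chain of states for one symbol.

    The first state plays the role of the distinguished start state,
    the last one the distinguished end state.
    """
    return [f"{phone}_{i}" for i in range(STATES_PER_PHONE)]


def word_hmm(pronunciation):
    """Step 3: concatenate the HMMs of the symbols in a pronunciation."""
    states = []
    for phone in pronunciation:
        states.extend(phone_hmm(phone))
    return states


def composite_hmm(words, silence="SIL"):
    """Step 4: concatenate word HMMs with a silence model between them.

    Simplification: only the first listed pronunciation of each word is
    used; alternative pronunciations would need parallel paths.
    """
    states = []
    for i, word in enumerate(words):
        if i > 0:
            states.extend(phone_hmm(silence))
        states.extend(word_hmm(DICTIONARY[word][0]))
    return states


composite = composite_hmm(["the", "cat"])
print(composite)
```

Reusing the same state names wherever a symbol recurs hints at why this construction is economical: every occurrence of a symbol shares one set of parameters, so training data for one word also informs other words containing the same symbol.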

Fenonic Acoustic Models

Fenonic models do not presuppose any phonetic concepts. Instead, base forms are created directly from the output of an acoustic processor, so a speaker must pronounce each word of the vocabulary in order to create all the base forms for a model. Fenonic models are less widely used in speech recognition than phonetic models.
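One way to picture a base form derived from acoustic processor output is sketched below. It assumes, purely for illustration, that the acoustic processor labels each frame with the index of its nearest codeword and that each label maps to one elementary model; the codebook, the frames, and the `F<n>` naming are hypothetical.

```python
def nearest_label(frame, codebook):
    """Return the index of the codeword closest to a feature frame."""
    def squared_distance(codeword):
        return sum((x - c) ** 2 for x, c in zip(frame, codeword))
    return min(range(len(codebook)), key=lambda k: squared_distance(codebook[k]))


def fenonic_base_form(frames, codebook):
    """Build a base form with one elementary model name per frame label."""
    return [f"F{nearest_label(frame, codebook)}" for frame in frames]


# Hypothetical three-entry codebook and one spoken instance of a word.
CODEBOOK = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
frames = [(0.1, 0.0), (0.9, 0.1), (0.8, -0.1), (0.1, 1.2)]

base_form = fenonic_base_form(frames, CODEBOOK)
print(base_form)
```

No dictionary or phonetic alphabet appears anywhere in this sketch, which is the point of the contrast with phonetic models: the base form comes entirely from a recording of the word.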
