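Estudio sobre el impacto del corpus de entrenamiento del modelo de lenguaje en las prestaciones de un reconocedor de habla (Study on the impact of the language model training corpus on the performance of a speech recognizer)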

  1. Docío Fernández, Laura
  2. Regueira, Xosé Luis
  3. Piñeiro Martín, Andrés
  4. García Mateo, Carmen
Procesamiento del lenguaje natural

ISSN: 1135-5948

Publication year: 2018

Issue: 61

Pages: 75-82

Type: Article


In automatic speech recognition, statistical language models based on the probability of word sequences (n-grams) are one of the two pillars on which correct recognition rests. This paper examines how the recognition results change as these models are improved: when they are trained on more text of better quality, when they are adapted to the system's final application, and, consequently, when the number of out-of-vocabulary (OOV) words is reduced. The recognizer, combined with each of the different language models, was applied to audio clips from three experimental settings: formal spoken language, speech in television newscasts, and TED talks in Galician. The results show a clear improvement across the proposed experimental settings.
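The abstract ties language-model quality to the OOV rate. An n-gram model approximates the probability of each word given its full history by conditioning only on the previous n-1 words, and a word absent from the recognizer's vocabulary cannot be output at all, so the OOV rate (fraction of test tokens not in the vocabulary) bounds part of the error. The following is a minimal sketch, not the authors' pipeline: it assumes whitespace tokenization and lowercasing, and the function names (`build_vocabulary`, `oov_rate`) and the Galician toy sentences are hypothetical, chosen only to show how adding in-domain text can lower the OOV rate.

```python
from collections import Counter


def build_vocabulary(training_sentences, max_size=None):
    """Collect word types from a (hypothetical) LM training corpus.

    If max_size is given, keep only the most frequent words, mimicking a
    recognizer lexicon of fixed size.
    """
    counts = Counter(w for s in training_sentences for w in s.lower().split())
    if max_size is None:
        return set(counts)
    return {w for w, _ in counts.most_common(max_size)}


def oov_rate(test_sentences, vocabulary):
    """Fraction of running words (tokens) in the test text not in the vocabulary."""
    tokens = [w for s in test_sentences for w in s.lower().split()]
    if not tokens:
        return 0.0
    unknown = sum(1 for w in tokens if w not in vocabulary)
    return unknown / len(tokens)


# Hypothetical toy data: a generic corpus plus in-domain (newscast-like) text.
generic = ["o tempo hoxe é bo", "a xente está na rúa"]
in_domain = ["o goberno aprobou o orzamento", "o parlamento debate hoxe"]
test = ["o parlamento aprobou hoxe o orzamento"]

print(oov_rate(test, build_vocabulary(generic)))              # → 0.5
print(oov_rate(test, build_vocabulary(generic + in_domain)))  # → 0.0
```

With only the generic text, three of the six test tokens (parlamento, aprobou, orzamento) are unknown; adding the in-domain sentences covers them. Real systems would use a proper tokenizer and a much larger corpus, and a lower OOV rate does not by itself guarantee a lower word error rate.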

