A data-centric approach for Portuguese speech recognition: language model and its implications

Alvarenga, João Paulo Reis; Merschmann, Luiz Henrique de Campos; Luz, Eduardo José da Silva

Article

Date

2023

Authors

Alvarenga, João Paulo Reis

Merschmann, Luiz Henrique de Campos

Luz, Eduardo José da Silva

Publisher

Institute of Electrical and Electronics Engineers

Abstract

Recent advances in Automatic Speech Recognition have achieved unprecedented transcription quality, both for languages with abundant data and a large body of studies, such as English, and for Portuguese, which has more limited resources and fewer studies. The most recent approaches rely on Transformer-based models, which can perform speech recognition directly from the raw signal, without manual feature extraction. Some studies have already shown that using language models in the decoding stage can further improve the transcription quality of these models. However, the real impact of such language models is still unclear, especially for Brazilian Portuguese. It is also known that the quality of the training data is of paramount importance, yet few works in the literature address this issue. This work explores the impact of language models applied to Portuguese speech recognition, in terms of both data quality and computational performance, with a data-centric approach. We propose an approach to measure similarity between datasets and thus assist in decision-making during training. The approach indicates paths for advancing the state of the art in Portuguese speech recognition, showing that it is possible to reduce the size of the language model by 80% and still achieve error rates around 7.17% on the Common Voice dataset. The source code is available at https://github.com/joaoalvarenga/language-model-evaluation.
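
The abstract reports an error rate without naming the metric in this record. A common choice in speech recognition is the word error rate (WER): the word-level edit distance between the reference transcript and the model output, divided by the number of reference words. The sketch below assumes WER is the metric meant; the function name `word_error_rate` and the sample sentences are illustrative and are not taken from the authors' repository.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: whitespace tokenization with no text normalization;
    real evaluations usually lowercase and strip punctuation first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One reference word ("no") is missing from the hypothesis.
    print(word_error_rate("o gato dorme no sofá", "o gato dorme sofá"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.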

Provenance

Submitted by Eliana Bernardes (eliana@biblioteca.ufla.br) on 2024-09-26T16:36:25Z No. of bitstreams: 0
Approved for entry into archive by Eliana Bernardes (eliana@biblioteca.ufla.br) on 2024-09-26T16:36:34Z (GMT) No. of bitstreams: 0
Made available in DSpace on 2024-09-26T16:36:34Z (GMT). No. of bitstreams: 0 Previous issue date: 2023

Keywords

Automatic speech recognition, Language model, Brazilian Portuguese, Wav2vec2, KenLM

Citation

ALVARENGA, J. P. R.; MERSCHMANN, L. H. de C.; LUZ, E. J. da S. A data-centric approach for Portuguese speech recognition: language model and its implications. IEEE Latin America Transactions, [S.l.], v. 21, n. 4, p. 546-556, 2023.

URI

https://repositorio.ufla.br/handle/1/59506
https://latamt.ieeer9.org/index.php/transactions/article/view/7464

Collections

DCC - Articles published in journals
