A data-centric approach for Portuguese speech recognition: language model and its implications

dc.creator: Alvarenga, João Paulo Reis
dc.creator: Merschmann, Luiz Henrique de Campos
dc.creator: Luz, Eduardo José da Silva
dc.date.accessioned: 2024-09-26T16:36:34Z
dc.date.available: 2024-09-26T16:36:34Z
dc.date.issued: 2023
dc.description.abstract: Recent advances in Automatic Speech Recognition have achieved unprecedented transcription quality, both for languages with abundant data, such as English, which has been studied extensively, and for Portuguese, which has fewer resources and studies. The most recent advances address speech recognition with Transformer-based models, which can perform the task directly from the raw signal, without manual feature extraction. Some studies have already shown that the transcription quality of these models can be further improved by using language models in the decoding stage. However, the real impact of such language models is still unclear, especially for Brazilian Portuguese. It is also known that the quality of the training data is of paramount importance, yet few works in the literature address this issue. This work explores the impact of language models applied to Portuguese speech recognition, in terms of both data quality and computational performance, with a data-centric approach. We propose an approach to measure similarity between datasets and thus assist decision-making during training. The approach indicates paths for advancing the state of the art in Portuguese speech recognition, showing that it is possible to reduce the size of the language model by 80% and still achieve error rates around 7.17% on the Common Voice dataset. The source code is available at https://github.com/joaoalvarenga/language-model-evaluation. [pt_BR]
dc.description.provenance: Submitted by Eliana Bernardes (eliana@biblioteca.ufla.br) on 2024-09-26T16:36:25Z. No. of bitstreams: 0 [en]
dc.description.provenance: Approved for entry into archive by Eliana Bernardes (eliana@biblioteca.ufla.br) on 2024-09-26T16:36:34Z (GMT). No. of bitstreams: 0 [en]
dc.description.provenance: Made available in DSpace on 2024-09-26T16:36:34Z (GMT). No. of bitstreams: 0. Previous issue date: 2023 [en]
dc.identifier.citation: ALVARENGA, J. P. R.; MERSCHMANN, L. H. de C.; LUZ, E. J. da S. A data-centric approach for Portuguese speech recognition: language model and its implications. IEEE Latin America Transactions, [S.l.], v. 21, n. 4, p. 546-556, 2023. [pt_BR]
dc.identifier.uri: https://repositorio.ufla.br/handle/1/59506
dc.identifier.uri: https://latamt.ieeer9.org/index.php/transactions/article/view/7464 [pt_BR]
dc.language: en_US [pt_BR]
dc.publisher: Institute of Electrical and Electronics Engineers [pt_BR]
dc.rights: restrictAccess [pt_BR]
dc.source: IEEE Latin America Transactions [pt_BR]
dc.subject: Automatic speech recognition [pt_BR]
dc.subject: Language model [pt_BR]
dc.subject: Brazilian Portuguese [pt_BR]
dc.subject: Wav2vec2 [pt_BR]
dc.subject: KenLM [pt_BR]
dc.title: A data-centric approach for Portuguese speech recognition: language model and its implications [pt_BR]
dc.type: Article [pt_BR]
