Use this identifier to cite or link to this item: http://repositorio.ufla.br/jspui/handle/1/58857
Title: Desenvolvimento de modelos de linguagem para extração de aspectos em língua portuguesa
Alternative title(s): Development of language models for aspect extraction in Portuguese
Authors: Ferreira, Danton Diego
Barbosa, Bruno Henrique Groenner
Pereira, Denilson Alves
Cardoso, Paula Christina Figueira
Vitor, Giovani Bernardes
Keywords: Processamento de linguagem natural
Extração de aspectos
BERT
Modelos de linguagem
Natural language processing
Aspect extraction
Bidirectional Encoder Representations from Transformers
Language models
Issue date: 29-Jan-2024
Publisher: Universidade Federal de Lavras
Citation: FERREIRA NETO, J. C. Desenvolvimento de modelos de linguagem para extração de aspectos em língua portuguesa. 2023. 93 p. Dissertação (Mestrado em Engenharia de Sistemas e Automação)–Universidade Federal de Lavras, Lavras, 2023.
Abstract: The identification and extraction of aspects are essential in text analysis for discerning opinions and emotions, yet there is a gap in applying these techniques to Portuguese. This work adapts approaches originally developed for English and evaluates language models for aspect extraction in Portuguese, in the context of TV device reviews and literary reviews from the TV and ReLi datasets. To this end, models based on the BERT architecture were employed, both pre-trained for general domains (BERTimbau) and for specific domains (BERTtv and BERTreli). Additionally, a double embedding technique was implemented, combining general- and specific-domain models. Large Language Models (LLMs) were also evaluated, including variants of GPT-3 via the OpenAI API and Cabrita, a variant of LLaMA trained for the Portuguese language. To reduce hardware resource demands, parameter-efficient fine-tuning techniques were applied: LoRA (Low-Rank Adaptation) for BERTimbau and QLoRA (Quantized Low-Rank Adaptation) for Cabrita. The results showed that the BERTimbau model fine-tuned with LoRA was superior on both datasets, achieving F1 scores of 0.846 on the TV dataset and 0.615 on ReLi. In contrast, the Cabrita model showed inferior performance on both datasets, with 0.68 on TV and 0.46 on ReLi. This study therefore offers a valuable contribution to research on aspect extraction in Portuguese, demonstrating the feasibility and effectiveness of adapting and optimizing techniques and models originally developed for other languages.
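The LoRA technique mentioned in the abstract can be illustrated with a minimal sketch: instead of updating a full pre-trained weight matrix W (d × k), only two small matrices A (r × k) and B (d × r) are trained, with r much smaller than d and k. The dimensions, rank, and scaling factor below are illustrative assumptions, not values taken from the dissertation.

```python
import numpy as np

# Minimal sketch of Low-Rank Adaptation (LoRA). The adapted weight is
# W + (alpha / r) * B @ A, where only A and B are trainable.
# All dimensions here are illustrative, not from the dissertation.
d, k, r, alpha = 768, 768, 8, 16

rng = np.random.default_rng(0)
W = rng.standard_normal((d, k))          # frozen pre-trained weight
A = rng.standard_normal((r, k)) * 0.01   # trainable down-projection
B = np.zeros((d, r))                     # trainable up-projection (zero init)

# With B initialized to zero, the adapted weight starts equal to W,
# so fine-tuning begins from the pre-trained model's behavior.
W_adapted = W + (alpha / r) * B @ A

# Trainable parameter count drops from d*k to r*(d + k).
full, lora = d * k, r * (d + k)
print(full, lora)  # → 589824 12288
```

This parameter reduction (here roughly 48×) is what makes fine-tuning large models such as BERTimbau feasible on modest hardware; QLoRA additionally quantizes the frozen weights.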
Description: File withheld, at the author's request, until January 2025.
URI: http://repositorio.ufla.br/jspui/handle/1/58857
Appears in collections: Engenharia de Sistemas e Automação (Dissertações)

Files associated with this item:
There are no files associated with this item.

This item is licensed under a Creative Commons License.