Please use this identifier to cite or link to this item: http://repositorio.ufla.br/jspui/handle/1/58857
Title: Desenvolvimento de modelos de linguagem para extração de aspectos em língua portuguesa
Other Titles: Development of language models for aspect extraction in Portuguese
Authors: Ferreira, Danton Diego
Barbosa, Bruno Henrique Groenner
Pereira, Denilson Alves
Cardoso, Paula Christina Figueira
Vitor, Giovani Bernardes
Keywords: Processamento de linguagem natural
Extração de aspectos
BERT
Modelos de linguagem
Natural language processing
Aspect extraction
Bidirectional Encoder Representations from Transformers
Language models
Issue Date: 29-Jan-2024
Publisher: Universidade Federal de Lavras
Citation: FERREIRA NETO, J. C. Desenvolvimento de modelos de linguagem para extração de aspectos em língua portuguesa. 2023. 93 p. Dissertação (Mestrado em Engenharia de Sistemas e Automação)–Universidade Federal de Lavras, Lavras, 2023.
Abstract: The identification and extraction of aspects are essential in text analysis for discerning opinions and emotions, yet there is a gap in applying these techniques to Portuguese. This work evaluates language models for aspect extraction in Portuguese, adapting approaches originally developed for English, in the context of TV device reviews and literary reviews (the TV and ReLi datasets). To this end, models based on the BERT architecture were employed, both pre-trained for general domains (BERTimbau) and for specific domains (BERTtv and BERTreli). Additionally, a double embedding technique was implemented, combining general- and specific-domain models. Large Language Models (LLMs) were also evaluated, including variants of GPT-3 via the OpenAI API and Cabrita, a variant of LLaMA trained for the Portuguese language. To reduce hardware resource demands, parameter-efficient fine-tuning techniques were applied: LoRA (Low-Rank Adaptation) for BERTimbau and QLoRA (Quantized Low-Rank Adaptation) for Cabrita. The results showed that the BERTimbau model fine-tuned with LoRA was superior on both datasets, achieving F1 scores of 0.846 on the TV dataset and 0.615 on ReLi. In contrast, the Cabrita model performed worse on both datasets, with 0.68 on TV and 0.46 on ReLi. This study therefore offers a valuable contribution to research on aspect extraction in Portuguese, demonstrating the feasibility and effectiveness of adapting and optimizing techniques and models originally developed for other languages.
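The LoRA technique cited in the abstract freezes the pre-trained weights and learns only a low-rank additive update. The following is a minimal conceptual sketch of that idea, not the dissertation's actual code; all names, dimensions, and the scaling convention (alpha/r, as in the original LoRA paper) are illustrative:

```python
import numpy as np

def lora_forward(x, W, A, B, alpha):
    """Apply a LoRA-adapted linear layer: (W + (alpha/r) * B @ A) @ x.

    W: (d_out, d_in) frozen pre-trained weight
    A: (r, d_in)     trainable down-projection, r << min(d_out, d_in)
    B: (d_out, r)    trainable up-projection, initialized to zeros
    """
    r = A.shape[0]
    delta = (alpha / r) * (B @ A)  # low-rank update, rank <= r
    return (W + delta) @ x

rng = np.random.default_rng(0)
d_out, d_in, r = 8, 16, 2
W = rng.normal(size=(d_out, d_in))   # frozen base weight
A = rng.normal(size=(r, d_in))       # trainable
B = np.zeros((d_out, r))             # zero init: adapter starts as a no-op
x = rng.normal(size=d_in)

# With B = 0 the adapted layer matches the frozen layer exactly,
# so fine-tuning starts from the pre-trained model's behavior.
assert np.allclose(lora_forward(x, W, A, B, alpha=4), W @ x)
```

Because only A and B (2 * r * d parameters per layer rather than d_out * d_in) are trained, memory demand drops sharply; QLoRA additionally quantizes the frozen base weights.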
Description: File withheld, at the author's request, until January 2025.
URI: http://repositorio.ufla.br/jspui/handle/1/58857
Appears in Collections: Engenharia de Sistemas e Automação (Dissertações)

Files in This Item:
There are no files associated with this item.


This item is licensed under a Creative Commons License.