Please use this identifier to cite or link to this item:
http://repositorio.ufla.br/jspui/handle/1/58857
Title: | Desenvolvimento de modelos de linguagem para extração de aspectos em língua portuguesa |
Other Titles: | Development of language models for aspect extraction in Portuguese |
Authors: | Ferreira, Danton Diego; Barbosa, Bruno Henrique Groenner; Pereira, Denilson Alves; Cardoso, Paula Christina Figueira; Vitor, Giovani Bernardes |
Keywords: | Processamento de linguagem natural; Extração de aspectos; BERT; Modelos de linguagem; Natural language processing; Aspect extraction; Bidirectional Encoder Representations from Transformers; Language models |
Issue Date: | 29-Jan-2024 |
Publisher: | Universidade Federal de Lavras |
Citation: | FERREIRA NETO, J. C. Desenvolvimento de modelos de linguagem para extração de aspectos em língua portuguesa. 2023. 93 p. Dissertação (Mestrado em Engenharia de Sistemas e Automação)–Universidade Federal de Lavras, Lavras, 2023. |
Abstract: | Identifying and extracting aspects is essential in text analysis for discerning opinions and emotions, yet these techniques have seen little application to Portuguese. This work adapts approaches originally developed for English to Portuguese and evaluates language models for aspect extraction on two datasets: TV (reviews of TV devices) and ReLi (literary reviews). Models based on the BERT architecture were employed, both pre-trained on general-domain text (BERTimbau) and on domain-specific text (BERTtv and BERTreli). A double embedding technique, which combines the general-domain and domain-specific models, was also implemented. Large Language Models (LLMs) were evaluated as well, including GPT-3 variants accessed through the OpenAI API and Cabrita, a LLaMA variant trained for Portuguese. To reduce hardware requirements, efficient fine-tuning techniques were applied: LoRA (Low-Rank Adaptation) for BERTimbau and QLoRA (Quantized Low-Rank Adaptation) for Cabrita. BERTimbau fine-tuned with LoRA performed best on both datasets, with F1 scores of 0.846 on TV and 0.615 on ReLi. Cabrita performed worse on both, scoring 0.68 on TV and 0.46 on ReLi. The study thus contributes to research on aspect extraction in Portuguese by demonstrating that techniques and models originally developed for other languages can be adapted and optimized effectively. |
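The abstract names LoRA without describing it. As a rough illustration only, and not the dissertation's code, the sketch below shows the general idea of Low-Rank Adaptation: a frozen weight matrix W is left untouched, and a trainable low-rank product B·A, scaled by alpha / r, is added to its output. The function names, the 768 x 768 layer size (the hidden size of BERT-base models), the rank of 8, and alpha of 16 are all assumptions chosen for the example; the record does not state the settings actually used.

```python
# Illustrative LoRA sketch in pure Python (hypothetical names and sizes).

def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def lora_forward(W, A, B, x, alpha, r):
    """Return W x + (alpha / r) * B (A x).

    W is the frozen pre-trained weight; only A (r x d_in) and
    B (d_out x r) would be updated during fine-tuning.
    """
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]

def trainable_params(d_out, d_in, r):
    """Count adapter parameters: B has d_out * r, A has r * d_in."""
    return r * (d_out + d_in)

# Assumed sizes: one 768 x 768 projection, rank 8.
d, r, alpha = 768, 8, 16
print("full matrix:", d * d)                        # 589824
print("LoRA adapter:", trainable_params(d, d, r))   # 12288

# Tiny numeric check: with B initialised to zero, the adapted layer
# reproduces the frozen layer's output exactly.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]      # r = 1, shape 1 x 2
B = [[0.0], [0.0]]    # shape 2 x 1, zero-initialised
x = [2.0, 3.0]
print(lora_forward(W, A, B, x, alpha=2, r=1))       # [2.0, 3.0]
```

Under these assumed sizes the adapter trains about 2% of the parameters of the full matrix, which is the kind of saving that motivates LoRA when hardware is limited.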
Description: | File withheld, at the author's request, until January 2025. |
URI: | http://repositorio.ufla.br/jspui/handle/1/58857 |
Appears in Collections: | Engenharia de Sistemas e automação (Dissertações) |
Files in This Item:
There are no files associated with this item.
This item is licensed under a Creative Commons License