Evaluating a new auto-ML approach for sentiment analysis and intent recognition tasks

Oliveira, Douglas Nunes de; Utsch, Milo Noronha Rocha; Machado, Diogo Villela Pedro de Almeida; Pena, Nina Goulart; Oliveira, Ramon Gomes Durães de; Carvalho, Arthur Iperoyg Rodrigues; Merschmann, Luiz Henrique de Campos

dc.creator	Oliveira, Douglas Nunes de
dc.creator	Utsch, Milo Noronha Rocha
dc.creator	Machado, Diogo Villela Pedro de Almeida
dc.creator	Pena, Nina Goulart
dc.creator	Oliveira, Ramon Gomes Durães de
dc.creator	Carvalho, Arthur Iperoyg Rodrigues
dc.creator	Merschmann, Luiz Henrique de Campos
dc.date.accessioned	2023-11-24T15:48:53Z
dc.date.available	2023-11-24T15:48:53Z
dc.date.issued	2023
dc.description.abstract	Automated Machine Learning (AutoML) is a research area that aims to help humans solve Machine Learning (ML) problems by automatically discovering good ML pipelines (algorithms and their hyperparameters for every stage of a machine learning process) for a given dataset. Since this is a combinatorial optimization problem for which it is impossible to evaluate all possible pipelines, most AutoML systems use a Genetic Algorithm (GA) or Bayesian Optimization (BO) to find a good solution. These systems usually evaluate the performance of the pipelines using the K-fold cross-validation method, for which the more pipelines are evaluated, the higher the chance of finding an overfitted solution. To avoid this issue, we propose a system named Auto-ML System for Text Classification (ASTeC), which uses the Bootstrap Bias Corrected CV (BBC-CV) method to evaluate the performance of the pipelines. More specifically, the proposed system combines GA, BO, and BBC-CV to find a good ML pipeline for the text classification task. We evaluated our approach by comparing it with state-of-the-art systems: in the Sentiment Analysis (SA) task, we compared our approach to TPOT (Tree-based Pipeline Optimization Tool) and the Google Cloud AutoML service, and in the Intent Recognition (IR) task, we compared it with TPOT and MLJAR AutoML. Concerning the data, we analysed seven public datasets from the SA domain and sixteen from the IR domain. Four of those sixteen are composed of written English text, while all of the others are in Brazilian Portuguese. Statistical tests show that, in 21 out of 23 datasets, our system's performance is equivalent to or better than the others.	pt_BR
dc.identifier.citation	OLIVEIRA, D. N. de et al. Evaluating a new auto-ML approach for sentiment analysis and intent recognition tasks. Journal on Interactive Systems, Porto Alegre, v. 14, n. 1, p. 92-105, 2023.	pt_BR
dc.identifier.uri	https://repositorio.ufla.br/handle/1/58594
dc.language	en_US	pt_BR
dc.publisher	Brazilian Computing Society	pt_BR
dc.rights	Attribution 4.0 International	*
dc.rights	open access	pt_BR
dc.rights.uri	http://creativecommons.org/licenses/by/4.0/	*
dc.source	Journal on Interactive Systems	pt_BR
dc.subject	Automated Machine Learning (AutoML)	pt_BR
dc.subject	Bias correction cross-validation	pt_BR
dc.subject	Genetic algorithm	pt_BR
dc.subject	Bayesian optimization	pt_BR
dc.subject	Intent recognition	pt_BR
dc.subject	Chatbot	pt_BR
dc.title	Evaluating a new auto-ML approach for sentiment analysis and intent recognition tasks	pt_BR
dc.type	Article	pt_BR
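
The abstract's central idea, estimating performance with a bootstrap bias correction instead of reporting the best cross-validated score directly, can be illustrated with a minimal sketch. This is not ASTeC's implementation; it follows the general bootstrap bias-corrected CV idea under stated assumptions: the out-of-sample predictions of every candidate pipeline have already been pooled across folds into a 0/1 correctness matrix, and accuracy is the metric. The function name `bbc_cv_estimate` and its parameters are hypothetical.

```python
import random


def bbc_cv_estimate(correct, n_bootstraps=200, seed=0):
    """Bootstrap bias-corrected estimate of the selected pipeline's accuracy.

    correct[i][j] is 1 if candidate pipeline j classified sample i correctly
    when sample i was held out during cross-validation, else 0.
    (Hypothetical input layout, assumed for this sketch.)
    """
    rng = random.Random(seed)
    n_samples = len(correct)
    n_configs = len(correct[0])
    oob_scores = []

    for _ in range(n_bootstraps):
        # Resample row indices with replacement.
        boot = [rng.randrange(n_samples) for _ in range(n_samples)]
        in_bag = set(boot)
        out_of_bag = [i for i in range(n_samples) if i not in in_bag]
        if not out_of_bag:
            continue  # nothing left to score this round

        # Select the pipeline that looks best on the bootstrap sample...
        best = max(
            range(n_configs),
            key=lambda j: sum(correct[i][j] for i in boot),
        )
        # ...and score that choice only on rows it was not selected on.
        oob_scores.append(
            sum(correct[i][best] for i in out_of_bag) / len(out_of_bag)
        )

    if not oob_scores:
        raise ValueError("no bootstrap round produced out-of-bag samples")
    return sum(oob_scores) / len(oob_scores)
```

Because each round scores the winner on samples that played no part in choosing it, the averaged estimate avoids the optimism that arises when the best of many evaluated pipelines is reported on the same data used to pick it.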

Files

Original bundle

Name: 3161-Article Text-14257-1-10-20230509.pdf
Size: 284.19 KB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 956 B
Format: Item-specific license agreed upon to submission

Collections

DAC - Artigos publicados em periódicos (Articles published in journals)