Machine learning to predict bovine brucellosis from unbalanced data

Alves, Caio Donizetti Queiroz

Use this identifier to cite or link to this item: http://repositorio.ufla.br/jspui/handle/1/59760

Title:	Aprendizado de máquina para predição de brucelose bovina a partir de dados desbalanceados
Alternative title(s):	Machine learning to predict bovine brucellosis from unbalanced data
Authors:	Ferreira, Danton Diego; Rocha, Christiane Maria Barcellos Magalhães da; Barbosa, Bruno Henrique Groenner; Dorneles, Elaine Maria Seles; Tonelli, Adriano Olímpio
Keywords:	Machine learning; Brucellosis; Data analysis; Class balancing
Issue date:	18-Dec-2024
Publisher:	Universidade Federal de Lavras
Citation:	ALVES, Caio Donizetti Queiroz. Aprendizado de máquina para predição de brucelose bovina a partir de dados desbalanceados. 2024. 89 p. Dissertação (Engenharia de Sistemas e Automação) - Universidade Federal de Lavras, Lavras, 2024.
Abstract:	The importance of Brazilian livestock farming is unquestionable. According to data from the United States Department of Agriculture (USDA), Brazil was the world's largest exporter of beef in 2021. Bovine brucellosis is one of the most worrying diseases for the sector, causing annual losses of around 448 million dollars in Brazil. Government agencies are notably concerned with controlling and eradicating this zoonosis. However, several factors hinder the actions of the animal defense programs in force in Brazil, chiefly that infected animals remain asymptomatic, that the Brazilian territory is extensive, and that herds are large. Computational intelligence models such as Machine Learning (ML) can be great allies of health surveillance and epidemiological services. From both an agricultural and a One Health perspective, considering that brucellosis is a zoonosis, the development of ML approaches for predicting brucellosis has a very high potential benefit for society. Predicting bovine brucellosis through questionnaires administered to livestock farmers, together with other diagnostic tools, can help in screening properties with different levels of risk for the disease. These benefits can enable animal defense programs to act more quickly, effectively and economically. The performance of ML models is directly linked to the quality and intrinsic characteristics of the database used during the model design or training phase. Imbalance in the data available for the classes involved in a given problem is an example of a database characteristic that requires specific treatment. Therefore, depending on the characteristics of the database, different techniques can be adopted to make better use of the information available in it.
In this work, the performance of different ML approaches, combined with different class balancing techniques, in predicting brucellosis in cattle herds is compared and evaluated. To build the approaches, data from the survey of the official animal health defense service of the State of Minas Gerais (MAPA/IMA), collected from September 2010 to December 2012, were used. The database included records of 2185 herds, 2103 negative and 82 positive for brucellosis. The approaches were compared using several metrics recommended for problems involving databases with strong imbalance between classes, namely: Recall, Specificity, Precision (or Positive Predictive Value), F-measure, G-mean, and Index of Balanced Accuracy (IBA). The best-performing approaches were those based on One-Class Classification (OCC), which achieved G-mean and IBA values close to 0.60 and 0.36, respectively. Initially, the great challenge of the problem was the imbalance between classes in the database; however, the unsatisfactory results obtained by the ML approaches prompted an exploratory analysis of the data. This analysis revealed several characteristics of the database that point to a highly complex pattern recognition problem. The results obtained here show that OCC classifiers are interesting options for dealing with databases that have significant imbalance between classes and high complexity in terms of pattern recognition.
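The metrics named in the abstract can all be derived from a binary confusion matrix. The following is a minimal Python sketch, not the dissertation's code: the function name `imbalance_metrics`, the convention of reporting an undefined ratio as 0.0, and the IBA weighting factor `alpha=0.1` are assumptions made for illustration (the abstract does not state the value of alpha). The only figures taken from the abstract are the class counts, 2103 negative and 82 positive herds, which are used to score a hypothetical baseline that labels every herd negative.

```python
from math import sqrt


def imbalance_metrics(tp, fn, tn, fp, alpha=0.1):
    """Metrics for a binary confusion matrix (positive = herd with brucellosis).

    alpha is the IBA weighting factor; 0.1 is an assumed value, not one
    taken from the source.
    """
    def ratio(num, den):
        # Assumed convention: an undefined ratio (zero denominator) is 0.0.
        return num / den if den else 0.0

    recall = ratio(tp, tp + fn)        # sensitivity, true positive rate
    specificity = ratio(tn, tn + fp)   # true negative rate
    precision = ratio(tp, tp + fp)     # positive predictive value
    f_measure = ratio(2 * precision * recall, precision + recall)
    g_mean = sqrt(recall * specificity)
    # IBA: squared G-mean weighted by the dominance (recall - specificity).
    iba = (1 + alpha * (recall - specificity)) * recall * specificity
    accuracy = ratio(tp + tn, tp + fn + tn + fp)
    return {
        "recall": recall,
        "specificity": specificity,
        "precision": precision,
        "f_measure": f_measure,
        "g_mean": g_mean,
        "iba": iba,
        "accuracy": accuracy,
    }


# Hypothetical baseline that labels all 2185 herds negative
# (class counts from the abstract: 2103 negative, 82 positive).
baseline = imbalance_metrics(tp=0, fn=82, tn=2103, fp=0)
print(round(baseline["accuracy"], 3), baseline["recall"], baseline["g_mean"])
# prints: 0.962 0.0 0.0
```

Under these assumptions the baseline reaches about 96% accuracy while its Recall and G-mean are zero, which illustrates why accuracy alone is uninformative on such a database. When Recall and Specificity are equal, IBA reduces to the square of G-mean, which is in line with the reported values of about 0.60 and 0.36.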
Description:	File withheld, at the author's request, until November 2025.
URI:	http://repositorio.ufla.br/jspui/handle/1/59760
Appears in collections:	Engenharia de Sistemas e Automação (Dissertations)

Files associated with this item:

There are no files associated with this item.


This item is licensed under a Creative Commons License