Please use this identifier to cite or link to this item: http://repositorio.ufla.br/jspui/handle/1/59760
Title: Aprendizado de máquina para predição de brucelose bovina a partir de dados desbalanceados
Other Titles: Machine learning to predict bovine brucellosis from unbalanced data
Authors: Ferreira, Danton Diego
Rocha, Christiane Maria Barcellos Magalhães da
Barbosa, Bruno Henrique Groenner
Dorneles, Elaine Maria Seles
Tonelli, Adriano Olímpio
Keywords: Aprendizado de máquina
Brucelose
Análise de dados
Machine learning
Brucellosis
Data analysis
Balanceamento de classes
Class balancing
Issue Date: 18-Dec-2024
Publisher: Universidade Federal de Lavras
Citation: ALVES, Caio Donizetti Queiroz. Aprendizado de máquina para predição de brucelose bovina a partir de dados desbalanceados. 2024. 89 p. Dissertação (Engenharia de Sistemas e Automação) - Universidade Federal de Lavras, Lavras, 2024.
Abstract: Brazilian livestock farming is of unquestionable economic importance. According to data from the United States Department of Agriculture (USDA), in 2021 Brazil was the world’s largest exporter of beef. Bovine brucellosis is one of the most worrying diseases for the sector. In Brazil, bovine brucellosis causes annual losses of around 448 million dollars. Government agencies are notably concerned with controlling and eradicating this zoonosis. However, several factors hinder the actions of the animal defense programs in force in Brazil, the main ones being that infected animals remain asymptomatic, the Brazilian territory is extensive, and herds are large. Computational intelligence models such as Machine Learning (ML) can be great allies of health surveillance and epidemiological services. From both an agricultural and a One Health perspective, considering that brucellosis is a zoonosis, the development of ML approaches for predicting brucellosis has a very high potential benefit for society. Predicting bovine brucellosis through questionnaires administered to livestock farmers, together with other diagnostic tools, can help in screening properties with different risks for the disease. These benefits can enable animal defense programs to carry out actions much more quickly, effectively and economically. The performance of ML models is directly linked to the quality and intrinsic characteristics of the database used during the model design or training phase. An imbalance in the data available for the classes involved in a given problem is an example of a database characteristic that requires different treatment. Therefore, depending on the characteristics of the database, different techniques can be adopted to make better use of the information available in it.
In this work, the performance of different ML approaches, combined with different class balancing techniques, in predicting brucellosis in cattle herds is compared and evaluated. To build the approaches, data from the survey of the official animal health defense service of the State of Minas Gerais (MAPA/IMA), from September 2010 to December 2012, were used. The database included records of 2185 herds, of which 2103 were negative and 82 positive for brucellosis. The performances of the approaches were compared using several metrics recommended for problems involving databases with strong imbalance between classes, namely: Recall, Specificity, Precision (or Positive Predictive Value), F-measure, G-mean, and Index of Balanced Accuracy (IBA). The approaches that performed best were those based on One-Class Classification (OCC), which achieved G-mean and IBA values close to 0.60 and 0.36, respectively. Initially, the great challenge of the problem was the imbalance between classes in the database used; however, the unsatisfactory results obtained by the ML approaches prompted an exploratory analysis of the data. During the exploratory analysis, several characteristics of the database revealed a highly complex problem in terms of pattern recognition. The results obtained here show that OCC classifiers are an interesting option for dealing with databases that have significant imbalance between classes and high complexity in terms of pattern recognition.
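The metrics named in the abstract can all be derived from a binary confusion matrix. The following is a minimal sketch in Python, not the dissertation's own code: the function name `imbalance_metrics`, the weighting factor `alpha = 0.1` used for IBA, and the counts in the usage example are assumptions made for illustration only. It uses the common definitions G-mean = sqrt(Recall x Specificity) and IBA = (1 + alpha x (Recall - Specificity)) x Recall x Specificity.

```python
from math import sqrt


def imbalance_metrics(tp, fn, tn, fp, alpha=0.1):
    """Compute imbalance-aware metrics from confusion-matrix counts.

    tp/fn: positive herds predicted positive/negative.
    tn/fp: negative herds predicted negative/positive.
    alpha: IBA weighting factor (0.1 is an assumed, commonly used value).
    """
    # Recall (sensitivity): fraction of positive herds that were detected.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: fraction of negative herds correctly left unflagged.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # Precision (positive predictive value).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # F-measure: harmonic mean of precision and recall.
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    # G-mean: geometric mean of the two per-class accuracies.
    g_mean = sqrt(recall * specificity)
    # IBA: squared G-mean weighted by the dominance (recall - specificity).
    iba = (1 + alpha * (recall - specificity)) * recall * specificity
    return {
        "recall": recall,
        "specificity": specificity,
        "precision": precision,
        "f_measure": f_measure,
        "g_mean": g_mean,
        "iba": iba,
    }


if __name__ == "__main__":
    # Invented counts for illustration; these are NOT results from the study.
    for name, value in imbalance_metrics(tp=6, fn=4, tn=70, fp=30).items():
        print(f"{name}: {value:.3f}")
```

Because IBA equals the squared G-mean scaled by a factor close to 1 when Recall and Specificity are similar, a G-mean near 0.60 is consistent with an IBA near 0.36, as reported above.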
Description: File withheld, at the author's request, until November 2025.
URI: http://repositorio.ufla.br/jspui/handle/1/59760
Appears in Collections: Engenharia de Sistemas e automação (Dissertações)

Files in This Item:
There are no files associated with this item.


This item is licensed under a Creative Commons License.