Please use this identifier to cite or link to this item: http://repositorio.ufla.br/jspui/handle/1/30517
Title: Data classification with binary response through the boosting algorithm and logistic regression
Keywords: Boosting algorithm
Data classification
Logistic regression
Information criteria
Akaike information criterion (AIC)
Bayesian information criterion (BIC)
Selection of models
Monte Carlo Simulation
Issue Date: Mar-2017
Publisher: Elsevier
Citation: MENEZES, F. S. de et al. Data classification with binary response through the boosting algorithm and logistic regression. Expert Systems with Applications, [S.l.], v. 69, p. 62-73, Mar. 2017.
Abstract: The task of classifying is natural to humans, but there are situations in which a person is not best suited to perform this function, which creates the need for automatic methods of classification. Traditional methods, such as logistic regression, are commonly used in this type of situation, but they lack robustness and accuracy. These methods do not work very well when the data or when there is noise in the data, situations that are common in expert and intelligent systems. Due to the importance and the increasing complexity of problems of this type, there is a need for methods that provide greater accuracy and interpretability of the results. Among these methods is Boosting, which operates sequentially by applying a classification algorithm to reweighted versions of the training data set. It was recently shown that Boosting may also be viewed as a method for functional estimation. The purpose of the present study was to compare the logistic regression model estimated by maximum likelihood (LRMML) with the logistic regression model estimated using the Boosting algorithm, specifically the Binomial Boosting algorithm (LRMBB), and to select the model with the better fit and discrimination capacity for the presence (absence) of a given property (in this case, binary classification). To illustrate this situation, the example used was to classify the presence (absence) of coronary heart disease (CHD) as a function of various biological variables collected from patients. The simulation results show, based on the strength of the evidence, that the LRMBB model is more appropriate than the LRMML model for fitting data sets with several covariates and noisy data. The following sections report lower values of the information criteria AIC and BIC for the LRMBB model and show that the Hosmer–Lemeshow test exhibits no evidence of a bad fit for the LRMBB model.
The LRMBB model also presented higher AUC, sensitivity, specificity and accuracy, and lower false positive and false negative rates, giving it better discrimination power than the LRMML model. Based on these results, the logistic model fitted via the Binomial Boosting algorithm (LRMBB model) is better suited to describe the binary response problem, because it provides more accurate information regarding the problem considered.
URI: https://www.sciencedirect.com/science/article/pii/S0957417416304092
http://repositorio.ufla.br/jspui/handle/1/30517
Appears in Collections: DES - Artigos publicados em periódicos
DFI - Artigos publicados em periódicos
