Data classification with binary response through the boosting algorithm and logistic regression

Menezes, Fortunato S. de; Liska, Gilberto R.; Cirillo, Marcelo A.; Vivanco, Mário J. F.

Use this identifier to cite or link to this item: http://repositorio.ufla.br/jspui/handle/1/30517

Title:	Data classification with binary response through the boosting algorithm and logistic regression
Keywords:	Boosting algorithm; Data classification; Logistic regression; Information criteria; Akaike information criterion (AIC); Bayesian information criterion (BIC); Selection of models; Monte Carlo Simulation
Issue date:	Mar-2017
Publisher:	Elsevier
Citation:	MENEZES, F. S. de et al. Data classification with binary response through the boosting algorithm and logistic regression. Expert Systems with Applications, [S.l.], v. 69, p. 62-73, Mar. 2017.
Abstract:	The task of classifying is natural to humans, but there are situations in which a person is not best suited to perform this function, which creates the need for automatic methods of classification. Traditional methods, such as logistic regression, are commonly used in this type of situation, but they lack robustness and accuracy. These methods do not work very well when the data or when there is noise in the data, situations that are common in expert and intelligent systems. Due to the importance and the increasing complexity of problems of this type, there is a need for methods that provide greater accuracy and interpretability of the results. Among these methods is Boosting, which operates sequentially by applying a classification algorithm to reweighted versions of the training data set. It was recently shown that Boosting may also be viewed as a method for functional estimation. The purpose of the present study was to compare the logistic regression model estimated by maximum likelihood (LRMML) with the logistic regression model estimated using the Boosting algorithm, specifically the Binomial Boosting algorithm (LRMBB), and to select the model with the better fit and discrimination capacity for classifying the presence (absence) of a given property (in this case, binary classification). To illustrate this situation, the example used was the classification of the presence (absence) of coronary heart disease (CHD) as a function of various biological variables collected from patients. The simulation results indicate that the LRMBB model is more appropriate than the LRMML model for fitting data sets with several covariates and noisy data. The following sections report lower values of the information criteria AIC and BIC for the LRMBB model and show that the Hosmer–Lemeshow test exhibits no evidence of a bad fit for the LRMBB model. The LRMBB model also presented a higher AUC, sensitivity, specificity and accuracy and lower false positive and false negative rates, giving it better discrimination power than the LRMML model. Based on these results, the logistic model fitted via the Binomial Boosting algorithm (LRMBB model) is better suited to describe the problem of binary response, because it provides more accurate information regarding the problem considered.
URI:	https://www.sciencedirect.com/science/article/pii/S0957417416304092 http://repositorio.ufla.br/jspui/handle/1/30517
Appears in collections:	DES - Articles published in journals; DFI - Articles published in journals
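
The abstract describes Boosting as a method for functional estimation and compares the models by sensitivity, specificity, accuracy and false positive/negative rates. The following is a minimal sketch of that idea, not the authors' implementation: it assumes a component-wise gradient boosting scheme on the binomial log-likelihood with 0/1 labels and a linear predictor on the logit scale. The names `binomial_boost` and `classification_rates`, the step size `nu`, the number of steps, and the 0.5 threshold are all illustrative assumptions that do not come from the record.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def binomial_boost(X, y, n_steps=200, nu=0.1):
    """Component-wise gradient boosting for a logistic model (sketch).

    X is a list of rows (lists of floats), y a list of 0/1 labels
    containing both classes. Returns (intercept, coefficients).
    """
    n, p = len(X), len(X[0])
    # Center each covariate so every base learner is a
    # simple least-squares regression without intercept.
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    ss = [sum(r[j] ** 2 for r in Xc) for j in range(p)]
    ybar = sum(y) / n
    offset = math.log(ybar / (1.0 - ybar))  # log-odds of the base rate
    beta = [0.0] * p
    f = [offset] * n  # current linear predictor for each observation
    for _ in range(n_steps):
        # Negative gradient of the binomial negative log-likelihood.
        u = [yi - sigmoid(fi) for yi, fi in zip(y, f)]
        best_j, best_b, best_gain = None, 0.0, -1.0
        for j in range(p):
            if ss[j] == 0.0:
                continue  # constant covariate carries no information
            b = sum(r[j] * ui for r, ui in zip(Xc, u)) / ss[j]
            gain = b * b * ss[j]  # drop in residual sum of squares
            if gain > best_gain:
                best_j, best_b, best_gain = j, b, gain
        if best_j is None:
            break
        # Update only the best-fitting covariate, shrunk by nu.
        beta[best_j] += nu * best_b
        f = [fi + nu * best_b * r[best_j] for fi, r in zip(f, Xc)]
    intercept = offset - sum(b * m for b, m in zip(beta, means))
    return intercept, beta


def classification_rates(y, prob, threshold=0.5):
    """Confusion-matrix rates; assumes both classes occur in y."""
    tp = fp = tn = fn = 0
    for yi, pi in zip(y, prob):
        pred = 1 if pi >= threshold else 0
        if pred == 1 and yi == 1:
            tp += 1
        elif pred == 1 and yi == 0:
            fp += 1
        elif pred == 0 and yi == 0:
            tn += 1
        else:
            fn += 1
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(y),
        "false_positive_rate": fp / (fp + tn),
        "false_negative_rate": fn / (fn + tp),
    }
```

Because each step updates a single coefficient by a shrunken amount, stopping early leaves uninformative covariates at or near zero, which is one plausible reason a boosted fit could cope better with many covariates and noisy data. Fitted probabilities for `classification_rates` can be obtained by applying `sigmoid` to the intercept plus the coefficient-weighted covariates.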
