Use this identifier to cite or link to this item:
http://repositorio.ufla.br/jspui/handle/1/39129
Title: | Seleção de variáveis para regressão logística em um exemplo de segurança e frequência alimentar |
Alternative title(s): | Variables selection for logistic regression in an example of safety and food frequency |
Authors: | Oliveira, Izabela Regina Cardoso de; Lima, Renato Ribeiro de; Bueno Filho, Júlio Sílvio de Sousa; Petrini, Juliana; Barroso, Camilla Marques |
Keywords: | LASSO; CART; Árvores de classificação; Stepwise; Least Absolute Shrinkage and Selection Operator (LASSO); Classification and Regression Trees (CART); Classification trees; Regressão Stepwise |
Issue date: | 4-Feb-2020 |
Publisher: | Universidade Federal de Lavras |
Citation: | SANTOS, P. R. Seleção de variáveis para regressão logística em um exemplo de segurança e frequência alimentar. 2020. 59 p. Dissertação (Mestrado em Estatística e Experimentação Agropecuária)–Universidade Federal de Lavras, Lavras, 2020. |
Abstract: | Linear regression emerged in the nineteenth century and is one of the most commonly used statistical techniques in applied research when the interest lies in explaining a response, Y, based on one or more explanatory variables, X. However, when the response does not follow a normal distribution, generalized linear models may be more appropriate. A widely applied example is the logistic model for binary responses. In regression analysis, when there are several explanatory variables, it is necessary to select those that result in a useful and parsimonious model. One solution is the Lasso regularization method, which shrinks coefficient estimates toward zero and sets some of them exactly to zero, so that only the variables that most affect the variation in Y are kept in the model. However, as the number of explanatory variables and data complexity increase, alternatives have emerged, such as Machine Learning techniques. The aim of this study is to use Lasso and classification trees for variable selection in logistic models, using an example of food security and food frequency in children. Data were collected from 581 children attending Centros Municipais de Educação Infantil (Municipal Centers of Early Childhood Education) in Lavras, MG, Brazil. The 37 potential predictors of food frequency were reduced to 3 and 7 when Lasso and the classification tree, respectively, were applied. For the response food security, the 19 predictors were reduced to 5 and 9 after applying Lasso and the classification tree, respectively. The models obtained with the variables selected by each method were then further reduced using stepwise selection. The chosen models for each response variable were compared by AIC (Akaike Information Criterion) and residual deviance. For food frequency, the model obtained from Lasso showed lower AIC and residual deviance (AIC = 107.95 and deviance = 101.95) than the one obtained from the classification tree (AIC = 509.68 and deviance = 489.68). The same pattern occurred for food security: the AIC of the model based on Lasso was 273.20 and its residual deviance was 255.20, while for the classification tree the AIC was 307.37 and the residual deviance was 283.37. For this dataset, the models built from the variables selected by Lasso performed better according to these statistical criteria. Classification trees can nevertheless also be considered, since the variables they select have practical importance and the trees provide intuitive, easy-to-interpret graphical results. |
URI: | http://repositorio.ufla.br/jspui/handle/1/39129 |
Appears in collections: | Estatística e Experimentação Agropecuária - Mestrado (Dissertações) |
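
The AIC and residual deviance figures reported in the abstract can be related by a short arithmetic check. The sketch below assumes, which the record itself does not state, that the models are logistic regressions fitted to ungrouped binary responses, in which case AIC = residual deviance + 2k, where k is the number of estimated coefficients including the intercept. The dictionary name `reported` and the helper `implied_parameters` are illustrative names only.

```python
# Values as reported in the abstract: (AIC, residual deviance).
reported = {
    ("food frequency", "Lasso"): (107.95, 101.95),
    ("food frequency", "classification tree"): (509.68, 489.68),
    ("food security", "Lasso"): (273.20, 255.20),
    ("food security", "classification tree"): (307.37, 283.37),
}


def implied_parameters(aic, deviance):
    """Return k implied by AIC = deviance + 2k.

    Assumption: ungrouped binary logistic regression, so the saturated
    log-likelihood is zero and the residual deviance equals -2 * log-likelihood.
    """
    return round((aic - deviance) / 2)


# Number of estimated coefficients (intercept included) implied for each model.
implied = {key: implied_parameters(*values) for key, values in reported.items()}

for (response, method), k in implied.items():
    print(f"{response:15s} {method:20s} implied k = {k}")
```

Under this assumption, the gap between AIC and deviance is only the penalty term 2k, so a smaller gap indicates a model with fewer estimated coefficients. The implied k need not equal the number of selected variables, since a categorical predictor can contribute more than one coefficient.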
Files in this item:
File | Description | Size | Format | |
---|---|---|---|---|
DISSERTAÇÂO_Seleção de variáveis para regressão logística em um exemplo de segurança e frequência alimentar.pdf | | 868.21 kB | Adobe PDF | View/Open |
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.