Please use this identifier to cite or link to this item: http://repositorio.ufla.br/jspui/handle/1/39264
Title: Modelos evolutivos baseados em grânulos e nuvens de dados para classificação online de spam (Evolving models based on granules and data clouds for online spam classification)
Authors: Leite, Daniel Furtado
Gouvêa Junior, Maury Meirelles
Rodríguez, Demóstenes Zegarra
Keywords: Detecção de spam
Sistemas inteligentes evolutivos
Sistemas Fuzzy
Agrupamento incremental
Nuvem de dados
Spam detection
Evolving intelligent methods
Fuzzy systems
Incremental clustering
Data clouds
Issue Date: 11-Feb-2020
Publisher: Universidade Federal de Lavras
Citation: POUÇAS, R. de P. Modelos evolutivos baseados em grânulos e nuvens de dados para classificação online de spam. 2020. 101 p. Dissertação (Mestrado em Engenharia de Sistemas e Automação)-Universidade Federal de Lavras, Lavras, 2017.
Abstract: Sending and receiving e-mail has become a security concern because e-mail is used to disseminate malicious code intended to damage computer systems or steal information. Sending a message without the user's permission is called spam. Several techniques exist for disseminating spam; they exploit either the content of the message or a weakness in the classification system that intercepts messages. Classification systems that can adapt themselves over time are rare. Adaptation is needed because spam changes over time as a consequence of the message-masking techniques applied to it. Moreover, classification models that handle large volumes of data with low computational cost are of interest. Evolving Intelligent Systems are able to adapt their parameters and structure in response to changes in a stream of data extracted from e-mails. This work uses TEDA (Typicality and Eccentricity based Data Analytics) and FBeM (Fuzzy Set-Based Evolving Modeling) for online unsupervised classification of spam. TEDA is based on the concepts of data clouds, eccentricity and typicality. Unlike conventional clusters, TEDA clouds do not have a specific geometric shape. FBeM uses fuzzy granular objects to summarize information extracted from a data stream. FBeM is based on the concept of coverage (granulation) of the data space. Its rules are linguistically interpretable, which makes them useful for supporting decision making. TEDA and FBeM are compared in terms of classification error, processing speed and parsimony. For dimensionality reduction, ACO (Ant Colony Optimization) is employed. ACO is inspired by the intelligent behavior of ants. The feature selection problem is represented as a graph, where the optimum path minimizes an objective function and suggests the most discriminative features for spam classification. A dataset containing 25745 samples, of which 7830 are spam and 17915 are legitimate e-mails, was created. Each sample is described by 711 features extracted from an e-mail server.
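The abstract names eccentricity and typicality as the concepts behind TEDA but gives no formulas. As a rough illustration only, the sketch below uses the recursive Euclidean-distance form of eccentricity that commonly appears in the TEDA literature; it is an assumption that the dissertation uses this same form. The class name `TedaEccentricity`, the helpers `typicality` and `is_outlier`, and the default `m = 3` are hypothetical names and values chosen for this example.

```python
class TedaEccentricity:
    """Recursive eccentricity of each new sample in a data stream.

    Assumed form (Euclidean distance, not taken from the dissertation):
        mean_k = ((k-1)/k) * mean_{k-1} + x_k / k
        var_k  = ((k-1)/k) * var_{k-1} + ||x_k - mean_k||^2 / (k-1)
        ecc_k  = 1/k + ||mean_k - x_k||^2 / (k * var_k)
    """

    def __init__(self):
        self.k = 0        # number of samples seen so far
        self.mean = []    # running mean vector
        self.var = 0.0    # running scalar variance

    def update(self, x):
        """Absorb sample x (a sequence of floats) and return its eccentricity."""
        self.k += 1
        k = self.k
        if k == 1:
            self.mean = list(x)
            self.var = 0.0
            return 1.0  # convention chosen here for a single sample
        self.mean = [((k - 1) / k) * m + xi / k for m, xi in zip(self.mean, x)]
        dist2 = sum((xi - m) ** 2 for xi, m in zip(x, self.mean))
        self.var = ((k - 1) / k) * self.var + dist2 / (k - 1)
        if self.var == 0.0:
            return 1.0 / k  # all samples so far are identical
        return 1.0 / k + dist2 / (k * self.var)


def typicality(ecc):
    """Typicality is the complement of eccentricity."""
    return 1.0 - ecc


def is_outlier(ecc, k, m=3.0):
    """Chebyshev-style test on normalized eccentricity (ecc / 2).

    m plays the role of the number of standard deviations; 3 is an assumption.
    """
    return ecc / 2.0 > (m * m + 1.0) / (2.0 * k)
```

Fed one feature vector at a time, `update` keeps only the running mean and variance, so memory use does not grow with the stream. That matches the abstract's interest in models that need few computational resources. A full TEDA classifier would additionally maintain several data clouds and decide when to create a new one, which this sketch does not attempt.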
URI: http://repositorio.ufla.br/jspui/handle/1/39264
Appears in Collections: Engenharia de Sistemas e automação (Dissertações)


