Use this identifier to cite or link to this item: http://repositorio.ufla.br/jspui/handle/1/12853
Title: Uma abordagem incremental para resolução de entidades descritas por dados textuais curtos
Alternative title(s): An incremental entity resolution approach for short textual data
Authors: Pereira, Denilson Alves
Pereira Júnior, Álvaro Rodrigues
Rosa, Thierson Couto
Keywords: Entity resolution
Associative classification
Incremental learning
Issue date: 8-May-2017
Publisher: Universidade Federal de Lavras
Citation: SILVA, J. A. da. Uma abordagem incremental para resolução de entidades descritas por dados textuais curtos. 2017. 116 p. Dissertação (Mestrado em Ciência da Computação)-Universidade Federal de Lavras, Lavras, 2017.
Abstract: Several Web applications maintain data repositories containing references to thousands of real-world entities originating from multiple sources, and they continually receive new data. Identifying the distinct entities and associating the correct references with each one is a problem known as entity resolution. The challenge is to solve the problem incrementally, as the data arrive, especially when those data are described by a single textual attribute. In this work, we propose an approach for incremental entity resolution. Unlike traditional approaches, the method we implemented, called AssocIER, uses an ensemble of multiclass classifiers with self-training and detection of novel classes to incrementally group entity references. Self-training allows the learning model to be updated automatically during the prediction phase, and the novel class detection mechanism allows the identification of records belonging to classes that were unknown at training time. Our main classifier is based on a restricted case of association rules, which can be implemented efficiently. We evaluated our method on various real-world datasets and scenarios, comparing it with a traditional entity resolution approach. The results show that AssocIER is effective and efficient at resolving unstructured data in collections with a very large number of entities and features, and that it is able to detect hundreds of novel classes. We found that AssocIER can greatly improve the performance of resolving product data, which is a weakness of the baseline, achieving gains of 149% in effectiveness and being up to 385 times faster in the prediction phase. The results also show that it is important to incorporate new data into the learning model, especially for datasets with fewer records per class. Furthermore, our method behaves well in scenarios where few training examples are available, and it is able to run even with no training data.
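The incremental loop described in the abstract (classify a new reference, flag it as a novel class when no known entity fits, then fold it back into the model) can be illustrated with a small sketch. This is not the AssocIER implementation: the class name `IncrementalResolver`, the single-token rules of the form token -> entity, the averaged rule confidence, and the `min_score` threshold are all assumptions made for illustration, and a single scorer stands in for the ensemble of classifiers.

```python
from collections import defaultdict


class IncrementalResolver:
    """Toy incremental entity resolver (illustrative, not AssocIER itself)."""

    def __init__(self, min_score=0.5):
        self.min_score = min_score  # hypothetical novelty threshold
        # token -> entity -> number of records of that entity containing the token
        self.token_entity = defaultdict(lambda: defaultdict(int))
        # token -> number of records (of any entity) containing the token
        self.token_total = defaultdict(int)
        self.next_id = 0

    @staticmethod
    def _tokens(text):
        return set(text.lower().split())

    def learn(self, text, entity):
        """Update the rule counts with one labeled record."""
        for tok in self._tokens(text):
            self.token_entity[tok][entity] += 1
            self.token_total[tok] += 1

    def resolve(self, text):
        """Assign a record to a known entity or create a new one, then self-train."""
        tokens = self._tokens(text)
        scores = defaultdict(float)
        for tok in tokens:
            for entity, count in self.token_entity.get(tok, {}).items():
                # Confidence of the rule "tok -> entity"
                scores[entity] += count / self.token_total[tok]

        best, best_score = None, 0.0
        for entity, total in scores.items():
            avg = total / len(tokens)  # average confidence over the record's tokens
            if avg > best_score:
                best, best_score = entity, avg

        # Novel-class detection: no known entity explains the record well enough.
        if best is None or best_score < self.min_score:
            best = f"new-{self.next_id}"
            self.next_id += 1

        # Self-training: the predicted label is fed back into the model.
        self.learn(text, best)
        return best


resolver = IncrementalResolver(min_score=0.5)
resolver.learn("apple iphone 6 16gb", "iphone6")
resolver.learn("samsung galaxy s7 32gb", "galaxys7")

a = resolver.resolve("iphone 6 apple 16gb gold")  # matches a known entity
b = resolver.resolve("nokia lumia 520")           # no known tokens: new entity
c = resolver.resolve("lumia 520 nokia black")     # matches the entity created above
print(a, b, c)
```

Because every resolved record is fed back through `learn`, the third record is grouped with the entity created for the second one even though no training example for it existed, which mirrors the abstract's remark that the method can run with no training data.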
URI: http://repositorio.ufla.br/jspui/handle/1/12853
Appears in collections: Ciência da Computação - Mestrado (Dissertações)

Files associated with this item:
File: DISSERTAÇÃO_Uma abordagem incremental para resolução de entidades descritas por dados textuais curtos.pdf
Size: 1.69 MB
Format: Adobe PDF


Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.