Efficient set similarity join on multi-attribute data using lightweight filters

Ribeiro, Leonardo Andrade; Borges, Felipe Ferreira; Oliveira, Diego

Use this identifier to cite or link to this item: http://repositorio.ufla.br/jspui/handle/1/49940

Title:	Efficient set similarity join on multi-attribute data using lightweight filters
Keywords:	Advanced query processing; Data cleaning; Data integration; Multi-attribute data; Similarity join
Document date:	Sep-2021
Publisher:	Brazilian Computer Society
Citation:	RIBEIRO, L. A.; BORGES, F. F.; OLIVEIRA, D. Efficient set similarity join on multi-attribute data using lightweight filters. Journal of Information and Data Management, [S.l.], v. 12, n. 3, p. 226-241, Sept. 2021.
Abstract:	We consider the problem of efficiently answering set similarity joins on multi-attribute data. Traditional set similarity join algorithms assume string data represented by a single set and, thus, miss the opportunity to exploit predicates over multiple attributes to reduce the number of similarity computations. In this article, we present a framework to enhance existing algorithms with additional filters for dealing with multi-attribute data. We then instantiate this framework with a lightweight filtering technique based on a simple, yet effective data structure, for which exact and probabilistic implementations are evaluated. In this context, we devise a cost model to identify the best attribute ordering to reduce processing time. Moreover, alternative approaches are also investigated and a new algorithm combining key ideas from previous work is introduced. Finally, we present a thorough experimental evaluation, which demonstrates that our main proposal is efficient and significantly outperforms competing algorithms.
URI:	https://sol.sbc.org.br/journals/index.php/jidm/article/view/1969; http://repositorio.ufla.br/jspui/handle/1/49940
Appears in collections:	DCC - Artigos publicados em periódicos (Articles published in journals)
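To make the idea in the abstract concrete, here is a minimal sketch of a multi-attribute set similarity join in which cheap per-attribute filters run before any exact similarity computation. It is an illustration under stated assumptions, not the authors' algorithm: it assumes Jaccard similarity with a separate threshold per attribute combined conjunctively, uses a plain nested loop in place of a real candidate generator (such as prefix filtering on one attribute), and stands in a size filter plus a hashed bit-signature filter as the "lightweight" filters. The names `signature`, `passes_filters`, `multi_attr_join`, and the 64-bit width `BITS` are all hypothetical. Attributes are checked in the order given by `thresholds`; the paper's cost model would choose that order, which this sketch does not attempt.

```python
from itertools import combinations

BITS = 64  # hypothetical signature width


def signature(tokens, bits=BITS):
    # Hash each token to one bit of a fixed-width bitmap (collisions allowed).
    sig = 0
    for t in tokens:
        sig |= 1 << (hash(t) % bits)
    return sig


def jaccard(x, y):
    if not x and not y:
        return 1.0
    return len(x & y) / len(x | y)


def passes_filters(x, y, sx, sy, tau):
    """Cheap, sound checks: never reject a pair whose Jaccard is >= tau."""
    lx, ly = len(x), len(y)
    # Size filter: Jaccard(x, y) <= min/max, so min < tau * max means prune.
    if min(lx, ly) < tau * max(lx, ly):
        return False
    # Bit-signature filter: each bit set in exactly one signature comes from a
    # distinct token of the symmetric difference, so popcount(sx ^ sy) <= |x ^ y|
    # and the overlap |x & y| = (lx + ly - |x ^ y|) / 2 is bounded above by ub.
    ub = (lx + ly - bin(sx ^ sy).count("1")) // 2
    union_lb = lx + ly - ub
    if union_lb == 0:
        return True
    return ub / union_lb >= tau  # Jaccard upper bound vs. threshold


def multi_attr_join(records, thresholds):
    """records: list of dicts attr -> frozenset; thresholds: dict attr -> tau."""
    attrs = list(thresholds)  # check order (a cost model could pick it)
    sigs = [{a: signature(r[a]) for a in attrs} for r in records]
    result = []
    for i, j in combinations(range(len(records)), 2):
        r, s = records[i], records[j]
        # Phase 1: lightweight filters, short-circuiting on the first failure.
        if not all(passes_filters(r[a], s[a], sigs[i][a], sigs[j][a],
                                  thresholds[a]) for a in attrs):
            continue
        # Phase 2: exact verification only for surviving candidates.
        if all(jaccard(r[a], s[a]) >= thresholds[a] for a in attrs):
            result.append((i, j))
    return result


records = [
    {"name": frozenset({"john", "smith"}), "addr": frozenset({"main", "st", "12"})},
    {"name": frozenset({"john", "smith"}), "addr": frozenset({"main", "st", "12", "apt"})},
    {"name": frozenset({"jane", "doe"}), "addr": frozenset({"oak", "ave", "7"})},
]
print(multi_attr_join(records, {"name": 0.8, "addr": 0.7}))  # [(0, 1)]
```

Because both filters only ever discard pairs that provably fall below a threshold, the result does not depend on hash collisions; collisions only make the signature filter weaker. Placing the most selective, cheapest attribute first lets most non-matching pairs fail on a single comparison.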

Files associated with this item:

There are no files associated with this item.

