Please use this identifier to cite or link to this item:
http://repositorio.ufla.br/jspui/handle/1/10545
Title: | Enriquecendo um arquivo de autoridade de veículos de publicação com informações extraídas da Web |
Other Titles: | Enriching a publication venue authority file with Web extracted informations |
Authors: | Pereira, Denilson Alves; Ferreira, Anderson Almeida; Assis, Guilherme Tavares de |
Keywords: | Arquivo de autoridade; Veículo de publicação; Extração de informação; Classificação de documentos; Máquina de busca; Authority file; Publication venue; Information extraction; Document classification; Search engine |
Issue Date: | 28-Oct-2015 |
Publisher: | Universidade Federal de Lavras |
Citation: | JESUS, H. A. de. Enriquecendo um arquivo de autoridade de veículos de publicação com informações extraídas da Web. 2015. 78 p. Dissertação (Mestrado em Ciência da Computação)-Universidade Federal de Lavras, Lavras, 2015. |
Abstract: | Authority files maintain records of entities and are generally used by digital libraries to build disambiguation tools for author names or publication venue titles. An authority file with detailed and consistent information on publication venues allows such tools to be improved. This work aims to enrich an authority file of Computer Science publication venues. The proposal is to obtain additional information that complements the existing authority file by automatically extracting information from web pages retrieved through queries submitted to a search engine. The approach comprises three steps: submitting queries, classifying documents, and extracting information from the relevant documents. Page classification is an important task in this work. Two approaches were implemented and experimentally evaluated: classification based only on content, and classification based on genre and content. The first obtained the best results for conference pages. From the relevant pages, we extracted data such as year, edition number, and date, in addition to name and abbreviation, looking for previously unknown written variants. The experiments show good results in collecting conference information, making it possible to trace each conference's history with data such as edition year and name changes. |
URI: | http://repositorio.ufla.br/jspui/handle/1/10545 |
Appears in Collections: | Ciência da Computação - Mestrado (Dissertações) |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
DISSERTAÇÃO_Enriquecendo um arquivo de autoridade de veículos de publicação com informações extraídas da Web.pdf | | 1,33 MB | Adobe PDF | View/Open |