Please use this identifier to cite or link to this item: http://repositorio.ufla.br/jspui/handle/1/49959
Title: Natural language processing in the e-commerce segment: a few-shot learning application with Siamese neural networks
Authors: Ferreira, Danton Diego
Barbosa, Bruno Henrique Groenner
Vitor, Giovani Bernardes
Huallpa, Belisario Nina
Keywords: Data mining
One-shot learning
Siamese neural networks
Natural language processing
Machine learning
E-commerce
Feature extraction
Issue Date: 18-May-2022
Publisher: Universidade Federal de Lavras
Citation: SILVA, F. C. e. Processamento de linguagem natural no segmento de e-commerce: uma aplicação few shot learning com redes neurais siamesas. 2022. 112 p. Master's dissertation (Systems and Automation Engineering), Universidade Federal de Lavras, Lavras, 2022.
Abstract: The number of companies selling their products online has grown, so new offers appear constantly. However, sellers describe their products without any common pattern. This can cause a product to be placed in a category other than the one it belongs to, resulting in a poor shopping experience. E-commerce companies can use the large volume of data generated by Internet transactions to build user profiles and make personalized product recommendations. Solutions based on natural language processing therefore have the potential to solve e-commerce problems and to optimize many of its processes. This project studies and improves artificial intelligence systems for e-commerce. Techniques for classifying unstructured data were analyzed and developed for a problem faced by online commerce platforms: newly registered products can be misclassified while their classes are still underrepresented in the database. One-shot and few-shot learning algorithms suit this setting, because the classifier must learn the information relevant to classification from only one or a few samples of each class during training. Few effective tools exist for this situation, because conventional classification methods cannot learn meaningful relationships from so little training data. This work proposes a classifier based on Siamese neural networks to classify new classes in an e-commerce problem. Several topologies were tested for the internal network of the Siamese network. Three approaches were also proposed for choosing the representative sample that serves as the reference for each class: random choice, the class centroid, and an ensemble of representatives.
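The representative-selection step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the dissertation's implementation. It assumes each product has already been mapped to an embedding vector by a trained internal network. It also uses plain Euclidean distance between embeddings as a stand-in for the Siamese network's learned similarity. The function names (`choose_representatives`, `classify`) and the ensemble rule (average distance to several randomly sampled representatives) are hypothetical.

```python
import random
from collections import defaultdict
from math import dist  # Euclidean distance between two points (Python 3.8+)

# ASSUMPTION: inputs are embedding vectors already produced by a trained
# internal network; Euclidean distance stands in for the Siamese similarity.


def group_by_class(embeddings, labels):
    """Collect the embedding vectors of each class."""
    groups = defaultdict(list)
    for vec, label in zip(embeddings, labels):
        groups[label].append(vec)
    return groups


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


def choose_representatives(embeddings, labels, strategy="centroid", k=3, seed=0):
    """Return {class: [reference vectors]} for the chosen strategy.

    "random":   one randomly chosen sample per class
    "centroid": the mean embedding of the class
    "ensemble": up to k random samples per class (hypothetical rule)
    """
    rng = random.Random(seed)
    reps = {}
    for label, vecs in group_by_class(embeddings, labels).items():
        if strategy == "random":
            reps[label] = [rng.choice(vecs)]
        elif strategy == "centroid":
            reps[label] = [centroid(vecs)]
        elif strategy == "ensemble":
            reps[label] = rng.sample(vecs, min(k, len(vecs)))
        else:
            raise ValueError(f"unknown strategy: {strategy}")
    return reps


def classify(query, reps):
    """Pick the class whose representatives are closest on average."""
    def mean_distance(refs):
        return sum(dist(query, r) for r in refs) / len(refs)
    return min(reps, key=lambda label: mean_distance(reps[label]))


if __name__ == "__main__":
    train = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [4.8, 5.2]]
    labels = ["phones", "phones", "shoes", "shoes"]
    reps = choose_representatives(train, labels, strategy="centroid")
    print(classify([4.9, 5.0], reps))  # shoes
```

With one representative per class, classifying a new product takes only one comparison per class. A brute-force KNN, in contrast, compares the query against every stored training sample.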
When the representative was chosen by computing the class centroid, the proposed classifier reached 98% accuracy on a problem with 6 classes and fewer than 400 samples. A larger database had approximately 4000 samples and 452 classes. There, the best result among the tested options came from a Siamese network with a three-layer internal network, Dropout in one layer, and the computed centroid as the representative. This configuration reached 90.31% accuracy, against 83.62% with a randomly chosen representative sample and 81.81% with the K-Nearest Neighbors (KNN) algorithm. Future work could study strategies to improve the model's performance. One example is forming training pairs that maximize the differences between classes instead of combining samples at random. Different feature extractors for data from online sales platforms could also be developed. An extractor that delivers lower-dimensional features reduces the classifier's complexity, which can save server resources.
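The abstract notes that training pairs are currently formed by randomly combining samples. A minimal sketch of that random baseline follows. The balanced alternation of same-class (target 1) and different-class (target 0) pairs, and the function name `make_random_pairs`, are illustrative assumptions, not details taken from the dissertation.

```python
import random


def make_random_pairs(samples, labels, n_pairs, seed=0):
    """Build (a, b, target) training pairs by random sampling.

    target = 1 for a same-class pair, 0 for a different-class pair.
    ASSUMPTION: pairs alternate positive/negative for a 50/50 balance.
    """
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)

    # Only classes with at least two samples can form a positive pair;
    # one-shot classes still take part in negative pairs.
    positive_classes = [c for c, xs in by_class.items() if len(xs) >= 2]
    classes = list(by_class)
    if not positive_classes or len(classes) < 2:
        raise ValueError("need 2+ classes and at least one class with 2+ samples")

    pairs = []
    for i in range(n_pairs):
        if i % 2 == 0:
            # Positive pair: two distinct samples from the same class.
            c = rng.choice(positive_classes)
            a, b = rng.sample(by_class[c], 2)
            pairs.append((a, b, 1))
        else:
            # Negative pair: one sample from each of two different classes.
            c1, c2 = rng.sample(classes, 2)
            pairs.append((rng.choice(by_class[c1]), rng.choice(by_class[c2]), 0))
    return pairs
```

The future-work idea of pairs that maximize the differences between classes would replace these uniform random choices with a selection informed by the model itself. That variant is not shown here.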
URI: http://repositorio.ufla.br/jspui/handle/1/49959
Appears in Collections: Systems and Automation Engineering (Dissertations)



Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.