Multi-input trademark element recognition with transformer

Linqi Liu, Xiuhui Wang
DOI: https://doi.org/10.1007/s11042-024-18678-y
IF: 2.577
2024-02-29
Multimedia Tools and Applications
Abstract: Trademark element recognition is a crucial task in applications such as trademark brand evaluation and trademark infringement identification. Although modeling technology has made significant progress in recent years, small objects, similar objects, and objects with high conditional probability remain difficult to recognize because of the limitations of convolutional kernels. Building on semantic-aware region search and label dependency modeling, we propose a Transformer-based multi-input recognition framework for trademark elements (Mi-Tr), which learns the complex dependencies between visual features and labels through feature extraction with different convolutional networks and Transformer encoding. The proposed approach includes two visual feature-embedding modules that use modified VGG16 and ResNet101 as feature extractors to obtain feature information of trademark images in different dimensions. Simultaneously, the category labels are embedded and input into the Transformer; exploiting the order invariance of the Transformer, the model can better learn all types of dependencies among features and labels. Additionally, the number of Transformer layers and the number of heads in the multi-head attention were varied to find parameters that better match the image features and label information. Experimental results on two datasets, METU and Logotypes of Different Companies, demonstrate that the classifier developed by our model performs significantly better in the multi-input classification of trademark image elements.
computer science, information systems, theory & methods, engineering, electrical & electronic, software engineering
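
The abstract relies on the order invariance of the Transformer when visual features and label embeddings are fed in as one token sequence. The sketch below illustrates that property only; it is not the authors' Mi-Tr implementation. It assumes a toy single-head self-attention with identity projections and no positional encoding, and the names vgg_tokens, resnet_tokens, and label_tokens and their values are hypothetical stand-ins for the two visual branches and the label embeddings.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Toy single-head scaled dot-product self-attention.

    Assumption: identity query/key/value projections and no positional
    encoding, so this is a stand-in for one Transformer layer, not the
    paper's encoder.
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, tokens))
                    for j in range(d)])
    return out

# Hypothetical 4-dimensional tokens: two from a VGG16-like branch,
# one from a ResNet101-like branch, and three label embeddings.
vgg_tokens = [[0.2, 0.1, 0.0, 0.5], [0.9, 0.3, 0.4, 0.1]]
resnet_tokens = [[0.7, 0.8, 0.2, 0.0]]
label_tokens = [[1.0, 0.0, 0.0, 0.0],
                [0.0, 1.0, 0.0, 0.0],
                [0.0, 0.0, 1.0, 0.0]]

# Visual and label tokens are concatenated into one joint sequence.
sequence = vgg_tokens + resnet_tokens + label_tokens
encoded = self_attention(sequence)

# Shuffle the token order and encode again.
perm = [5, 0, 3, 2, 4, 1]
encoded_perm = self_attention([sequence[i] for i in perm])

# Each token's output should be the same wherever it sits in the sequence,
# up to floating-point rounding.
max_diff = max(abs(a - b)
               for i, p in enumerate(perm)
               for a, b in zip(encoded_perm[i], encoded[p]))
print("max difference after permutation:", max_diff)
```

Because no positional encoding is added, shuffling the joint sequence only shuffles the outputs, which is why label tokens can be appended in any order in this toy setting.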