Exploring graph learning techniques for enhancing cross-domain applications in computer vision and natural language processing

Wentao Zhang
DOI: https://doi.org/10.54254/2755-2721/76/20240610
2024-07-16
Abstract: With the rapid development of artificial intelligence, graph learning has become a research hotspot because of its distinctive ability to process relational data. This paper discusses applications of graph learning in computer vision (CV) and natural language processing (NLP), in particular image segmentation and recognition, visual relationship detection, dynamic scene understanding, semantic role labeling, and knowledge graph enhancement. Graph neural networks (GNNs) can effectively learn deep features of image or text data; for images, for example, pixels or superpixels are defined as nodes, and edges between nodes are constructed according to the spatial proximity or similarity between pixels. The paper also discusses how graph learning can improve model interpretability, including feature relation visualization, error analysis and diagnosis, and interpretation of model decision paths. By comparing the application effects of different graph learning models and algorithms, it aims to provide reference and inspiration for follow-up research and applications.
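The graph construction described in the abstract (pixels as nodes, edges from spatial proximity and similarity) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration and not taken from the paper: the 4-neighbour connectivity, the Gaussian similarity weight with a hypothetical `sigma` parameter, and the function names `pixel_graph` and `propagate`. The aggregation step is a plain weighted neighbour average, not a trained GNN layer.

```python
import math

def pixel_graph(image, sigma=0.1):
    """Build a pixel graph from a 2-D grayscale image (list of rows).

    Nodes are pixels, indexed row-major. Each pixel is linked to its right
    and lower neighbour (spatial proximity); the edge weight is a Gaussian
    of the intensity difference (similarity). Both choices are assumptions.
    Returns a dict mapping (node_i, node_j) to a weight in (0, 1].
    """
    h, w = len(image), len(image[0])
    edges = {}
    for r in range(h):
        for c in range(w):
            for dr, dc in ((0, 1), (1, 0)):
                rr, cc = r + dr, c + dc
                if rr < h and cc < w:
                    diff = image[r][c] - image[rr][cc]
                    weight = math.exp(-(diff ** 2) / (2 * sigma ** 2))
                    edges[(r * w + c, rr * w + cc)] = weight
    return edges

def propagate(values, edges):
    """One round of weighted mean aggregation over neighbours.

    Each node averages its own value (self-loop weight 1) with its
    neighbours' values, weighted by edge similarity. Edges are undirected.
    """
    total = list(values)
    norm = [1.0] * len(values)
    for (i, j), weight in edges.items():
        total[i] += weight * values[j]
        norm[i] += weight
        total[j] += weight * values[i]
        norm[j] += weight
    return [t / s for t, s in zip(total, norm)]

# A 2x3 image: a dark region on the left, a bright column on the right.
image = [[0.0, 0.0, 1.0],
         [0.0, 0.0, 1.0]]
edges = pixel_graph(image)
values = [v for row in image for v in row]
smoothed = propagate(values, edges)
```

Because dissimilar pixels receive near-zero edge weights, aggregation mixes information within each region but barely across the boundary, which is the intuition behind using such graphs for segmentation. A learned GNN would replace the fixed averaging with trainable transformations.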