Abstract: Dependency trees parsed from natural language sentences have been shown to benefit relation extraction with deep neural networks. However, using the structural information of dependency trees both effectively and efficiently remains a challenging research problem. Existing methods either struggle to support interaction between nodes that lie at distant levels of the dependency tree or are limited by computational inefficiency. In this article, we propose a novel model, named Self-Attention over Tree for Relation Extraction (SATRE), designed for tree-structured data and able to fully exploit the structural information implied in the subtrees of dependency trees. SATRE enables interactions among the nodes in each subtree, even when they are on widely separated layers. As a result, each node is learned across multiple subtrees, which makes SATRE data-efficient; that is, SATRE remains effective even when training data is scarce. Furthermore, SATRE is implemented using a parallel mechanism. Specifically, SATRE computes the node representations of a tree in parallel at two levels: across all subtrees of the tree and across all nodes within a subtree. Empirically, SATRE consistently outperforms the compared methods on two real-world benchmarks, the TACRED and SemEval-2010 Task 8 datasets, which shows its effectiveness. Extensive experiments also indicate SATRE's data efficiency in using the training data and its computational efficiency in running time.
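
To make the idea of attention restricted to subtrees concrete, the following is a minimal illustrative sketch, not SATRE's actual architecture. It assumes a tree given as a parent-index array (`heads`, hypothetical), plain scaled dot-product attention without learned projections, and a simple average to combine a node's representations from the different subtrees that contain it. All function names are hypothetical. Because each subtree is processed independently, the loop over subtrees is the part that a real implementation could batch and run in parallel.

```python
import math


def subtree_members(heads):
    """heads[i] is the parent index of node i, or -1 for the root.

    Returns, for every node r, the sorted list of nodes in the subtree
    rooted at r (r itself plus all of its descendants).
    """
    members = [[r] for r in range(len(heads))]
    for node in range(len(heads)):
        ancestor = heads[node]
        while ancestor != -1:  # a node belongs to the subtree of each ancestor
            members[ancestor].append(node)
            ancestor = heads[ancestor]
    return [sorted(m) for m in members]


def attend(vectors):
    """Scaled dot-product self-attention over a list of vectors (assumption:
    queries, keys, and values are the raw vectors, with no learned weights)."""
    d = len(vectors[0])
    out = []
    for q in vectors:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in vectors]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([sum(w * v[j] for w, v in zip(weights, vectors)) / total
                    for j in range(d)])
    return out


def subtree_self_attention(x, heads):
    """Run attention inside every subtree, then average each node's outputs
    over all subtrees containing it (the averaging is an assumption)."""
    n, d = len(x), len(x[0])
    sums = [[0.0] * d for _ in range(n)]
    counts = [0] * n
    for nodes in subtree_members(heads):  # independent per subtree
        updated = attend([x[i] for i in nodes])
        for i, vec in zip(nodes, updated):
            for j in range(d):
                sums[i][j] += vec[j]
            counts[i] += 1
    return [[s / counts[i] for s in sums[i]] for i in range(n)]


# Toy tree: node 0 is the root, nodes 1 and 2 are its children, node 3 is a child of node 1.
heads = [-1, 0, 0, 1]
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
print(subtree_members(heads))
print(subtree_self_attention(x, heads))
```

In this toy tree, node 3 appears in three subtrees (rooted at nodes 0, 1, and 3), so it interacts with the root two levels above it as well as with its parent, which is the kind of cross-level interaction the abstract describes.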

Self-Attention over Tree for Relation Extraction with Data-Efficiency and Computational Efficiency
