Abstract: Knowledge graphs capture entities and relations from long documents and can facilitate reasoning in many downstream applications. Extracting compact knowledge graphs containing only salient entities and relations is important but challenging for understanding and summarizing long documents. We introduce a new text-to-graph task of predicting summarized knowledge graphs from long documents. We develop a dataset of 200k document/graph pairs using automatic and human annotations. We also develop strong baselines for this task based on graph learning and text summarization, and provide quantitative and qualitative studies of their effect.

What problem does this paper attempt to address?

The paper addresses the problem of extracting a compact knowledge graph from a long document (such as a scientific paper) that represents the document's most important information. Specifically, its goals are:

1. **Identify key entities and relationships**: Find the most important and relevant entities in a long document, and the relationships between them, so as to construct a compact knowledge graph that reflects the core idea of the document.
2. **Improve the ability to understand and summarize long documents**: Use the extracted key information to support a better understanding of long documents and to produce concise summaries.
3. **Meet the challenges of large-scale data processing**: On long, dense documents such as scientific papers, traditional information extraction methods may extract hundreds or thousands of entities and relationships, so deciding which of them are the most important and representative becomes a new challenge.

The paper proposes a new text-to-graph task: predicting a summary knowledge graph from a long document. To this end, the authors built a dataset of 200,000 document/graph pairs and developed strong baseline models based on graph learning and text summarization techniques. They also carried out quantitative and qualitative studies to evaluate the effectiveness of these models.

### Formula and symbol description

- \(D\) represents the input document.
- \(T_v\) represents the set of predefined entity types.
- \(T_R\) represents the set of predefined relationship types.
- \(G=(V, E)\) represents the predicted summary knowledge graph, where:
  - \(V\) is the set of entity nodes, and each \(v_i\in V\) represents an important entity with an entity type \(t_i\in T_v\).
  - \(E\) is the set of edges, and each edge \((v_i, v_j, r_{ij}^k)\in E\) represents an important relationship from \(v_i\) to \(v_j\) with a relationship type \(r_{ij}^k\in T_R\).

### Main contributions

1. **Introduce a new task**: Propose a new task of extracting a summary knowledge graph from long documents.
2. **Construct a large-scale dataset**: Develop a dataset containing 200,000 documents and their corresponding knowledge graphs.
3. **Develop baseline models**: Develop two baseline models based on text summarization and graph learning techniques, and evaluate their effectiveness.
4. **Evaluation metrics**: Design metrics for entity salience, relationship salience, and entity repetition rate to assess the quality of model output.

Through these efforts, the paper aims to encourage research on future models that better capture complex textual relationships and can be applied to a variety of downstream tasks.
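As an illustration of the notation in the symbol description above, the following is a minimal sketch of how a summary knowledge graph \(G=(V, E)\) with typed nodes and typed, directed edges might be held in memory. It is not the paper's implementation: the class names (`Entity`, `Relation`, `SummaryGraph`) and the example type labels (`"Method"`, `"Task"`, `"used-for"`) are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    """A node v_i in V with its entity type t_i (assumed to be in T_v)."""
    name: str
    type: str


@dataclass(frozen=True)
class Relation:
    """A directed edge (v_i, v_j, r) in E with relation type r (assumed in T_R)."""
    head: Entity
    tail: Entity
    type: str


@dataclass
class SummaryGraph:
    """G = (V, E), restricted to predefined entity and relation types."""
    entity_types: frozenset   # T_v
    relation_types: frozenset  # T_R
    nodes: set = field(default_factory=set)  # V
    edges: set = field(default_factory=set)  # E

    def add_entity(self, name: str, type_: str) -> Entity:
        # Reject entity types outside the predefined set T_v.
        if type_ not in self.entity_types:
            raise ValueError(f"unknown entity type: {type_}")
        entity = Entity(name, type_)
        self.nodes.add(entity)
        return entity

    def add_relation(self, head: Entity, tail: Entity, type_: str) -> Relation:
        # Reject relation types outside T_R and edges whose endpoints are not in V.
        if type_ not in self.relation_types:
            raise ValueError(f"unknown relation type: {type_}")
        if head not in self.nodes or tail not in self.nodes:
            raise ValueError("both endpoints must already be nodes of the graph")
        relation = Relation(head, tail, type_)
        self.edges.add(relation)
        return relation


# Hypothetical type sets and graph content, for illustration only.
graph = SummaryGraph(
    entity_types=frozenset({"Method", "Task"}),
    relation_types=frozenset({"used-for"}),
)
method = graph.add_entity("graph learning", "Method")
task = graph.add_entity("summary graph prediction", "Task")
graph.add_relation(method, task, "used-for")
```

Storing nodes and edges as sets of frozen dataclasses means that adding the same entity or relation twice leaves the graph unchanged, which mirrors the set-based definition of \(V\) and \(E\).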

Extracting Summary Knowledge Graphs from Long Documents

A Survey on Extractive Knowledge Graph Summarization: Applications, Approaches, Evaluation, and Future Directions

KATSum: Knowledge-aware Abstractive Text Summarization

Abstractive summarization incorporating graph knowledge

Entity Summarization in Knowledge Graphs: Algorithms, Evaluation, and Applications

Graph Embedding-Based Domain-Specific Knowledge Graph Expansion Using Research Literature Summary

Improving Long Text Understanding with Knowledge Distilled from Summarization Model

Leveraging Graph to Improve Abstractive Multi-Document Summarization

Optimizing Model Parameter for Entity Summarization Across Knowledge Graphs

An Integrated Graph Model For Document Summarization

Neural Entity Summarization with Joint Encoding and Weak Supervision

Knowledge Graph Extraction from Videos

GraphLSS: Integrating Lexical, Structural, and Semantic Features for Long Document Extractive Summarization

Grapharizer: A Graph-Based Technique for Extractive Multi-Document Summarization

iSummary: Workload-based, Personalized Summaries for Knowledge Graphs

Text summarization based on semantic graphs: an abstract meaning representation graph-to-text deep learning approach

Small-world networks for summarization of biomedical articles

Image-Collection Summarization Using Scene-Graph Generation With External Knowledge

ENT-DESC: Entity Description Generation by Exploring Knowledge Graph

GoSum: Extractive Summarization of Long Documents by Reinforcement Learning and Graph Organized discourse state

A Multi-Granularity Heterogeneous Graph for Extractive Text Summarization