Tutorial at LREC 2020
Graph-Based Meaning Representations: Design and Processing
Alexander Koller, Stephan Oepen, Weiwei Sun
2019-01-01
Abstract: This tutorial is on representing and processing sentence meaning in the form of labeled directed graphs. The tutorial will (a) briefly review relevant background in formal and linguistic semantics; (b) semi-formally define a unified abstract view on different flavors of semantic graphs and associated terminology; (c) survey common frameworks for graph-based meaning representation and available graph banks; and (d) offer a technical overview of a representative selection of different parsing approaches.
1 Tutorial Content and Relevance
All things semantic have been receiving heightened attention in recent years. Despite remarkable advances in vector-based (continuous, dense, and distributed) encodings of meaning, ‘classic’ (hierarchically structured and discrete) semantic representations continue to play an important role in ‘making sense’ of natural language. While parsing has long been dominated by tree-structured target representations, there is now growing interest in general graphs as more expressive and arguably more adequate target structures for sentence-level grammatical analysis beyond surface syntax, and in particular for the representation of semantic structure.
Today, the landscape of meaning representation approaches, annotated graph banks, and techniques for parsing into these structures is complex and diverse. Graph-based semantic parsing has been a task in almost every Semantic Evaluation (SemEval) exercise since 2014. These shared tasks were based on a variety of different corpora with graph-based meaning annotations (graph banks), which differ both in their formal properties and in the facets of meaning they aim to represent. The goal of this tutorial is to clarify this landscape for our research community by providing a unifying view on these graph banks and their associated parsing problems, while working out similarities and differences between common frameworks and techniques.
Based on common-sense linguistic and formal dimensions established in its first part, the tutorial will provide a coherent, systematized overview of this field. Participants will learn to identify genuine content differences between frameworks and to tease apart more superficial variation, for example in terminology or packaging. Furthermore, major current processing techniques for semantic graphs will be reviewed against a high-level inventory of families of approaches. This part of the tutorial will emphasize reflections on codependencies with specific graph flavors or frameworks, on worst-case and typical time and space complexity, and on what guarantees (if any) are obtained on the well-formedness and correctness of output structures.
Kate and Wong (2010) suggest a definition of semantic parsing as “the task of mapping natural language sentences into complete formal meaning representations which a computer can execute for some domain-specific application.” This view brings along a tacit expectation to map (more or less) directly from a linguistic surface form to an actionable encoding of its intended meaning, e.g. in a database query or even a programming language. In this tutorial, we embrace a broader perspective on semantic parsing, as it has come to be commonly viewed in recent years. We will review graph-based meaning representations that aim to be application- and domain-independent, i.e. that seek to provide a reusable intermediate layer of interpretation that captures, in suitably abstract form, relevant constraints that the linguistic signal imposes on interpretation.
Tutorial slides and additional materials are available at the following address: https://github.com/cfmrp/tutorial
2 Semantic Graph Banks
In the first part of the tutorial, we will give a systematic overview of the available semantic graph banks. On the one hand, we will distinguish graph banks with respect to the facets of natural language meaning they aim to represent.
For instance, some graph banks focus on predicate–argument structure, perhaps with some extensions for polarity or tense, whereas others capture (some) scopal phenomena. Furthermore, while the graphs in most graph banks do not have a precisely defined model theory in the sense of classical linguistic semantics, there are still underlying intuitions about what the nodes of the graphs mean (individual entities and eventualities in the world vs. more abstract objects to which statements about scope and presupposition can attach). We will discuss the different intuitions that underlie different graph banks.
On the other hand, we will follow Kuhlmann and Oepen (2016) in classifying graph banks with respect to the relationship they assume between the tokens of the sentence and the nodes of the graph (called anchoring of graph fragments onto input sub-strings). We will distinguish three flavors of semantic graphs, which by degree of anchoring we will call type (0) to type (2). While we use ‘flavor’ to refer to formally defined sub-classes of semantic graphs, we will reserve the term ‘framework’ for a specific linguistic approach to graph-based meaning representation (typically cast in a particular graph flavor, of course).
Type (0)
The strongest form of anchoring is obtained in bi-lexical dependency graphs, where graph nodes injectively correspond to surface lexical units (tokens). In such graphs, each node is directly linked to a specific token (conversely, there may be semantically empty tokens), and the nodes inherit the linear order of their corresponding tokens. This flavor of semantic graphs was popularized in part through a series of Semantic Dependency Parsing (SDP) tasks at the SemEval exercises in 2014–16 (Oepen et al., 2014, 2015; Che et al., 2016).
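To make the type (0) flavor concrete, here is a minimal Python sketch of a bi-lexical graph in which a node simply is a token index, so anchoring is injective by construction and nodes inherit token order for free. The class name `BilexicalGraph`, its fields, and the toy analysis of "Kim wants to sleep ." are hypothetical illustrations, not the file format or annotation conventions of DM, PSD, or any other graph bank.

```python
from dataclasses import dataclass


@dataclass
class BilexicalGraph:
    """A type (0) semantic graph: every node is identified with a token index."""

    tokens: list[str]
    nodes: set[int]                    # token indices that carry a node
    edges: set[tuple[int, str, int]]   # (head index, label, dependent index)

    def __post_init__(self) -> None:
        # Because a node *is* a token index, no two nodes can share a token,
        # i.e. the node-to-token anchoring is injective by construction.
        for i in self.nodes:
            if not 0 <= i < len(self.tokens):
                raise ValueError(f"node {i} does not correspond to a token")
        for head, _, dep in self.edges:
            if head not in self.nodes or dep not in self.nodes:
                raise ValueError(f"edge {head}->{dep} touches a non-node token")

    def ordered_nodes(self) -> list[int]:
        # Nodes inherit the linear order of their tokens.
        return sorted(self.nodes)

    def empty_tokens(self) -> list[str]:
        # Tokens without a node are semantically empty.
        return [t for i, t in enumerate(self.tokens) if i not in self.nodes]

    def in_degree(self, node: int) -> int:
        return sum(1 for _, _, dep in self.edges if dep == node)


# Hypothetical analysis: "Kim" is an argument of both "wants" and "sleep",
# so node 0 is reentrant and the structure is a graph rather than a tree.
g = BilexicalGraph(
    tokens=["Kim", "wants", "to", "sleep", "."],
    nodes={0, 1, 3},
    edges={(1, "ARG1", 0), (1, "ARG2", 3), (3, "ARG1", 0)},
)
print(g.ordered_nodes())  # [0, 1, 3]
print(g.empty_tokens())   # ['to', '.']
print(g.in_degree(0))     # 2
```

The reentrant node 0 illustrates why these structures are general graphs rather than trees, even under the strictest form of anchoring.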
Prominent linguistic frameworks instantiating the type (0) flavor include CCG word–word dependencies (CCD; Hockenmaier and Steedman, 2007), Enju Predicate–Argument Structures (PAS; Miyao and Tsujii, 2008), DELPH-IN MRS Bi-Lexical Dependencies (DM; Ivanova et al., 2012), and Prague Semantic Dependencies (PSD; a simplification of the tectogrammatical structures of Hajič et al., 2012).
Type (1)
A more general form of anchored semantic graphs is characterized by relaxing the correspondence relation between nodes and tokens, while still explicitly annotating the correspondence between nodes and parts of the sentence. Some graph banks of this flavor align nodes with arbitrary parts of the sentence, including sub-token or multi-token sequences, which affords more flexibility in representing meaning contributed by, for example, (derivational) affixes or phrasal constructions. Some further allow multiple nodes to correspond to overlapping spans, enabling lexical decomposition (e.g. of causatives or comparatives). Frameworks instantiating this flavor of semantic graphs include Universal Conceptual Cognitive Annotation (UCCA; Abend and Rappoport, 2013; featured in a SemEval 2019 task) and two variants of ‘reducing’ the underspecified logical forms of Flickinger (2000) and Copestake et al. (2005) into directed graphs, viz. Elementary Dependency Structures (EDS; Oepen and Lønning, 2006) and Dependency Minimal Recursion Semantics (DMRS; Copestake, 2009). All three frameworks serve as target representations in recent parsing research (e.g. Buys and Blunsom, 2017; Chen et al., 2018; Hershcovich et al., 2018).
Type (2)
Finally, our framework review will include Abstract Meaning Representation (AMR; Banarescu et al., 2013), which in our hierarchy of graph flavors is considered unanchored, in that the correspondence between nodes and tokens is not explicitly annotated. The AMR framework deliberately backgrounds notions of compositionality and derivation.
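Since the three flavors differ only in how nodes are anchored, the typology can be illustrated with a small classifier. The following Python sketch rests on simplifying assumptions that are not part of the text: anchors are hypothetical character spans `(start, end)`, a node whose anchor is `None` is unanchored, and any anchored graph that is not a clean one-node-per-whole-token mapping is counted as type (1). Actual graph banks encode anchoring in their own formats, which this sketch does not model.

```python
from typing import Optional

Span = tuple[int, int]  # hypothetical character offsets [start, end)


def anchoring_flavor(anchors: dict[str, Optional[list[Span]]],
                     token_spans: list[Span]) -> int:
    """Return 0, 1 or 2 following the anchoring typology (simplified)."""
    # Type (2): no node-token correspondence is annotated at all.
    if all(spans is None for spans in anchors.values()):
        return 2
    tokens = set(token_spans)
    used: set[Span] = set()
    for spans in anchors.values():
        # Type (0) needs every node anchored to exactly one whole token,
        # with no token shared between two nodes (injectivity).
        if spans is None or len(spans) != 1:
            return 1
        span = spans[0]
        if span not in tokens or span in used:
            return 1
        used.add(span)
    return 0


tokens = [(0, 3), (4, 10)]  # "Kim sleeps"
print(anchoring_flavor({"n1": [(0, 3)], "n2": [(4, 10)]}, tokens))  # 0
print(anchoring_flavor({"n1": [(0, 3)], "n2": [(4, 9)],
                        "n3": [(9, 10)]}, tokens))                  # 1 (sub-token)
print(anchoring_flavor({"n1": None, "n2": None}, tokens))           # 2
```

The second call mimics a type (1) analysis that gives the stem "sleep" and the suffix "s" separate nodes, the kind of sub-token alignment that type (0) graphs cannot express.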
While backgrounding compositionality and derivation, AMR at the same time frequently invokes lexical decomposition and represents some implicitly expressed elements of meaning, such that AMR graphs quite generally appear to ‘abstract’ furthest from the surface signal. Since the first general release of an AMR graph bank in 2014, the framework has provided a popular target for semantic parsing and has been the subject of two consecutive tasks at SemEval 2016 and 2017 (May, 2016; May and Priyadarshi, 2017).
3 Processing Semantic Graphs
The creation of large-scale, high-quality semantic graph banks has driven research on semantic parsing, where a system is trained to map from natural-language sentences to graphs. There is now a dizzying array of different semantic parsing algorithms, and it is a challenge to keep track of their respective strengths and weaknesses. Different parsing approaches are, of course, more or less effective for graph banks of different flavors (and, at times, even specific frameworks). We will discuss these interactions in the tutorial and categorize existing approaches into four classes.
Factorization-based approach
A factorization-based parser explicitly models the target semantic structures by defining a score function that is able to evaluate the “goodness” of any candidate graph. To make the score function computable, a parser usually factorizes the score of a graph into parts for smaller substrings and can then apply dynamic programming to search for the best graph.
Composition-based approach
Following the Principle of Compositionality, a semantic graph can be viewed as the result of a derivation process, in which a set of lexical and syntactico-semantic rules are iteratively applied and evaluated. A composition-based parser explicitly models such derivation structures by defining a symbolic system to manipulate graph construction and a score function to select preferable derivations.
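The factorization-based and composition-based families can be contrasted in a few lines of Python. This sketch is illustrative only and rests on assumptions not in the text. On the factorization side, parts are single labeled edges and the scores are made-up numbers standing in for a trained model; with such first-order parts and no well-formedness constraints the argmax needs no dynamic programming, which only becomes necessary once parts span several edges or structural constraints are imposed. On the composition side, `Fragment` and `apply` are a hypothetical toy graph algebra with a single operation that fills an open argument slot, not any specific published formalism.

```python
from dataclasses import dataclass, field
from typing import Optional

# ---- Factorization-based: a first-order (single-edge) score function ----
Edge = tuple[int, str, int]  # (head token, label, dependent token)


def graph_score(graph: set[Edge], edge_scores: dict[Edge, float]) -> float:
    """score(G) is the sum of its part scores; unscored edges count as 0."""
    return sum(edge_scores.get(e, 0.0) for e in graph)


def best_graph(edge_scores: dict[Edge, float]) -> set[Edge]:
    """Exact argmax for single-edge parts without structural constraints:
    each positively scored edge raises the total, so keep exactly those."""
    return {e for e, s in edge_scores.items() if s > 0}


# ---- Composition-based: a toy graph algebra with one operation ----
@dataclass
class Fragment:
    nodes: dict[str, Optional[str]]    # node id -> concept (None = open slot)
    edges: set[tuple[str, str, str]]   # (source id, label, target id)
    root: str
    slots: dict[str, str] = field(default_factory=dict)  # slot name -> node id


def apply(head: Fragment, slot: str, arg: Fragment) -> Fragment:
    """Fill `slot` of `head` by identifying its node with the root of `arg`.
    Assumes the two fragments use disjoint node ids and slot names."""
    target = head.slots[slot]

    def rename(n: str) -> str:
        return target if n == arg.root else n

    nodes = dict(head.nodes)
    for n, concept in arg.nodes.items():
        nodes[rename(n)] = concept
    edges = set(head.edges) | {(rename(s), lab, rename(t)) for s, lab, t in arg.edges}
    slots = {k: v for k, v in head.slots.items() if k != slot}
    slots.update(arg.slots)  # the argument's own open slots stay open
    return Fragment(nodes, edges, head.root, slots)


scores = {(1, "ARG1", 0): 2.0, (1, "ARG2", 3): 0.5,
          (3, "ARG1", 0): 1.0, (0, "ARG1", 3): -0.75}
best = best_graph(scores)
print(sorted(best))               # [(1, 'ARG1', 0), (1, 'ARG2', 3), (3, 'ARG1', 0)]
print(graph_score(best, scores))  # 3.5

sleep = Fragment({"s": "sleep-01", "x": None}, {("s", "ARG0", "x")}, "s", {"ARG0": "x"})
cat = Fragment({"c": "cat"}, set(), "c")
g = apply(sleep, "ARG0", cat)
print(g.nodes)  # {'s': 'sleep-01', 'x': 'cat'}
```

The composition-based half covers only the symbolic system; a full parser would add a score function that ranks alternative derivations, for example different choices of which fragment fills which slot.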
Transition-based approach
A transition-based parser models a derivation process that proceeds left to right, word by word. The key to building a high-accuracy parser is to define a score function that evaluates the individual derivation decisions for each token. In order to find a good derivation among a large set, a parser usually adopts a greedy search strategy, which is sometimes psycholinguistically motivated.
Translation-based approa