Session 4: Machine Translation
E. Hovy
Abstract: About 2½ years ago, ARPA initiated a new program in Machine Translation (MT). Three projects were funded: one of the papers in this section is directly related to these systems. In one way or another, each paper addresses one of the following two major dimensions of variation: basic approach (i.e., operation and data collection by statistical vs. symbolic means) and depth of analysis (i.e., direct replacement, transfer, or interlingual). This overview first explains these terms and then describes the import of the papers.

Basic Approach

Over the past six years, the CANDIDE group at IBM has gained some impressive results, and considerable notoriety, by performing MT employing only statistical, non-linguistic methods. Using cross-language correspondences collected statistically from 3 million sentences of the Canadian Parliamentary records, which are bilingual French and English, CANDIDE operates by replacing portions of each French input sentence with the statistically most appropriate English equivalent, taking the whole sentence into account, and then "smoothing" the resulting words and phrases into the most probable grammatical English sentence.

In contrast, the PANGLOSS system takes a more traditional symbolic approach, involving linguistic and semantic knowledge resources such as grammars of Spanish and English, a library of "semantic" symbols that can be composed to represent the "meaning" of each sentence, and a variety of process modules, such as sentence parsers, analyzers, and generators, that employ these resources to convert information from one form (say, a Spanish sentence) to another (say, a syntactic parse tree of that sentence). The LINGSTAT system, as its name suggests, is a hybrid, involving linguistic-symbolic information for some subtasks and statistical information for others.
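The statistical approach described above can be illustrated with a toy sketch. It is not the IBM group's implementation: it assumes a strictly word-for-word, same-order translation, and the two probability tables (`TRANSLATION_PROB`, `BIGRAM_PROB`), the `UNSEEN` floor, and the function names are hypothetical, with made-up numbers. It shows only the general idea of scoring whole candidate sentences by combining word-correspondence probabilities with the probability of the resulting English word sequence.

```python
from itertools import product

# Hypothetical toy tables; every number here is invented for illustration.
# P(english word | french word), as if estimated from a bilingual corpus.
TRANSLATION_PROB = {
    "le": {"the": 0.9, "him": 0.1},
    "chat": {"cat": 1.0},
    "dort": {"sleep": 0.6, "sleeps": 0.4},
}
# P(word | previous word), as if estimated from English text alone.
BIGRAM_PROB = {
    ("<s>", "the"): 0.5,
    ("<s>", "him"): 0.05,
    ("the", "cat"): 0.3,
    ("him", "cat"): 0.01,
    ("cat", "sleeps"): 0.4,
    ("cat", "sleep"): 0.05,
}
UNSEEN = 1e-4  # assumed floor for bigrams missing from the toy table


def score(french_words, english_words):
    """Probability of one candidate: word correspondences times English fluency."""
    p = 1.0
    prev = "<s>"  # sentence-start marker
    for f, e in zip(french_words, english_words):
        p *= TRANSLATION_PROB[f][e] * BIGRAM_PROB.get((prev, e), UNSEEN)
        prev = e
    return p


def translate(french_sentence):
    """Return the highest-scoring candidate over the whole sentence.

    Exhaustive search over all word choices; words absent from the toy
    table raise KeyError.
    """
    french_words = french_sentence.lower().split()
    candidates = product(*(TRANSLATION_PROB[f] for f in french_words))
    best = max(candidates, key=lambda c: score(french_words, c))
    return " ".join(best)


print(translate("Le chat dort"))
```

In this toy table the single most likely equivalent of "dort" taken in isolation is "sleep", yet the whole-sentence score prefers "sleeps" because the English sequence "cat sleeps" is far more probable. A real system would also have to handle word reordering, phrases, and a search space too large to enumerate.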
Depth of Analysis

The basic theoretical underpinnings of MT involve the amount of analysis performed on the input (source language) sentence during the process of converting it to the output (target language) sentence (since almost all MT systems work on a sentence-by-sentence basis, multisentence complexities are ignored here). In the simplest possible translation method, a system simply pattern-matches (portions of) each input sentence against a bilingual replacement dictionary and replaces each portion with its target language equivalent. The result is usually massaged in various ways in order to achieve some degree of grammaticality. A major problem with this approach is the immensity of the replacement dictionary required: since no generalizations are represented, the dictionary needs distinct entries for each form of each word (see, sees, saw, …