Master's Thesis

Edward Kenschaft
Abstract: My research at the University of Maryland involved three distinct but related areas within the field of computational linguistics:

I. Unsupervised morphological analysis (UMA) – Breaking words into component morphemes for nearly any written language, using only raw text for analysis.
II. Word sense disambiguation (WSD) – Distinguishing between a word's possible meanings.
   A. Crosslingual word sense disambiguation (CWSD) – WSD where a word's meanings are identified by its translations into another language.
III. Machine translation (MT) – Translating text automatically from one language to another, particularly making use of UMA and CWSD.

We evaluated the following experimental hypotheses:

I. Unsupervised morphological analysis (UMA)
   A. We can design a UMA system that applies to any morphologically complex language, and has accuracy comparable (not necessarily superior) to language-specific systems. (Task I)
II. Word sense disambiguation (WSD)
   B. WSD accuracy can be improved by providing unannotated data in addition to annotated training data. (Task II.A)
   C. WSD accuracy can be improved by using document-level features in addition to sentence-level features. (Task II.B)
   D. WSD accuracy can be improved by performing UMA on the text. (Task II.C)
III. Machine translation (MT)
   E. MT accuracy can be improved by performing UMA on the source text. (Task III.A)
   F. MT accuracy can be improved by performing CWSD on the source text. (Task III.B)

This paper summarizes all experimental results, with conclusions supporting hypotheses A, B, C and D. Please note the partial glossary at the end of the paper.

Disclaimers: This paper is not intended for publication, but is rather a summary of my research efforts at UMD, from fall 2005 through 2006. It incorporates portions of several project reports, minimally edited, as well as descriptions of incomplete experiments. Summaries of related research efforts have generally not been updated since 2006.

UNSUPERVISED MORPHOLOGICAL ANALYSIS AND CROSSLINGUAL WORD SENSE DISAMBIGUATION USING LARGE QUANTITIES OF UNANNOTATED DATA
Edward Kenschaft
Master's thesis based on pre-candidacy doctoral research
University of Maryland
Copyright © 2008 Edward Kenschaft.