A Bilingual Corpus in the Legal Domain and its Applications

O. Kwong, Benjamin Ka-Yin T'sou, Tom B. Y. Lai, R. Luk, L. Cheung, Francis C. Y. Chik
2001-01-01
Abstract: We introduce a bilingual domain-specific corpus. The parallel corpus consists of court judgments in English and Chinese, provided by the Hong Kong Judiciary. The texts were preprocessed, segmented, bilingually aligned and annotated in XML. About 100K Chinese characters and their corresponding English portions have been incorporated into a legal document retrieval system. The corpus also enables the study of many technological and linguistic issues.