World Wide Web - A Multilingual Language Resource

Fang Li, Huanye Sheng, Wilhelm Weisweber
DOI: https://doi.org/10.1007/3-540-45490-X_46
2001-01-01
Web Intelligence
Abstract: This paper argues that the World Wide Web can be regarded not only as an information resource but also as a dynamic, multilingual, minimally controlled, easily accessible, and untagged language corpus. To support this view, we implemented a method that extracts bilingual lexicons from parallel WWW pages by two-stage alignment. We selected language pairs drawn from German, English, and Chinese, but the implementation is independent of any particular natural language, domain, or markup.
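The abstract does not say what the two alignment stages are. The Python sketch below is only an illustration under our own assumptions and is not the authors' algorithm. Stage 1 aligns the text segments of two parallel pages with a simplified length-based dynamic program in the spirit of Gale and Church. Each segment length is divided by its page's total length, so scripts of different density stay roughly comparable. Stage 2 scores candidate word pairs by the Dice coefficient of their co-occurrence across aligned segments. The names `align_segments` and `dice_lexicon`, the skip penalty, and the thresholds are hypothetical. The whitespace tokenization used here would also need a real word segmenter for Chinese.

```python
from collections import Counter


def align_segments(src, tgt, skip_cost=0.2):
    """Stage 1 (assumed, not from the paper): monotone segment alignment.

    Lengths are normalized by page total so pages in scripts of different
    density remain comparable. Moves: 1-1 match, or skip one segment on
    either side at a fixed (hypothetical) penalty.
    """
    total_s = sum(len(s) for s in src) or 1
    total_t = sum(len(t) for t in tgt) or 1
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    # Row-major order visits every predecessor of a cell before the cell.
    for i in range(n + 1):
        for j in range(m + 1):
            c = cost[i][j]
            if c == inf:
                continue
            if i < n and j < m:  # 1-1 match, cost = difference in length share
                d = abs(len(src[i]) / total_s - len(tgt[j]) / total_t)
                if c + d < cost[i + 1][j + 1]:
                    cost[i + 1][j + 1] = c + d
                    back[i + 1][j + 1] = (i, j, True)
            if i < n and c + skip_cost < cost[i + 1][j]:  # skip a source segment
                cost[i + 1][j] = c + skip_cost
                back[i + 1][j] = (i, j, False)
            if j < m and c + skip_cost < cost[i][j + 1]:  # skip a target segment
                cost[i][j + 1] = c + skip_cost
                back[i][j + 1] = (i, j, False)
    # Trace the cheapest path back from (n, m), keeping only 1-1 matches.
    pairs, i, j = [], n, m
    while (i, j) != (0, 0):
        pi, pj, matched = back[i][j]
        if matched:
            pairs.append((src[pi], tgt[pj]))
        i, j = pi, pj
    pairs.reverse()
    return pairs


def dice_lexicon(pairs, min_score=0.5, min_count=2):
    """Stage 2 (assumed, not from the paper): Dice-scored word pairs.

    Dice(a, b) = 2 * joint(a, b) / (freq(a) + freq(b)), where counts are
    numbers of aligned segment pairs containing the word(s).
    """
    src_freq, tgt_freq, joint = Counter(), Counter(), Counter()
    for s, t in pairs:
        sw, tw = set(s.split()), set(t.split())  # whitespace tokens only
        src_freq.update(sw)
        tgt_freq.update(tw)
        joint.update((a, b) for a in sw for b in tw)
    lexicon = []
    for (a, b), c in joint.items():
        if c < min_count:
            continue
        score = 2 * c / (src_freq[a] + tgt_freq[b])
        if score >= min_score:
            lexicon.append((a, b, score))
    return sorted(lexicon, key=lambda x: -x[2])


de = ["das Haus ist rot", "das Auto ist blau", "das Haus ist alt"]
en = ["the house is red", "the car is blue", "the house is old"]
aligned = align_segments(de, en)
lexicon = dice_lexicon(aligned, min_score=0.9)
```

On a sample this small, `("Haus", "house")` scores 1.0, but function words and even cross pairs such as `("das", "is")` co-occur just as perfectly and also pass the threshold. Any co-occurrence scoring of this kind needs far more parallel text before its output resembles a usable lexicon.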