Mining Parallel Corpus from Sina Microblog

Haitao Xing, Muyun Yang, Haoliang Qi, Sheng Li, Tiejun Zhao
DOI: https://doi.org/10.1109/IALP.2013.29
2013-01-01
Abstract: Finding a parallel corpus, a specific type of information, on microblogging sites with millions of users, such as Sina Microblog, is a challenging task. This paper investigates the feasibility of mining such data from usernames, hashtags, and user relations, using three different methods. The initial experiment yields encouraging results under the current restriction of limited access to microblog content.