THUUyMorph: An Uyghur Morpheme Segmentation Corpus
Halidanmu Abudukelimu, SUN Maosong, LIU Yang, Abudukelimu Abulizi
DOI: https://doi.org/10.3969/j.issn.1003-0077.2018.02.011
2018-01-01
Abstract: THUUyMorph (Tsinghua University Uyghur Morphology Segmentation Corpus) is an Uyghur corpus with morpheme segmentation annotations. The original text was downloaded from the Tianshan website in 2016 and covers news, law, life, and other domains. The corpus was built in the following steps: proofreading of the original text, clause segmentation and proofreading, manual and automatic annotation of morpheme segmentation, manual annotation of phonetic harmony phenomena, and manual correction of the morpheme segmentation and phonetic harmony annotations. The corpus contains 10,596 documents, 69,200 sentences, and 89,923 word types, annotated at both the word level and the sentence level. The corpus is available at .