An Efficient Compression Method for ANSI Coded Chinese Text

常为领,方滨兴,云晓春,王树鹏,余翔湛
DOI: https://doi.org/10.3969/j.issn.1003-0077.2010.05.016
2010-01-01
Abstract: After surveying existing proposals for compressing Chinese text, we present CRecode, a universal compression algorithm for Chinese text that exploits the properties of ANSI-coded Chinese text. CRecode highlights the importance of pre-processing for Chinese: it collects the Chinese characters, sorts them by frequency, and then recodes them into 8-bit, 16-bit, or 24-bit codes. CRecode can act as a pre-processing tool for ANSI-coded Chinese text ahead of all the popular compression utilities, improving their compression ratios by 4% to 30%.
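The recoding step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the actual code layout, so the byte ranges below are assumptions chosen only to make the codes prefix-free: ASCII bytes pass through unchanged, the 112 most frequent characters get one byte (0x80 to 0xEF), the next 3840 get two bytes (lead byte 0xF0 to 0xFE), and the remainder get three bytes (lead byte 0xFF). The names `build_table`, `code_for`, `recode`, and `restore` are hypothetical and are not taken from the paper.

```python
from collections import Counter

# Assumed code layout (not from the paper), chosen to be prefix-free:
ONE_BYTE = 0xF0 - 0x80            # 112 one-byte codes: 0x80..0xEF
TWO_BYTE = (0xFF - 0xF0) * 256    # 3840 two-byte codes: lead byte 0xF0..0xFE


def build_table(text):
    """Return the non-ASCII characters of text, most frequent first."""
    counts = Counter(ch for ch in text if ord(ch) > 0x7F)
    return [ch for ch, _ in counts.most_common()]


def code_for(rank):
    """Map a frequency rank to an 8-bit, 16-bit, or 24-bit code."""
    if rank < ONE_BYTE:
        return bytes([0x80 + rank])
    rank -= ONE_BYTE
    if rank < TWO_BYTE:
        return bytes([0xF0 + rank // 256, rank % 256])
    rank -= TWO_BYTE
    return bytes([0xFF, rank // 256, rank % 256])


def recode(text, table):
    """Recode text: ASCII passes through, other characters use ranked codes."""
    index = {ch: i for i, ch in enumerate(table)}
    out = bytearray()
    for ch in text:
        if ord(ch) <= 0x7F:
            out.append(ord(ch))
        else:
            out += code_for(index[ch])
    return bytes(out)


def restore(data, table):
    """Invert recode() using the same frequency table."""
    chars = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:                      # ASCII byte
            chars.append(chr(b))
            i += 1
        elif b < 0xF0:                    # one-byte code
            chars.append(table[b - 0x80])
            i += 1
        elif b < 0xFF:                    # two-byte code
            chars.append(table[ONE_BYTE + (b - 0xF0) * 256 + data[i + 1]])
            i += 2
        else:                             # three-byte code
            rank = ONE_BYTE + TWO_BYTE + data[i + 1] * 256 + data[i + 2]
            chars.append(table[rank])
            i += 3
    return "".join(chars)
```

In this sketch the text is assumed to have already been decoded from its ANSI encoding (for example with `raw.decode("gbk")`), and the frequency table has to be stored alongside the recoded bytes so that `restore` can invert the mapping. The recoded output would then be handed to a general-purpose compressor; the gains reported in the abstract refer to the authors' own scheme, not to this illustration.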