Document Image Classification Without Optical Character Recognition

W. L. Xu, Qian Wang, Jun Guo
2006-01-01
The Journal of China Universities of Posts and Telecommunications
Abstract: Many documents still have to be classified as document images, both because Optical Character Recognition technology is immature and because their owners are unwilling to have them copied; as a result, document image retrieval has become an important field in information retrieval. This paper presents a method for Chinese document image classification in which directional element codes and n-gram features replace Optical Character Recognition. Experimental results show that 95% of the processing time can be saved.
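The abstract does not say how the directional element codes are computed or which classifier is used, so the following is only a minimal sketch of the general idea of n-gram features over per-character codes. It assumes each character image has already been reduced to a small integer code. The integer codes, the toy class profiles, the cosine-similarity classifier, and the function names `ngram_features`, `cosine`, and `classify` are all hypothetical illustrations, not the authors' method.

```python
from collections import Counter
from math import sqrt


def ngram_features(codes, n=2):
    """Count n-grams over a sequence of per-character codes.

    `codes` stands in for the code assigned to each character image;
    how the paper derives those codes is not stated in the abstract.
    """
    return Counter(tuple(codes[i:i + n]) for i in range(len(codes) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(codes, class_profiles, n=2):
    """Return the label whose n-gram profile is most similar to the document."""
    feats = ngram_features(codes, n)
    return max(class_profiles, key=lambda label: cosine(feats, class_profiles[label]))


# Toy class profiles built from made-up code sequences (illustrative only).
sports = ngram_features([3, 7, 7, 1, 3, 7, 7, 1])
finance = ngram_features([5, 2, 9, 5, 2, 9, 4])
profiles = {"sports": sports, "finance": finance}

# A new "document" is classified without ever recognizing its characters.
label = classify([3, 7, 7, 1, 3, 7], profiles)
print(label)
```

The point of the sketch is that classification works directly on code sequences, so the character recognition step is skipped entirely; in practice any standard text classifier could be substituted for the cosine comparison.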