Performing Text Categorization on Manifold

Guihua Wen
2006-01-01
Abstract: Text categorization has become a key technology in organizing and processing large amounts of text information. It normally involves an extremely high-dimensional space, which makes most existing approaches generate highly biased estimates and thereby reduces classification accuracy. These approaches do not consider that text documents may be intrinsically located on a low-dimensional manifold. This paper presents an approach that performs text categorization on the text manifold with respect to the intrinsic global manifold structure, for example by using geodesic distance to measure the distance between two texts. This approach has been applied to improve KNN for text categorization, and the improvement is empirically validated by experiments.
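The idea of replacing the usual distance in KNN with a geodesic distance can be illustrated with a minimal sketch. The abstract does not say how the geodesic distance is computed, so the sketch assumes a common approximation: link each document vector to its nearest neighbors in a graph, then take shortest-path lengths through that graph. The function names, the Euclidean edge weights, and the parameters `n_graph_neighbors` and `k_vote` are all hypothetical choices made for illustration, not the paper's method.

```python
import heapq
import math
from collections import Counter


def euclidean(a, b):
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_neighbor_graph(points, n_neighbors):
    """Symmetric graph linking each point to its n_neighbors nearest points."""
    graph = {i: {} for i in range(len(points))}
    for i, p in enumerate(points):
        dists = sorted(
            (euclidean(p, q), j) for j, q in enumerate(points) if j != i
        )
        for d, j in dists[:n_neighbors]:
            graph[i][j] = d
            graph[j][i] = d
    return graph


def geodesic_distances(graph, source):
    """Dijkstra shortest-path lengths from source; unreachable nodes are omitted."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def geodesic_knn_predict(train_vectors, train_labels, query,
                         n_graph_neighbors=2, k_vote=3):
    """Majority vote among the k_vote training points closest to the query
    in graph (approximate geodesic) distance. Assumption: the query is added
    to the neighborhood graph as an extra node."""
    points = list(train_vectors) + [query]
    query_index = len(points) - 1
    graph = build_neighbor_graph(points, n_graph_neighbors)
    dist = geodesic_distances(graph, query_index)
    ranked = sorted((d, i) for i, d in dist.items() if i != query_index)
    votes = Counter(train_labels[i] for _, i in ranked[:k_vote])
    return votes.most_common(1)[0][0]


# Toy 2-D vectors standing in for document feature vectors.
train_vectors = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)]
train_labels = ["sports", "sports", "sports", "finance", "finance", "finance"]
prediction = geodesic_knn_predict(train_vectors, train_labels, (1.5, 0.2))
```

In this sketch, training points that cannot be reached from the query through the graph are simply excluded from the vote. A real system would need a policy for disconnected graphs and a faster neighbor search than the quadratic loop used here.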