Text Categorization of Enron Email Corpus Based on Information Bottleneck and Maximal Entropy

Man Wang, Yifan He, Minghu Jiang
DOI: https://doi.org/10.1109/icosp.2010.5656737
2010-01-01
Abstract: This paper addresses text categorization of the Enron email corpus. We use the information bottleneck (IB) method to cluster keywords based on their distribution over class labels, then add threads and address groups as additional features alongside the email text, and apply the maximal entropy model to improve the accuracy of the classifier. Our experimental results show that these measures improve the classifier's performance, because keywords in emails change too rapidly while address groups are much steadier.
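The keyword-clustering step can be illustrated with a minimal sketch. The abstract does not say which IB variant the authors use, so the code below assumes the greedy agglomerative form, in which the two clusters whose merge loses the least information about the class label are merged first (the cost is a prior-weighted Jensen-Shannon divergence between their class distributions). The function names, the count-table input format, and the toy words are all hypothetical and are not taken from the paper.

```python
import math
from itertools import combinations


def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits, skipping zero entries of p."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def merge_cost(w1, p1, w2, p2):
    """Information about the class label lost by merging two clusters.

    w1, w2 are the cluster priors; p1, p2 are their class distributions p(c | cluster).
    The cost is (w1 + w2) times the weighted Jensen-Shannon divergence.
    """
    total = w1 + w2
    a, b = w1 / total, w2 / total
    mix = [a * x + b * y for x, y in zip(p1, p2)]
    return total * (a * kl(p1, mix) + b * kl(p2, mix))


def agglomerative_ib(word_class_counts, n_clusters):
    """Greedily merge words into n_clusters groups (assumed agglomerative IB variant).

    word_class_counts maps each word to a list of per-class counts (hypothetical format);
    every word must have at least one occurrence.
    """
    grand_total = sum(sum(counts) for counts in word_class_counts.values())
    # Each cluster is (member words, prior p(cluster), class distribution p(c | cluster)).
    clusters = []
    for word, counts in word_class_counts.items():
        n = sum(counts)
        clusters.append(([word], n / grand_total, [c / n for c in counts]))

    while len(clusters) > n_clusters:
        # Pick the pair whose merge loses the least information about the label.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: merge_cost(
                clusters[ij[0]][1], clusters[ij[0]][2],
                clusters[ij[1]][1], clusters[ij[1]][2],
            ),
        )
        (m1, w1, p1), (m2, w2, p2) = clusters[i], clusters[j]
        total = w1 + w2
        merged = (m1 + m2, total, [(w1 * x + w2 * y) / total for x, y in zip(p1, p2)])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]

    return [sorted(members) for members, _, _ in clusters]


# Toy counts over two hypothetical class labels (not data from the paper).
toy_counts = {
    "meeting": [9, 1],
    "schedule": [8, 2],
    "invoice": [1, 9],
    "payment": [2, 8],
}
toy_clusters = agglomerative_ib(toy_counts, 2)
```

In a pipeline like the one the abstract describes, the resulting cluster identifiers would presumably replace individual keywords as features and be combined with the thread and address-group features before training the maximal entropy classifier; that part is not sketched here.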