Abstract: This paper is concerned with keyword extraction. By keyword extraction, we mean extracting a subset of words or phrases from a document that can describe the ‘meaning’ of the document. Keywords benefit many text mining applications. However, a large number of documents do not have keywords, so keywords must be assigned before such applications can benefit from them. Several research efforts have addressed keyword extraction, but these methods make use only of ‘global context information’, which limits their extraction performance. A thorough and systematic investigation of the issue is thus needed. In this paper, we propose to make use of not only ‘global context information’ but also ‘local context information’ for extracting keywords from documents. As far as we know, utilizing both ‘global context information’ and ‘local context information’ in keyword extraction has not been sufficiently investigated previously. We also propose methods for performing the task on the basis of Support Vector Machines (SVMs) and define the features used in the model. Experimental results indicate that the proposed SVM-based method can significantly outperform the baseline methods for keyword extraction. We have also applied the proposed method to document classification, a typical text mining task. Experimental results show that using the extracted keywords can significantly improve the accuracy of document classification.
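The abstract does not list the actual features or training setup. The following is therefore only a minimal sketch under stated assumptions. It treats keyword extraction as binary classification of candidate words, then ranks candidates by the SVM decision value. The "global" features (term frequency, IDF, TF-IDF, first-occurrence position) are assumed. The "local" features (title membership, and how often a word's immediate neighbours are content words) are also assumed. All helper names (`global_features`, `local_features`, `featurize`, `candidates`, `train_linear_svm`, `extract_keywords`), the `STOPWORDS` set and the toy corpus are hypothetical. To keep the example self-contained, the linear SVM is trained with a small Pegasos-style subgradient solver written in plain Python rather than with a library call.

```python
import math
import random

# Hypothetical stopword list; a real system would use a much fuller one.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on", "with", "we"}


def global_features(word, body, corpus):
    """Whole-document ('global') statistics for a candidate word (assumed set)."""
    tf = body.count(word) / len(body)
    df = sum(1 for _, other_body in corpus if word in other_body)
    idf = math.log((1 + len(corpus)) / (1 + df))
    first_pos = body.index(word) / len(body)  # word must occur in body
    return [tf, idf, tf * idf, 1.0 - first_pos]  # earlier first use -> larger value


def local_features(word, title, body, window=1):
    """Features of the word's immediate surroundings ('local' context, assumed set)."""
    in_title = 1.0 if word in title else 0.0
    positions = [i for i, tok in enumerate(body) if tok == word]
    hits = 0
    for i in positions:
        for j in range(i - window, i + window + 1):
            if j != i and 0 <= j < len(body) and body[j] not in STOPWORDS:
                hits += 1  # neighbour is a content word (phrase-like usage)
    return [in_title, hits / (2 * window * len(positions))]


def featurize(word, doc, corpus):
    title, body = doc
    # Concatenate both groups; the trailing constant 1.0 plays the role of a bias.
    return global_features(word, body, corpus) + local_features(word, title, body) + [1.0]


def candidates(doc):
    """Non-stopword body tokens, deduplicated in order of first appearance."""
    return list(dict.fromkeys(t for t in doc[1] if t not in STOPWORDS))


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos-style stochastic subgradient descent on the primal linear SVM."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            violated = y[i] * dot(w, X[i]) < 1  # hinge loss active for this sample?
            w = [(1 - eta * lam) * wj for wj in w]  # L2-regularisation shrink
            if violated:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def extract_keywords(doc, corpus, w, k=3):
    """Score each candidate with the SVM decision value and keep the top k."""
    scored = [(dot(w, featurize(c, doc, corpus)), c) for c in candidates(doc)]
    scored.sort(key=lambda p: p[0], reverse=True)
    return [c for _, c in scored[:k]]


# Toy usage: documents are (title tokens, body tokens); gold keywords are made up.
corpus = [
    (["svm", "keyword", "extraction"],
     "we use svm for keyword extraction and keyword ranking".split()),
    (["document", "classification"],
     "document classification with keyword features is studied".split()),
]
gold = [{"svm", "keyword", "extraction"}, {"document", "classification"}]
X, y = [], []
for doc, keys in zip(corpus, gold):
    for c in candidates(doc):
        X.append(featurize(c, doc, corpus))
        y.append(1 if c in keys else -1)
w = train_linear_svm(X, y)
print(extract_keywords(corpus[1], corpus, w, k=2))
```

This design ranks candidates by decision value instead of applying a hard threshold, so a fixed number of keywords can always be returned per document. Where a machine-learning library is available, the hand-written trainer could be replaced by a standard linear SVM implementation without changing the feature functions.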
