Abstract: Text classification (TC) is an essential task in natural language processing (NLP). To improve TC performance, term weighting is often used to obtain an effective text representation by assigning an appropriate weight to each term. A term weighting scheme generally consists of a term frequency factor, a collection frequency factor, and a normalization factor. The normalization factor is an optional component commonly used to offset the influence of document length. In surveying existing term weighting schemes, we found that most focus on finding a more effective collection frequency factor and rarely on finding a new term frequency factor. In this paper, we first propose a new term frequency factor called modified term frequency (MTF). Unlike the normalization factor, MTF directly modifies the raw term frequency based on the length information of all training documents. We then propose a new term weighting scheme that combines MTF with an existing collection frequency factor called the modified distinguishing feature selector (MDFS). We denote this scheme MTF-MDFS (MDFS-based MTF). Extensive experimental results on 19 benchmark text datasets and 6 real-world text datasets show that the proposed MTF and MTF-MDFS both substantially outperform their state-of-the-art competitors in terms of classification accuracy and weighted-average F1 with widely used base classifiers such as MNB, SVM, and LR.
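As a rough illustration of the three-factor structure described in the abstract, the sketch below composes a raw term frequency factor, a collection frequency factor, and a normalization factor. It is not the MTF or MDFS formula, neither of which is given in the abstract: classic inverse document frequency (IDF) stands in as the collection frequency factor, cosine (L2) normalization stands in as the normalization factor, and the function name `weight_documents` and the toy documents are hypothetical.

```python
import math
from collections import Counter


def weight_documents(docs):
    """Weight each term as tf * idf, then L2-normalize each document vector.

    Assumption: IDF and cosine normalization are stand-ins chosen for
    illustration; they are NOT the MTF or MDFS factors named in the abstract.
    """
    n_docs = len(docs)

    # Collection frequency factor: smoothed IDF computed over all documents.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log(n_docs / df) + 1.0 for term, df in doc_freq.items()}

    weighted = []
    for doc in docs:
        # Term frequency factor: raw counts of each term in the document.
        tf = Counter(doc)
        vec = {term: count * idf[term] for term, count in tf.items()}

        # Normalization factor: divide by the vector's L2 norm to offset
        # the influence of document length.
        norm = math.sqrt(sum(w * w for w in vec.values()))
        weighted.append({t: w / norm for t, w in vec.items()} if norm else vec)
    return weighted


# Hypothetical tokenized documents, used only to exercise the function.
docs = [["cheap", "pills", "cheap"], ["meeting", "agenda"], ["cheap", "meeting"]]
vectors = weight_documents(docs)
```

In this sketch, document length enters only through the final normalization step. The abstract describes MTF as working differently: it adjusts the raw term frequency itself, using the length information of all training documents.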

Using modified term frequency to improve term weighting for text classification
