Abstract: In this paper, we compare learning techniques based on statistical classification to traditional methods of relevance feedback for the document routing problem. We consider three classification techniques which have decision rules that are derived via explicit error minimization: linear discriminant analysis, logistic regression, and neural networks. We demonstrate that the classifiers perform 10-15% better than relevance feedback via Rocchio expansion for the TREC-2 and TREC-3 routing tasks.

Of the two classical information retrieval tasks,¹ document routing is the more amenable to machine learning. A fixed, standing query and a training collection of judged documents² are provided, and the task is to assess the relevance of a fresh set of test documents. This can clearly be approached as a problem of statistical text classification: documents are to be assigned to one of two categories, relevant or non-relevant, and inference is possible from the labeled documents. In contrast, the classical ad-hoc search problem presumes that only a query and an unlabelled collection are provided.

The standard approach to document routing models document content as a bag of words, represented as a sparse, very high-dimensional vector with one component for each unique term in the vocabulary (Salton, Wong, & Yang 1975). Vector weights are proportional to term frequency and inversely proportional to collection frequency.³ The general technique is to score test documents with respect to their closeness to the query (also represented as a sparse, high-dimensional vector), where closeness is measured by the cosine between vectors.

Authors listed in alphabetic order.
¹ As defined and evaluated by the TREC conferences (Harman 1994; 1995).
² Actually, only a few documents are explicitly labeled, including most of the relevant documents and a few of the irrelevant documents. All other documents are implicitly assumed to be irrelevant.
³ The exact expression varies across systems, but is typ…
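The bag-of-words weighting and cosine scoring described above can be illustrated with a minimal Python sketch. The weighting used here, tf * log(N / df), is an assumption chosen for illustration: the text only states that weights grow with term frequency and shrink with collection frequency, and that the exact expression varies across systems. The function names `tfidf_vectors` and `cosine` are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Turn tokenized documents into sparse {term: weight} dictionaries.

    Assumed weighting (not taken from the paper): tf * log(N / df), where
    N is the number of documents and df the number containing the term.
    """
    n_docs = len(docs)
    # Document frequency: count each term once per document.
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {term: tf * math.log(n_docs / df[term]) for term, tf in Counter(doc).items()}
        for doc in docs
    ]


def cosine(u, v):
    """Cosine between two sparse vectors; 0.0 if either has zero length."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy example: the query is weighted with the same statistics as the documents.
query = ["oil", "price"]
docs = [["oil", "price", "rise"], ["football", "match", "result"]]
vectors = tfidf_vectors([query] + docs)
scores = [cosine(vectors[0], doc_vec) for doc_vec in vectors[1:]]
```

In this toy collection the first document shares both query terms and so scores above the second, which shares none. Storing only non-zero components in a dictionary mirrors the sparse, high-dimensional representation the text describes.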
A modified and expanded query is learned from the training set via Rocchio-expansion Relevance Feedback (Buckley, Salton, & Allan 1994), which essentially constructs a linear combination of the query vector, the centroid of the relevant documents and, occasionally, the centroid of selected irrelevant documents. The net result is a scored list of test documents, which may be ranked in decreasing score order for the purposes of presentation and evaluation. Evaluation typically proceeds by averaging precision at a number of recall thresholds. Rocchio-expansion Relevance Feedback employs a weak learning method. However, the application of stronger methods faces two problems: the …
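As a rough illustration of the linear combination behind Rocchio expansion, the following Python sketch builds an expanded query from the original query vector and the two centroids. The coefficient names and default values (`alpha`, `beta`, `gamma`) are assumptions for illustration only; the text does not state which weights are used, and the helper names `centroid` and `rocchio` are hypothetical.

```python
from collections import defaultdict


def centroid(vectors):
    """Component-wise mean of sparse {term: weight} vectors ({} if none given)."""
    if not vectors:
        return {}
    total = defaultdict(float)
    for vec in vectors:
        for term, weight in vec.items():
            total[term] += weight
    return {term: weight / len(vectors) for term, weight in total.items()}


def rocchio(query, relevant, nonrelevant=(), alpha=1.0, beta=0.75, gamma=0.15):
    """Expanded query: alpha*query + beta*centroid(rel) - gamma*centroid(nonrel).

    The default coefficients are illustrative assumptions, not values
    reported in the paper.
    """
    expanded = defaultdict(float)
    parts = (
        (alpha, query),
        (beta, centroid(relevant)),
        (-gamma, centroid(nonrelevant)),
    )
    for coeff, vec in parts:
        for term, weight in vec.items():
            expanded[term] += coeff * weight
    return dict(expanded)


# Toy example with hand-picked weights.
expanded_query = rocchio(
    {"oil": 1.0},
    relevant=[{"oil": 1.0, "opec": 2.0}, {"opec": 2.0}],
    nonrelevant=[{"football": 4.0}],
    alpha=1.0,
    beta=0.5,
    gamma=0.25,
)
```

The expanded query gains the term "opec" from the relevant centroid and a negative weight on "football" from the irrelevant one. Test documents would then be scored against this expanded vector and ranked in decreasing score order.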

Document Routing as Statistical Classification
The Routing Problem
Step 1: Local Regions
Step 2: Document Representations
$N(n_{r+} n_{n-} - n_{r-} n_{n+})^2$
Summary of Routing Algorithm

Performance evaluation of predictive feedback routing for freeway networks

Context-aware, Preference-Based Vehicle Routing

Question routing in community based QA: incorporating answer quality and answer content

A Document Relevance Based Search Result Re-Ranking

Routing Questions to the Right Users in Online Communities

Investigating Routing in the VANET Network: Review and Classification of Approaches

Global Ranking of Documents Using Continuous Conditional Random Fields

Pseudo-Relevance Feedback Based On Mrmr Criteria

Routing Networks: Adaptive Selection of Non-linear Functions for Multi-Task Learning

The rough set based approach to generic routing problems: case of reverse logistics supplier selection

Personalized Question Routing Via Heterogeneous Network Embedding.

A comparison of approaches for imbalanced classification problems in the context of retrieving relevant documents for an analysis

A Linear Text Classification Algorithm Based on Category Relevance Factors

Learning to Match for Multi-criteria Document Relevance

Investigating Passage-level Relevance and Its Role in Document-level Relevance Judgment

Two Odds-Radio-Based Text Classification Algorithms

Term Selection and Result Reranking for Question Retrieval by Exploiting Hierarchical Classification.

Learn-n-Route: Learning implicit preferences for vehicle routing

Optimal Answerer Ranking for New Questions in Community Question Answering

UAV Routing for Enhancing the Performance of a Classifier-in-the-loop