Abstract: In this paper, we present our approaches and results for high-level feature extraction and automatic video search in TRECVID-2007.

In high-level feature extraction, our main focus is to explore the upper limit of the bag-of-visual-words (BoW) approach based on local appearance features. We study and evaluate several factors that could impact the performance of BoW. By considering these factors, we show that a system using local features only already yields top performance (MAP = 0.0935). This conclusion is similar to that of our recent experiment with VIREO-374 on the TRECVID-2006 dataset [1], except that the improvement from incorporating other features is marginal. Description of our submitted runs:

CityU-HK1: linear weighted fusion of 4 SVM classifiers using BoW, edge histogram, grid-based color moment and wavelet texture.
CityU-HK2: average fusion of 5 SVM classifiers using BoW, spatial layout of keypoints, edge histogram, grid-based color moment and wavelet texture.
CityU-HK3: average fusion of 4 SVM classifiers using BoW, edge histogram, grid-based color moment and wavelet texture.
CityU-HK4: bag-of-visual-words (BoW) only.
CityU-HK5: average fusion of 3 baseline classifiers using edge histogram, grid-based color moment and wavelet texture.
CityU-HK6: average fusion of 2 baseline classifiers using grid-based color moment and wavelet texture.

In automatic search, we study the performance of query-by-example (QBE) and VIREO-374 ontology-based concept search. In QBE, the spatial properties of local keypoints and concept detector confidence are utilized for retrieval. In concept-based search, a small set of VIREO-374 detectors is selected for query answering by measuring the similarity of query terms to semantic concepts in an Ontology-enriched Semantic Space. We submit six runs comprising concept-based, query-based, motion-based and text-based search:

CityUHK-SCS: concept-based search in which a single concept is selected for each query.
CityUHK-MCS: concept-based search in which the top-3 concepts are selected.
CityUHK-Concept: QBE using 36-d concept detection confidence vectors of keyframes.
CityUHK-ConceptRerank: reranking of the text baseline result using 36-d concept detection confidence vectors.
CityUHK-VKmotion-Rank: reranking of the text baseline result using the motion histogram of visual keywords (VK) in the video sequence.
CityUHK-Text: baseline run using ASR/MT transcripts.

1 High-Level Feature Extraction

This year, we mainly focus on exploring the upper limit of local features for concept detection. Our local feature approach is largely based on our previous work [2]. We also implement three baseline features and examine the improvement obtained by fusing the local features with the baseline visual features. For the selection of training samples, we rely only on this year's data and combine the two publicly available annotations from LIG [3] and MCG-ICT-CAS.
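To make the two ingredients named above concrete, the following Python sketch shows (a) how a keyframe's local descriptors can be turned into a BoW histogram and (b) how per-classifier scores can be combined by average or linear weighted fusion. It is an illustration under stated assumptions, not a description of our submitted system: hard assignment to the nearest visual word, L1 normalization, the function names, and the example weights are all assumptions, and the classifier outputs are assumed to be comparable scores (for example, probabilities in [0, 1]).

```python
import numpy as np


def bow_histogram(descriptors, vocabulary):
    """Map local descriptors to an L1-normalized visual-word histogram.

    Assumption: hard assignment of each descriptor to its nearest
    visual word under squared Euclidean distance.
    descriptors: array of shape (n_descriptors, dim)
    vocabulary:  array of shape (n_words, dim)
    """
    descriptors = np.asarray(descriptors, dtype=float)
    vocabulary = np.asarray(vocabulary, dtype=float)
    # Pairwise squared distances, shape (n_descriptors, n_words).
    diffs = descriptors[:, None, :] - vocabulary[None, :, :]
    dists = np.sum(diffs ** 2, axis=2)
    nearest = np.argmin(dists, axis=1)
    hist = np.bincount(nearest, minlength=len(vocabulary)).astype(float)
    total = hist.sum()
    return hist / total if total > 0 else hist


def average_fusion(scores):
    """Average the scores of several classifiers.

    scores: array of shape (n_classifiers, n_shots)
    """
    return np.mean(np.asarray(scores, dtype=float), axis=0)


def weighted_fusion(scores, weights):
    """Linear weighted fusion; weights are normalized to sum to 1."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return w @ np.asarray(scores, dtype=float)


# Toy data: a 3-word vocabulary and 4 two-dimensional descriptors.
vocab = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
desc = np.array([[0.1, 0.0], [0.9, 1.2], [1.1, 0.8], [4.0, 5.0]])
hist = bow_histogram(desc, vocab)  # word counts 1, 2, 1 before normalization

# Toy scores from 3 classifiers for 2 shots (hypothetical values).
scores = np.array([[0.9, 0.2], [0.7, 0.4], [0.5, 0.6]])
avg = average_fusion(scores)
wgt = weighted_fusion(scores, weights=[2.0, 1.0, 1.0])  # hypothetical weights
```

In this sketch, average fusion is the special case of weighted fusion with equal weights; how the weights of a linear weighted fusion are chosen (for example, on held-out data) is left open here.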
