Abstract: In this paper, we present our approaches and results for high-level feature extraction and automatic video search in TRECVID-2007.

In high-level feature extraction, our main focus is to explore the upper limit of the bag-of-visual-words (BoW) approach based upon local appearance features. We study and evaluate several factors that could impact the performance of BoW. By considering these factors, we show that a system using local features alone already yields top performance (MAP = 0.0935). This conclusion is similar to that of our recent experiment with VIREO-374 on the TRECVID-2006 dataset [1], except that the improvement from incorporating other features is marginal. Description of our submitted runs:

CityU-HK1: linear weighted fusion of 4 SVM classifiers using BoW, edge histogram, grid-based color moment and wavelet texture.
CityU-HK2: average fusion of 5 SVM classifiers using BoW, spatial layout of keypoints, edge histogram, grid-based color moment and wavelet texture.
CityU-HK3: average fusion of 4 SVM classifiers using BoW, edge histogram, grid-based color moment and wavelet texture.
CityU-HK4: bag-of-visual-words (BoW) only.
CityU-HK5: average fusion of 3 baseline classifiers using edge histogram, grid-based color moment and wavelet texture.
CityU-HK6: average fusion of 2 baseline classifiers using grid-based color moment and wavelet texture.

In automatic search, we study the performance of query-by-example (QBE) and VIREO-374 ontology-based concept search. In QBE, the spatial properties of local keypoints and concept detector confidence are utilized for retrieval. In concept-based search, a small set of VIREO-374 detectors is selected for query answering by measuring the similarity of query terms to semantic concepts in an Ontology-enriched Semantic Space. We submit six runs consisting of concept-based, query-based, motion-based and text-based search:

CityUHK-SCS: concept-based search in which a single concept is selected for each query.
CityUHK-MCS: concept-based search in which the top-3 concepts are selected.
CityUHK-Concept: uses 36-d concept detection confidence vectors of keyframes for QBE.
CityUHK-ConceptRerank: uses 36-d concept detection confidence vectors to rerank the result of the text baseline.
CityUHK-VKmotion-Rank: employs the motion histogram of visual keywords (VK) in the video sequence to rerank the result of the text baseline.
CityUHK-Text: baseline run using ASR/MT transcripts.

1 High-Level Feature Extraction

This year, we mainly focus on exploring the upper limit of local features for concept detection. Our local feature approach is largely based on our previous work in [2]. We also implement three baseline features and examine the improvement gained by fusing the local features with the baseline visual features. For the selection of training samples, we rely only on this year's data and combine the two publicly available annotations from LIG [3] and MCG-ICT-CAS.
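The average fusion and linear weighted fusion named in the run descriptions can be illustrated with a minimal sketch. It assumes each classifier outputs one confidence score per shot on a comparable scale (for example, probabilities in [0, 1]). The function names, the example scores and the weights are hypothetical; the text here does not state how the weights for CityU-HK1 were chosen.

```python
from typing import List, Sequence


def average_fusion(scores: Sequence[Sequence[float]]) -> List[float]:
    """Fuse per-shot scores from several classifiers with equal weights.

    scores[i][j] is the (assumed comparable) confidence of classifier i
    for shot j.
    """
    n = len(scores)
    return [sum(column) / n for column in zip(*scores)]


def weighted_fusion(scores: Sequence[Sequence[float]],
                    weights: Sequence[float]) -> List[float]:
    """Fuse per-shot scores with one weight per classifier.

    Weights are normalized by their sum, so only their ratios matter.
    """
    if len(scores) != len(weights):
        raise ValueError("need exactly one weight per classifier")
    total = sum(weights)
    return [
        sum(w * s for w, s in zip(weights, column)) / total
        for column in zip(*scores)
    ]


def rank_shots(fused: Sequence[float]) -> List[int]:
    """Return shot indices ordered from highest to lowest fused score."""
    return sorted(range(len(fused)), key=lambda j: fused[j], reverse=True)


if __name__ == "__main__":
    # Hypothetical scores for three shots from two classifiers
    # (e.g. a BoW classifier and an edge-histogram classifier).
    bow = [0.75, 0.25, 0.5]
    edge = [0.25, 0.75, 0.5]
    print(average_fusion([bow, edge]))
    print(weighted_fusion([bow, edge], weights=[3.0, 1.0]))
    print(rank_shots(weighted_fusion([bow, edge], weights=[3.0, 1.0])))
```

Average fusion is the special case of weighted fusion in which all weights are equal, which is why CityU-HK1 and CityU-HK3 differ only in how the same four classifiers are weighted.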
