Abstract: Multi-label image/video annotation is a challenging task that involves associating more than one high-level semantic keyword with an image or video clip. Previously, a single model was usually used for the annotation task, which results in relatively large variance in performance. The correlations among annotation keywords should also be taken into account. In this paper, to reduce the performance variance and exploit the correlations between keywords, we propose the En-CRF (Ensemble based on Conditional Random Field) method. In this method, multiple models are first trained for each keyword; the predictions of these models and the correlations between keywords are then incorporated into a conditional random field. Experimental results on benchmark data sets, including Corel5k and TRECVID 2005, show that the En-CRF method is superior to or highly competitive with several state-of-the-art methods.
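To make the combination of per-keyword ensembles and keyword correlations concrete, the following is a minimal sketch of a pairwise CRF over binary keyword labels. It is not the authors' formulation: the log-linear unary potentials (a weighted sum of each model's log-likelihood), the fixed model weights `model_weights`, the given correlation matrix `corr`, the coupling strength `lam`, and the example keywords are all illustrative assumptions, and inference is done by brute-force enumeration of all 2^K label configurations, which is only feasible for a handful of keywords.

```python
import itertools
import math


def unary_score(k, y_k, ensemble_probs, model_weights):
    """Assumed log-linear unary term for keyword k: weighted sum of each
    ensemble member's log-likelihood of the label y_k (0 or 1)."""
    eps = 1e-12
    score = 0.0
    for w, p in zip(model_weights, ensemble_probs[k]):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        score += w * (math.log(p) if y_k == 1 else math.log(1 - p))
    return score


def pairwise_score(y, corr, lam):
    """Assumed pairwise term: rewards co-activation of positively correlated
    keywords and penalizes co-activation of negatively correlated ones."""
    K = len(y)
    return lam * sum(corr[k][l] * y[k] * y[l]
                     for k in range(K) for l in range(k + 1, K))


def crf_marginals(ensemble_probs, model_weights, corr, lam):
    """Exact inference by enumerating all 2^K label vectors.
    Returns per-keyword marginals P(y_k = 1) and the MAP label vector."""
    K = len(ensemble_probs)
    configs = list(itertools.product([0, 1], repeat=K))
    scores = []
    for y in configs:
        s = sum(unary_score(k, y[k], ensemble_probs, model_weights)
                for k in range(K))
        s += pairwise_score(y, corr, lam)
        scores.append(s)
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]  # shift for numerical stability
    Z = sum(weights)
    marginals = [0.0] * K
    for y, w in zip(configs, weights):
        for k in range(K):
            if y[k]:
                marginals[k] += w / Z
    best = configs[scores.index(m)]
    return marginals, best


# Hypothetical example: 3 keywords ("sky", "water", "car"), 3 models each.
ensemble_probs = [[0.8, 0.7, 0.9],    # sky
                  [0.5, 0.45, 0.55],  # water
                  [0.1, 0.2, 0.15]]   # car
model_weights = [1 / 3, 1 / 3, 1 / 3]
corr = [[0.0, 1.0, -0.5],
        [1.0, 0.0, -0.5],
        [-0.5, -0.5, 0.0]]

indep_marginals, _ = crf_marginals(ensemble_probs, model_weights, corr, lam=0.0)
crf_marg, crf_best = crf_marginals(ensemble_probs, model_weights, corr, lam=2.0)
print(indep_marginals, crf_marg, crf_best)
```

With `lam=0.0` the keywords decouple and each marginal reduces to the ensemble's averaged prediction; with `lam > 0` the positive "sky"/"water" correlation raises the marginal of the ambiguous "water" label, which is the kind of effect the correlation term is meant to capture. In practice the weights and correlation strength would be learned rather than fixed, and approximate inference would replace enumeration for large vocabularies.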

Ensemble Approach Based on Conditional Random Field for Multi-Label Image and Video Annotation