Supplementary Material for ICCV 2015 Paper #1007
RIDE: Reversal Invariant Descriptor Enhancement
Lingxi Xie, Jingdong Wang, Weiyao Lin, Bo Zhang, Qi Tian
2015-01-01
Abstract: This document is the supplementary material for ICCV paper #1007 [5]. We provide the proof of the gradient estimation of SIFT, and the generalization of RIDE to handle more types of reversal and rotation invariance.

1. Orientation Estimation of Dense SIFT

In this section, we prove an approximate estimate of the SIFT orientation based on its local gradient values. The approximation is used in Section 3.3 of the main article.

1.1. Implementation of SIFT

The implementation of SIFT is based on the original paper [2]. In the following paragraphs, we briefly review the process of orientation assignment and descriptor representation.

First, let us assume that the assignment of descriptor scale is finished, which fits the case of dense sampling [1], where all the descriptors have the same, fixed window size. Denote an image as $I = [L(x, y)]_{W \times H}$. The gradient magnitude, $m(x, y)$, and orientation, $\theta(x, y)$, are pre-computed for each pixel:

$$m(x, y) = \left[ \left( L(x+1, y) - L(x-1, y) \right)^2 + \left( L(x, y+1) - L(x, y-1) \right)^2 \right]^{1/2} \tag{1}$$

$$\theta(x, y) = \arctan \left[ \frac{L(x, y+1) - L(x, y-1)}{L(x+1, y) - L(x-1, y)} \right] \tag{2}$$

The magnitude and orientation at each pixel are then used to estimate the dominant orientation of the descriptor. An orientation histogram is formed from the gradient orientations of the pixels within a region around the keypoint. Each sample added to the histogram is weighted by its gradient magnitude and by a Gaussian-weighted circular window with a $\sigma$ that is 1.5 times the scale of the keypoint. Peaks in the orientation histogram correspond to dominant orientations of local gradients. The highest peak in the histogram is detected, and any other local peak that is within 80% of the highest peak is also used to create a keypoint with that orientation. Therefore, for locations with multiple peaks of similar magnitude, multiple keypoints are created at the same location and scale but with different orientations.
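The orientation assignment reviewed above can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the implementation used in the paper: the function names, the `radius` parameter, and the choice of 36 histogram bins are hypothetical and do not come from this document; the image is assumed to be a list of rows indexed as `L[y][x]`; `atan2` is used in place of the plain arctangent of Eq. (2) so that the orientation covers the full circle; and peak positions are reported as bin centers without any sub-bin interpolation.

```python
import math

def gradient(L, x, y):
    """Central-difference gradient at pixel (x, y); L is indexed as L[y][x]."""
    dx = L[y][x + 1] - L[y][x - 1]
    dy = L[y + 1][x] - L[y - 1][x]
    m = math.sqrt(dx * dx + dy * dy)             # magnitude, Eq. (1)
    theta = math.atan2(dy, dx) % (2 * math.pi)   # orientation, Eq. (2), in [0, 2*pi)
    return m, theta

def orientation_histogram(L, cx, cy, radius, sigma, num_bins=36):
    """Histogram of gradient orientations in a circular window around (cx, cy).

    Each sample is weighted by its gradient magnitude and by a Gaussian of
    width sigma (assumed to be 1.5 times the keypoint scale, set by the caller).
    """
    hist = [0.0] * num_bins
    height, width = len(L), len(L[0])
    for y in range(cy - radius, cy + radius + 1):
        for x in range(cx - radius, cx + radius + 1):
            # Skip pixels where the central difference is undefined.
            if not (1 <= x < width - 1 and 1 <= y < height - 1):
                continue
            d2 = (x - cx) ** 2 + (y - cy) ** 2
            if d2 > radius * radius:
                continue  # outside the circular window
            m, theta = gradient(L, x, y)
            weight = math.exp(-d2 / (2.0 * sigma * sigma))
            b = int(theta / (2 * math.pi) * num_bins) % num_bins
            hist[b] += m * weight
    return hist

def dominant_orientations(hist, ratio=0.8):
    """Return bin-center angles of local peaks within `ratio` of the highest peak."""
    n = len(hist)
    top = max(hist)
    peaks = []
    for b in range(n):
        left, right = hist[(b - 1) % n], hist[(b + 1) % n]  # circular neighbors
        if hist[b] > 0 and hist[b] >= ratio * top and hist[b] > left and hist[b] > right:
            peaks.append((b + 0.5) * 2 * math.pi / n)
    return peaks

# Usage: a horizontal intensity ramp has a single dominant orientation.
image = [[float(x) for x in range(9)] for _ in range(9)]
hist = orientation_histogram(image, 4, 4, radius=3, sigma=1.5)
print(dominant_orientations(hist))
```

On the ramp image every gradient points along the x axis, so all the weight falls into one bin and a single orientation is returned. A histogram with two comparable peaks would return two orientations, which is the multiple-keypoint case described above.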
∗This work was done when Lingxi Xie was an intern at Microsoft Research.

The above method works well for image matching and retrieval [2], but we do not need to assign multiple orientations to a descriptor in classification tasks. As an alternative, it is also suggested to estimate a unique accumulated orientation