Object tracking algorithm based on matrix low-rank representation
Musa Yasin, Kerim Muhtar
DOI: https://doi.org/10.11834/jig.170083
2018-01-01
Journal of Image and Graphics
Abstract:
Objective: Visual object tracking is an important computer vision task with applications in many domains, such as military systems, robotics, intelligent visual surveillance, human-computer interaction, and medical diagnosis. Many trackers proposed over the past decades have delivered satisfactory performance. Despite this progress, visual object tracking still struggles to handle complex changes in object appearance caused by factors such as illumination variation, partial occlusion, shape deformation, background clutter, low contrast, specular reflections, camera motion, and at least seven other factors. Generally, visual tracking is a search (or classification) problem that continuously infers the state of a target in a video sequence: it aims to identify the candidate that most accurately matches the target template and returns that candidate as the tracking result. Constructing an effective, high-performance tracker involves two core issues. The first is representative feature learning and high-level modeling; the second is filtering and efficient searching. Because the target state in every video frame is represented using several online-learned feature templates, and because the target itself and the scene conditions introduce complex interference, the modeling capability of the tracker depends heavily on the generalizability of the template data and on an accurate model representation with precise error estimation. In addition, most existing algorithms force the sample data into vector form, which changes the original data structure and severely damages the spatial relationships between pixels. Vectorization also yields high-dimensional data and therefore increases computational complexity. Consequently, designing an effective representation of the 2D appearance of moving objects, with an appropriate data expression, is the key to the success of a visual tracker.
Method: In this study, we investigate in depth the appearance model representation problem in generative-model-based visual object tracking. In prior work, we formulated the observation model via tensor (3D array) nuclear norm regularization; the resulting tracker, called the tensor nuclear norm regression-based tracker (TNRT), achieved favorable results in many tracking environments. However, TNRT places high demands on hardware and graphics processing unit (GPU) computation, which leads to slow tracking in practical applications where only modest hardware is available. Therefore, we design a novel observation model based on matrix low-rank representation, together with its likelihood measurement function, while retaining several desirable properties of TNRT, namely multitask joint learning, nuclear norm regularization-based model representation, and preservation of the original data structure of the sample signals. In the proposed tracking framework, several critical feature templates (a dictionary or subspace) are learned from online data using the incremental principal component analysis algorithm. Then, given the appearance information of an incoming video frame, the proposed appearance model uses these feature templates to represent each target candidate linearly under independent and identically distributed Gaussian-Laplacian mixture noise, adopting a multitask joint learning strategy. Subsequently, a joint maximum likelihood function based on the matrix nuclear norm and a weighted L1 norm carefully measures the distance between each target candidate and the feature subspace. Because the matrix form preserves the intrinsic data structure of the samples and keeps the spatial distribution of visual features intact, the proposed multitask objective function with matrix low-rank regularization represents sample signals more accurately and flexibly than model representation methods based on L1, L2, or other hybrid regularizations. In every frame, the same likelihood measurement function scores every candidate sample, so the scores are directly comparable. As a result, the tracker can fully exploit the latent characteristics of the sample data and detect complex appearance changes of the target under challenging disturbances such as occlusion or strong illumination. Meanwhile, because the observation model operates on matrix-form data prototypes, it markedly reduces data dimensionality and computational complexity, which substantially improves tracking speed.
Result: The pixels of the residual data often have similar grayscale intensities and share spatial structure inherited from the 2D data prototypes, such as contiguous block-shaped regions; nevertheless, conventional observation models based on L1, L2, or other hybrid regularizations cannot fully exploit this latent structure of the residual data. Compared with these traditional methods, the matrix low-rank regression model (MLRM) explores the residual data more precisely and captures the spatial characteristics of the reconstruction error. In other words, MLRM effectively exploits the low-rank property of the residual matrix. We systematically evaluate the proposed tracking algorithm on 10 public video sequences that cover the challenging factors mentioned above and compare it with several state-of-the-art algorithms widely cited in the literature. Each tracker is evaluated objectively using three quantitative metrics: average center point error (ACE), average overlap rate (AOR), and average success rate (ASR). Our tracking algorithm shows favorable robustness in these noisy environments and obtains the best results on each video sequence, with ACE, AOR, and ASR of 5.29 pixels, 78%, and 98.28%, respectively.
Conclusion: In this study, we design a novel multitask matrix low-rank model representation method and its corresponding maximum likelihood estimation function. Analysis across a wide variety of conditions in several public video sequences provides objective insight into the strengths and weaknesses of each tracker. The appearance modeling mechanism and the maximum likelihood estimation function of the proposed MLRM algorithm play critical roles and yield favorable tracking results on several challenging video sequences. Qualitative and quantitative evaluations in a number of challenging noisy environments indicate that the proposed MLRM algorithm is the most robust at alleviating the model degradation and drift caused by occlusion and strong illumination, and that it achieves results comparable to or better than those of several state-of-the-art algorithms.
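The Method paragraph states that the feature templates are learned online with incremental principal component analysis, but the abstract does not give the update rule. The following is a minimal NumPy sketch under stated assumptions, not the authors' implementation: it uses a standard mean-free incremental SVD merge with a forgetting factor, and `update_subspace`, `k`, and `forget` are hypothetical names. Each basis vector is reshaped back to the patch size so the templates keep their 2D form.

```python
import numpy as np


def update_subspace(u, s, new_data, k, forget=1.0):
    """Merge new vectorized samples into a rank-k basis (hypothetical helper).

    u: (d, k) orthonormal basis or None; s: (k,) singular values or None;
    new_data: (d, n) new samples as columns. Mean updating is omitted (assumption).
    """
    if u is None:
        stacked = new_data
    else:
        # Down-weight old information with the forgetting factor, then append new columns.
        stacked = np.hstack([forget * (u * s), new_data])
    u_new, s_new, _ = np.linalg.svd(stacked, full_matrices=False)
    return u_new[:, :k], s_new[:k]


# Hypothetical demo: 16x16 patches drawn from a 3-dimensional subspace.
rng = np.random.default_rng(1)
h, w, k = 16, 16, 3
true_basis, _ = np.linalg.qr(rng.standard_normal((h * w, k)))

u = s = None
for _ in range(5):  # five "frames", 10 new samples each
    batch = true_basis @ rng.standard_normal((k, 10))
    u, s = update_subspace(u, s, batch, k, forget=0.95)

# Reshape each basis vector into a 2D feature template.
templates = [u[:, i].reshape(h, w) for i in range(k)]

# Fresh samples from the same subspace should be reconstructed almost exactly.
probe = true_basis @ rng.standard_normal((k, 4))
residual = np.linalg.norm(probe - u @ (u.T @ probe))
print(f"reconstruction residual: {residual:.2e}")
```

Merging `u * s` with the new columns is enough because the discarded right singular vectors are orthonormal, so the left singular vectors and values of the merged matrix match those of the full data history (up to the forgetting weight).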
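The Method paragraph describes representing each candidate linearly with the feature templates under Gaussian-Laplacian mixture noise and scoring it with a likelihood built from a matrix nuclear norm and a weighted L1 norm. The abstract gives neither the exact objective nor the solver, so the sketch below is a deliberately simplified, single-candidate stand-in (the multitask coupling across candidates is omitted). It runs block coordinate descent on the hypothetical objective `0.5 * ||Y - sum_i c_i T_i - S||_F^2 + lam * ||W * S||_1`, where `S` is the sparse (Laplacian) error, and then scores the candidate with the nuclear norm of the remaining dense residual plus the weighted L1 norm of `S`. The function names, `lam`, `weights`, `gamma`, and the iteration count are all assumptions.

```python
import numpy as np


def soft_threshold(x, t):
    """Elementwise shrinkage: proximal operator of a (weighted) L1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)


def represent(candidate, templates, lam=0.1, weights=None, iters=20):
    """Fit candidate ~ sum_i c_i * T_i + S + E (hypothetical simplified model)."""
    a = np.stack([t.ravel() for t in templates], axis=1)  # (h*w, n_templates)
    w = np.ones_like(candidate) if weights is None else weights
    s = np.zeros_like(candidate)
    for _ in range(iters):
        # Coefficients: least squares on the candidate with the sparse error removed.
        c, *_ = np.linalg.lstsq(a, (candidate - s).ravel(), rcond=None)
        recon = (a @ c).reshape(candidate.shape)
        # Sparse (Laplacian) part: weighted soft thresholding of the residual.
        s = soft_threshold(candidate - recon, lam * w)
    e = candidate - recon - s  # remaining dense (Gaussian) residual
    return c, s, e, w


def candidate_cost(candidate, templates, lam=0.1, weights=None):
    """Nuclear norm of the dense residual plus weighted L1 norm of the sparse part."""
    _, s, e, w = represent(candidate, templates, lam, weights)
    return float(np.linalg.svd(e, compute_uv=False).sum() + lam * np.abs(w * s).sum())


def likelihood(cost, gamma=1.0):
    """Hypothetical likelihood: exponential of the negative cost."""
    return float(np.exp(-gamma * cost))


rng = np.random.default_rng(2)
templates = [rng.standard_normal((32, 32)) for _ in range(2)]
occluded = 0.7 * templates[0] + 0.3 * templates[1]
occluded[8:16, 8:16] += 1.0  # block-shaped occlusion on a true target
unrelated = rng.standard_normal((32, 32))

costs = {name: candidate_cost(y, templates)
         for name, y in [("occluded target", occluded), ("unrelated", unrelated)]}
print(costs)
```

Because every candidate in a frame is scored with the same function and parameters, the resulting costs (and likelihoods) can be ranked directly, which mirrors the comparability argument in the abstract.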
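The Result paragraph argues that residual pixels caused by an occluder form contiguous blocks, a structure that a nuclear norm (the sum of singular values) can exploit but an elementwise L1 or L2 penalty cannot. The NumPy sketch below illustrates this reasoning on synthetic residuals; it is not the authors' code, and the 32x32 size, the 8x8 block, and the threshold are arbitrary choices. It also shows singular value thresholding, the standard proximal operator of the nuclear norm used by many low-rank solvers.

```python
import numpy as np


def nuclear_norm(m):
    """Sum of singular values of a 2D matrix."""
    return float(np.linalg.svd(m, compute_uv=False).sum())


def svt(m, tau):
    """Singular value thresholding: proximal operator of tau * nuclear norm."""
    u, s, vt = np.linalg.svd(m, full_matrices=False)
    return (u * np.maximum(s - tau, 0.0)) @ vt


h = w = 32
# Residual from a hypothetical occluder: 64 contiguous unit pixels.
block = np.zeros((h, w))
block[8:16, 8:16] = 1.0

# The same number of unit pixels scattered at random positions.
rng = np.random.default_rng(0)
flat = np.zeros(h * w)
flat[rng.choice(h * w, size=64, replace=False)] = 1.0
scattered = flat.reshape(h, w)

# Equal L1 norms, but very different rank and nuclear norm.
for name, r in [("block", block), ("scattered", scattered)]:
    print(f"{name:9s} L1={np.abs(r).sum():.0f} "
          f"rank={np.linalg.matrix_rank(r)} nuclear={nuclear_norm(r):.2f}")

# Shrinking singular values by tau=2 scales the rank-1 block from 1.0 to 0.75.
shrunk = svt(block, 2.0)
print(shrunk[8:16, 8:16].mean())
```

The block residual is rank 1 with nuclear norm 8 (its Frobenius norm), whereas the scattered residual has a larger nuclear norm despite the same L1 norm, so a nuclear norm penalty distinguishes spatially coherent errors from isolated ones.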
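The Result paragraph reports ACE, AOR, and ASR without defining them precisely. The sketch below uses common conventions as assumptions: boxes are `(x, y, width, height)`, center error is the Euclidean distance between box centers, overlap is intersection over union, and a frame counts as a success when its overlap exceeds 0.5. The function name `evaluate` and the threshold are hypothetical; per-sequence values would then be averaged across sequences.

```python
import math


def evaluate(pred_boxes, gt_boxes, success_threshold=0.5):
    """Return (ACE in pixels, AOR, ASR) over paired per-frame boxes (x, y, w, h)."""
    errors, overlaps = [], []
    for (px, py, pw, ph), (gx, gy, gw, gh) in zip(pred_boxes, gt_boxes):
        # Center point error between predicted and ground-truth boxes.
        errors.append(math.hypot((px + pw / 2) - (gx + gw / 2),
                                 (py + ph / 2) - (gy + gh / 2)))
        # Overlap rate as intersection over union.
        iw = max(0.0, min(px + pw, gx + gw) - max(px, gx))
        ih = max(0.0, min(py + ph, gy + gh) - max(py, gy))
        inter = iw * ih
        union = pw * ph + gw * gh - inter
        overlaps.append(inter / union if union > 0 else 0.0)
    n = len(errors)
    ace = sum(errors) / n
    aor = sum(overlaps) / n
    # Fraction of frames whose overlap exceeds the success threshold.
    asr = sum(o > success_threshold for o in overlaps) / n
    return ace, aor, asr


gt = [(0, 0, 10, 10), (0, 0, 10, 10)]
pred = [(0, 0, 10, 10), (5, 0, 10, 10)]
print(evaluate(pred, gt))  # ACE 2.5, AOR about 0.667, ASR 0.5
```

In this toy example, the second prediction is shifted 5 pixels to the right, giving a center error of 5 and an overlap of 50 / 150 = 1/3, which falls below the success threshold.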