Indian Institute of Technology Tirupati
Visual object tracking is the classical problem of estimating the trajectory of a target object in a video sequence, given its location in the first frame. It is one of the fundamental problems in computer vision and can serve as a building block in complex vision systems. It has a wide range of practical applications, such as automatic surveillance, autonomous driving, medical vision systems and video analysis. This project deals with the problem of improving the accuracy of state-of-the-art Siamese-family trackers by incorporating an adversarial mechanism into their training. Designing a robust tracker poses a number of challenges because the target object undergoes a variety of complex appearance changes.
Object Transformations:
In this project, we design a generic, robust and novel end-to-end framework that addresses two primary objectives in visual object tracking:
To utilize the backbones of various deep CNN architectures by investigating the factors that govern tracking accuracy and robustness.
To explore the role of adversarial learning in improving the power of deep convolutional embedding networks to enhance tracking accuracy.
In this project, we have implemented several ways of improving the performance of Siamese Networks for object tracking.
We show some qualitative results of this tracker on standard benchmark sequences from the VOT2016 dataset. The first sequence (bolt1) contains frequent changes in target pose that lead to large changes in the target state, and is well suited to evaluating our approach. The second sequence (motocross1) contains geometric and photometric variations such as rotations and illumination changes. For comparison, we include the output of the state-of-the-art Siamese FC (SiamFC) tracker. Both approaches employ a multi-scale search algorithm to determine the target scale, i.e., the width and height of the bounding box. The results show that our adversarial learning approach improves target localization and, in turn, target scale estimation.
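The multi-scale search mentioned above can be sketched roughly as follows. This is a minimal illustration and not the project's implementation: it assumes the tracker has already correlated the template with the search region at each candidate scale and produced one positive peak score per scale. The function names and the default values for scale_step, scale_penalty and scale_lr are hypothetical choices made for this example.

```python
def scale_factors(num_scales=3, scale_step=1.0375):
    """Geometric scale pyramid centred on 1.0 (assumes an odd num_scales)."""
    half = num_scales // 2
    return [scale_step ** (i - half) for i in range(num_scales)]


def select_scale(peak_scores, scale_penalty=0.97):
    """Pick the index of the best scale from positive per-scale peak scores.

    Scores of candidates that would change the scale are multiplied by
    scale_penalty (< 1) so that small score differences do not cause jitter.
    """
    centre = len(peak_scores) // 2
    best_idx, best_score = centre, float("-inf")
    for i, score in enumerate(peak_scores):
        penalised = score if i == centre else score * scale_penalty
        if penalised > best_score:
            best_idx, best_score = i, penalised
    return best_idx


def update_size(width, height, factor, scale_lr=0.6):
    """Move the box size part of the way towards the selected scale."""
    blend = (1.0 - scale_lr) + scale_lr * factor
    return width * blend, height * blend


if __name__ == "__main__":
    factors = scale_factors()
    # Hypothetical peak responses for the smaller, unchanged and larger scale.
    idx = select_scale([0.80, 0.90, 0.95])
    print(idx, update_size(100.0, 50.0, factors[idx]))
```

The damped update in update_size is one common way to keep the bounding box from changing size abruptly between frames; the learning rate trades responsiveness against stability.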
We compare the original shallow SiamFC-AlexNet with the deep SiamResNet22 on the OTB15, VOT16 and VOT17 benchmark datasets to show that deep state-of-the-art architectures can improve tracking accuracy and robustness, given the necessary modifications to the parameters that affect performance.
Our experiments show that the proposed adversarial approach outperforms the baseline and also performs better than many state-of-the-art methods on the VOT2016 benchmark. Although the proposed adversarial-learning-based tracking framework is demonstrated with a Siamese network, mainly because of its simplicity, the key idea can be extended to other versions of Siamese trackers.
We compare our proposed adversarial approach with SiamFC using success plots and precision plots over all the challenging videos of the OTB100 benchmark and the videos of the Temple128 dataset. The proposed tracker outperforms SiamFC in Area Under Curve (AUC) by a comparable percentage on both.
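For readers unfamiliar with these plots, the sketch below shows how the success rate, its AUC and the precision score are commonly computed for a single sequence. It is an illustration under stated assumptions rather than the evaluation code used in this project: boxes are taken to be (x, y, width, height) tuples, the success curve uses 21 evenly spaced overlap thresholds, and precision is reported at a 20-pixel centre error, which follows the usual OTB convention. All function names are hypothetical.

```python
import math


def iou(box_a, box_b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def center_error(box_a, box_b):
    """Euclidean distance in pixels between the centres of two boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    return math.hypot((ax + aw / 2) - (bx + bw / 2),
                      (ay + ah / 2) - (by + bh / 2))


def success_curve(predictions, ground_truth, num_thresholds=21):
    """Fraction of frames whose overlap exceeds each threshold in [0, 1]."""
    thresholds = [i / (num_thresholds - 1) for i in range(num_thresholds)]
    overlaps = [iou(p, g) for p, g in zip(predictions, ground_truth)]
    rates = [sum(o > t for o in overlaps) / len(overlaps) for t in thresholds]
    return thresholds, rates


def success_auc(predictions, ground_truth):
    """Area under the success curve, taken as the mean success rate."""
    _, rates = success_curve(predictions, ground_truth)
    return sum(rates) / len(rates)


def precision_at(predictions, ground_truth, pixel_threshold=20.0):
    """Fraction of frames whose centre error is within pixel_threshold."""
    errors = [center_error(p, g) for p, g in zip(predictions, ground_truth)]
    return sum(e <= pixel_threshold for e in errors) / len(errors)


if __name__ == "__main__":
    # Toy two-frame sequence: one perfect prediction and one complete miss.
    gt = [(0, 0, 10, 10), (0, 0, 10, 10)]
    pred = [(0, 0, 10, 10), (100, 100, 10, 10)]
    print(success_auc(pred, gt), precision_at(pred, gt))
```

Benchmark results are typically obtained by averaging these per-sequence curves over all videos in the dataset.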
In this project, the problem of accurate localization in tracking was studied and addressed. The proposed framework, ALTO, enables better prediction of the target location in similarity-learning trackers by incorporating adversarial learning to correct the predictions made by the baseline tracker. This approach was shown to provide the best results on well-known, challenging tracking datasets, outperforming other state-of-the-art trackers and thus demonstrating its impact. Although the proposed framework is demonstrated with a Siamese network, mainly because of its simplicity, the key concept can be extended to other versions of Siamese trackers. Hence, we conclude that this kind of adversarial framework is of considerable importance for tracking and offers substantial scope for further improvement in the field of object tracking. Extensive baseline experiments are still needed to investigate and understand the impact of the various design choices on the proposed methodology.