Abstract: Single-target tracking is one of the most challenging tasks in computer vision. To address occlusion, object deformation, and motion blur during tracking, a training method is proposed that trains a single-target tracking network on generated, diverse positive instances, which also mitigates the scarcity of varied training samples. Specifically, in the offline stage, a variational autoencoder (VAE) encodes the original samples into a latent space. Hard positive samples are then generated by sampling latent variables, which increases the diversity of the training data, and a training dataset is constructed by combining the generated samples with the original ones. In addition, a probabilistic triplet loss function, defined over the target template and the positive and negative samples of the training sequences, is used to train the tracking network; it exploits the relation between positive and negative samples to improve the discriminative power of the network. In the test stage, the pretrained Siamese neural network (SNN) tracks the target, and the target position in the current frame is determined by correlating the target template with the search area. Experiments show that the proposed algorithm improves the robustness and accuracy of SNN tracking under interference from similar objects and under target deformation, fast motion, rotation, motion blur, and occlusion, while achieving real-time tracking performance.
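The three operations named in the abstract (sampling in the latent space, a probability-based triplet loss, and localization by correlation) can be illustrated with a minimal NumPy sketch. Everything below is an assumption made for illustration: the function names are hypothetical, the sampling uses the standard VAE reparameterization, the loss uses one common probabilistic form, p = exp(vp) / (exp(vp) + exp(vn)), which the abstract does not confirm, and the correlation runs on raw 2-D arrays rather than on the deep feature maps a Siamese tracker would use.

```python
import numpy as np


def sample_latent(mu, log_var, n_samples, rng):
    """Draw latent vectors z = mu + sigma * eps (standard VAE reparameterization).

    Assumption: decoding such samples is how varied positives would be generated.
    """
    sigma = np.exp(0.5 * log_var)
    eps = rng.standard_normal((n_samples, mu.shape[0]))
    return mu + sigma * eps


def probabilistic_triplet_loss(pos_scores, neg_scores):
    """Mean of -log p over all (positive, negative) score pairs.

    Assumed form: p = exp(vp) / (exp(vp) + exp(vn)), so
    -log p = log(1 + exp(vn - vp)), computed stably with logaddexp.
    """
    diff = neg_scores[None, :] - pos_scores[:, None]  # vn - vp for every pair
    return float(np.mean(np.logaddexp(0.0, diff)))


def locate_by_correlation(template, search):
    """Slide the template over the search area and return the response map
    together with the (row, col) of its peak."""
    th, tw = template.shape
    sh, sw = search.shape
    response = np.empty((sh - th + 1, sw - tw + 1))
    for r in range(response.shape[0]):
        for c in range(response.shape[1]):
            # Unnormalized cross-correlation at offset (r, c).
            response[r, c] = np.sum(search[r:r + th, c:c + tw] * template)
    peak = np.unravel_index(np.argmax(response), response.shape)
    return response, (int(peak[0]), int(peak[1]))
```

In this sketch the loss decreases as positive scores rise above negative scores, which is the sense in which the relation between positive and negative samples drives training. The peak of the response map gives the top-left offset of the best-matching window in the search area.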