Abstract: Compared with ordinary color video, infrared video is easily affected by the surrounding environment. In infrared pedestrian tracking, the appearance contour and gray-level distribution of the pedestrian target often change substantially, which makes tracking difficult. To address this problem, this paper proposes VPSiamRPN (Video Prediction with Siamese Region Proposal Network), an infrared pedestrian target tracking system. Targeting the factors that most seriously degrade infrared pedestrian tracking performance (such as target deformation, target occlusion, and background clutter), the image prediction function of PredNet (Deep Predictive Coding Networks for Video Prediction and Unsupervised Learning) was adapted and applied to SiamRPN (Siamese Region Proposal Network) to increase the similarity between the tracking template and the detected target, thereby improving the ability to track infrared pedestrian targets. Nine comparative experiments were carried out by varying the number of network layers, the number of target images and frames used for prediction, and the tracking strategy of the network. On the PTB-TIR dataset, the experimental results show that the success and precision scores for infrared targets under thermal crossover, intensity change, occlusion, scale variation, and other attributes are much higher than those of SiamRPN, indicating good infrared pedestrian tracking performance and broad application prospects in this field.