Abstract:In the process of panoramic video target tracking, the target deformation and scale changes caused by light change, interference of similar background, and object moving may result in target drift or missing, leading to low success rate and poor robustness. To address these issues, a target tracking method based on long short-term memory (LSTM) network and improved Real-Time MDNet (RT-MDNet) network was proposed. First, shallow convolution neural network was utilized to extract features, and adaptive RoIAlign was adopted to reduce pixel loss in the convolution process. Then, the weight of the last layer of the full connection layers was updated online by utilizing the target features to achieve foreground background separation and extract the target area. Lastly, the scale of the target box was selected adaptively by means of LSTM, and the target position information was thus obtained. Experimental results show that monocular vision algorithm could hardly adapt to the scale change and background change when applied in panoramic dataset, while the proposed method that utilizes 3-layer LSTM network to construct scale prediction module could effectively solve these problems. The algorithm can efficiently deal with the situations of small target, target occlusion, and cross motion of multiple targets in target tracking while maintaining accuracy, achieving better visual effect and higher overlap rate score.