Abstract:Due to the large number of parameters and large amount of calculation of deep convolutional networks, it is difficult to quickly and accurately deploy multi-scale target detection networks on many platforms with limited resources and power consumption. To solve this problem, based on the Python productivity for ZYNQ (PYNQ) framework, this paper realizes the IP core design and heterogeneous system architecture deployment of CTiny model, which is an anchor-free object detection model. First, a method of segmental quantization of the overall scaling factors in the convolution kernel was proposed, so that the pre-trained high-precision algorithm could be deployed on the field programmable gate array (FPGA) with low loss. Then, the system of the CTiny model was constructed based on the PYNQ framework, including ResNet backbone network, deconvolution network, and branch detection network. Finally, the time-consuming calculation such as picture preprocessing and post-processing was moved from serial ARM to parallel FPGA, further reducing the total processing time. Experimental results show that after deploying the CTiny model on the PYNQ-Z2 development board, the proposed quantization method achieved a mean average precision of 81.60% in the public optical remote sensing dataset NWPU VHR-10, which increased by 14.27% than truncated quantization. It has realized the requirement of deploying a tiny anchor-free object detection network with low loss. In addition, the processing time of post-processing was reduced from 9.228 s on the ARM side to 0.008 s on the FPGA side, which improved the speed of the detection model.