Abstract: Because the random forest algorithm under differential privacy achieves unsatisfactory accuracy when classifying high-dimensional data, a random forest algorithm under differential privacy based on the out-of-bag estimate (RFDP_OOB) was proposed, in which the out-of-bag estimate is used to calculate the weights of decision trees and features. First, the algorithm generates part of the random forest under differential privacy and evaluates the importance of the decision trees and features with a differentially private out-of-bag estimate, from which the tree weights and feature weights are calculated. Then, the features are divided into important and non-important features according to their weights. Next, while the remaining part of the random forest is generated, every node whose best splitting feature is non-important is pre-pruned into a leaf node. This reduces the noise introduced and improves both the classification accuracy and the efficiency of the decision trees. Finally, at prediction time each decision tree votes with its weight, and the class with the largest total weight is taken as the output of the random forest, which further improves its classification accuracy. The privacy and effectiveness of the algorithm are analyzed theoretically, and experimental results verify its effectiveness, showing that the proposed algorithm improves classification accuracy while protecting data privacy.
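As a rough illustration of the weighting, pre-pruning, and weighted-voting steps described in the abstract, the Python sketch below may help. It is a minimal sketch under stated assumptions, not the RFDP_OOB implementation. The following are all hypothetical: perturbing each tree's out-of-bag correct-prediction count with Laplace noise of scale 1/epsilon, using the normalized noisy out-of-bag accuracy as the tree weight, splitting features at the mean feature weight, and the helper names `noisy_oob_weights`, `partition_features`, `should_prune`, and `weighted_vote`. How the privacy budget is allocated across trees and features is not modeled.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Optional, Sequence, Set, Tuple


def laplace_noise(scale: float, rng: random.Random) -> float:
    # Laplace(0, scale) as the difference of two i.i.d. exponentials with mean `scale`.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_oob_weights(oob_correct: Sequence[int], oob_total: Sequence[int],
                      epsilon: float, rng: random.Random) -> List[float]:
    """Hypothetical tree weights: noisy OOB accuracy of each tree, normalized to sum to 1.

    Assumption: each correct-count has sensitivity 1, so Laplace noise with
    scale 1/epsilon is added to it. Budget composition across trees is ignored.
    """
    accs = []
    for correct, total in zip(oob_correct, oob_total):
        if total <= 0:
            accs.append(0.0)
            continue
        noisy = correct + laplace_noise(1.0 / epsilon, rng)
        accs.append(min(max(noisy / total, 0.0), 1.0))  # clamp to a valid accuracy
    s = sum(accs)
    n = len(accs)
    return [a / s for a in accs] if s > 0 else [1.0 / n] * n


def partition_features(feature_weights: Dict[str, float],
                       threshold: Optional[float] = None) -> Tuple[Set[str], Set[str]]:
    """Split features into (important, non_important); the default threshold is the mean weight."""
    if threshold is None:
        threshold = sum(feature_weights.values()) / len(feature_weights)
    important = {f for f, w in feature_weights.items() if w >= threshold}
    return important, set(feature_weights) - important


def should_prune(best_feature: str, non_important: Set[str]) -> bool:
    # Pre-pruning rule: a node whose best split feature is non-important becomes a leaf.
    return best_feature in non_important


def weighted_vote(tree_predictions: Sequence[Hashable], weights: Sequence[float]) -> Hashable:
    """Sum each tree's weight onto the class it predicts and return the heaviest class."""
    score: Dict[Hashable, float] = defaultdict(float)
    for label, w in zip(tree_predictions, weights):
        score[label] += w
    return max(score, key=score.get)
```

For example, with tree weights `[0.6, 0.2, 0.2]` and predictions `["a", "b", "b"]`, `weighted_vote` returns `"a"`: a single heavily weighted tree outvotes two light ones. Under plain majority voting the result would be `"b"`.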