Random forest algorithm under differential privacy based on out-of-bag estimate

LI Yuqiang; CHEN Junhao; LI Qi; LIU Aihua


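Supervised by: Ministry of Industry and Information Technology of the People's Republic of China. Sponsored by: Harbin Institute of Technology. Editor-in-chief: LI Longqiu. ISSN 0367-6234. CN 23-1235/T.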


Cite this article: LI Yuqiang, CHEN Junhao, LI Qi, LIU Aihua. Random forest algorithm under differential privacy based on out-of-bag estimate[J]. Journal of Harbin Institute of Technology, 2021, 53(2): 146. DOI: 10.11918/201912140

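LI Yuqiang¹, CHEN Junhao¹, LI Qi¹, LIU Aihua²
(1. School of Computer Science and Technology, Wuhan University of Technology, Wuhan 430063, China; 2. School of Energy and Power Engineering, Wuhan University of Technology, Wuhan 430063, China)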

Abstract:

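To address the unsatisfactory accuracy of differentially private random forest algorithms when classifying high-dimensional data, this paper introduces the out-of-bag estimate under differential privacy to compute decision tree weights and feature weights, and proposes a random forest algorithm under differential privacy based on the out-of-bag estimate (RFDP_OOB). The algorithm first generates part of the random forest under differential privacy protection and uses the out-of-bag estimate under differential privacy to evaluate the importance of the decision trees and of the features, from which the decision tree weights and feature weights are computed. The features are then partitioned according to the feature weights to obtain the set of non-important features. Next, while the remaining part of the random forest is generated, every node whose best feature is a non-important feature is pre-pruned and turned into a leaf node, which reduces noise, improves the classification accuracy of the decision trees, and gives good execution efficiency. Finally, at prediction time, the classification result whose corresponding decision tree weight is largest is taken as the classification result of the random forest, which improves the classification accuracy of the random forest algorithm. The paper also analyzes the effectiveness and privacy of the algorithm theoretically and verifies its effectiveness experimentally: the algorithm improves classification accuracy while protecting data privacy.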
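The feature partition and the weighted prediction step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state how features are partitioned or how tree weights are combined, so the fixed `threshold` rule, the summing of weights per class, and the names `split_features` and `weighted_vote` are all assumptions made for this example.

```python
from collections import defaultdict


def split_features(feature_weights, threshold):
    """Partition features by weight (assumed rule: weight >= threshold is important).

    feature_weights: dict mapping feature name -> weight.
    Returns (important, non_important) as two sets of feature names.
    """
    important = {f for f, w in feature_weights.items() if w >= threshold}
    non_important = set(feature_weights) - important
    return important, non_important


def weighted_vote(tree_predictions, tree_weights):
    """Return the class with the largest total tree weight.

    Assumed reading of the abstract: each tree adds its weight to the
    class it predicts, and the heaviest class wins.
    """
    if len(tree_predictions) != len(tree_weights):
        raise ValueError("one weight is required per tree prediction")
    totals = defaultdict(float)
    for label, weight in zip(tree_predictions, tree_weights):
        totals[label] += weight
    return max(totals, key=totals.get)


# Hypothetical weights, for illustration only.
important, non_important = split_features({"a": 0.5, "b": 0.1, "c": 0.4}, 0.3)
winner = weighted_vote(["x", "y", "y"], [0.9, 0.3, 0.4])
```

With these made-up numbers a plain majority vote would pick "y", while the weighted vote picks "x" because the single tree voting for it carries more weight than the other two combined. During tree growth, a node whose best split feature falls in `non_important` would become a leaf instead of being split further.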

Keywords: differential privacy; random forest; out-of-bag estimate; high-dimensional data; data mining

DOI: 10.11918/201912140

Classification number: TP391

Document code: A


Random forest algorithm under differential privacy based on out-of-bag estimate

LI Yuqiang¹,CHEN Junhao¹,LI Qi¹,LIU Aihua²

(1.School of Computer Science and Technology, Wuhan University of Technology, Wuhan 430063, China; 2.School of Energy and Power Engineering, Wuhan University of Technology, Wuhan 430063, China)

Abstract:

Because the accuracy of random forest algorithms under differential privacy is unsatisfactory when classifying high-dimensional data, the out-of-bag estimate under differential privacy is introduced to calculate the weights of decision trees and features, and a random forest algorithm under differential privacy based on the out-of-bag estimate (RFDP_OOB) is proposed. First, the algorithm generates part of the random forest under differential privacy and evaluates the importance of the decision trees and features with the out-of-bag estimate under differential privacy, from which the weights of the decision trees and features are calculated. Then, the features are partitioned according to their weights to obtain the set of non-important features. Next, while the remaining part of the random forest is generated, nodes whose best feature is a non-important feature are pre-pruned into leaf nodes, which reduces noise, improves the classification accuracy of the decision trees, and keeps execution efficient. Finally, at prediction time, the classification result whose corresponding decision tree weight is largest is taken as the classification result of the random forest, which improves the classification accuracy of the random forest algorithm. The privacy and effectiveness of the algorithm are analyzed theoretically, and the experimental results verify its effectiveness. The proposed algorithm improves classification accuracy while protecting data privacy.
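One way to picture an out-of-bag estimate under differential privacy is a noisy accuracy score per tree. The abstract does not say which mechanism the authors use, so this sketch assumes the Laplace mechanism applied to the count of correctly classified out-of-bag samples, with sensitivity 1 and a publicly known out-of-bag sample size. The function names and the privacy budget `epsilon` passed here are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_oob_accuracy(oob_labels, oob_predictions, epsilon, rng=None):
    """Noisy out-of-bag accuracy of one tree (assumed Laplace mechanism).

    The correct-prediction count changes by at most 1 when one record
    changes, so Laplace noise with scale 1/epsilon is added to the count.
    The result is clamped to [0, 1] so it can serve as a tree weight.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    n = len(oob_labels)
    if n == 0:
        return 0.0
    correct = sum(1 for y, p in zip(oob_labels, oob_predictions) if y == p)
    noisy_correct = correct + laplace_noise(1.0 / epsilon, rng)
    return min(1.0, max(0.0, noisy_correct / n))


# Hypothetical out-of-bag labels and predictions for a single tree.
weight = noisy_oob_accuracy([1, 0, 1, 1], [1, 0, 0, 1], epsilon=1.0,
                            rng=random.Random(0))
```

A smaller `epsilon` gives stronger privacy but a noisier weight, which is the trade-off the pre-pruning step is meant to ease by spending less of the budget on splits over non-important features.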

Key words: differential privacy; random forest; out-of-bag estimate; high-dimensional data; data mining
