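Supervised by: Ministry of Industry and Information Technology of the People's Republic of China
Sponsored by: Harbin Institute of Technology
Editor-in-chief: LI Longqiu
ISSN 0367-6234 (international); CN 23-1235/T (domestic)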

Cite this article: LI Yuqiang, CHEN Junhao, LI Qi, LIU Aihua. Random forest algorithm under differential privacy based on out-of-bag estimate[J]. Journal of Harbin Institute of Technology, 2021, 53(2): 146. DOI: 10.11918/201912140
DOI:10.11918/201912140
CLC number: TP391
Document code: A
Random forest algorithm under differential privacy based on out-of-bag estimate
LI Yuqiang1,CHEN Junhao1,LI Qi1,LIU Aihua2
(1.School of Computer Science and Technology, Wuhan University of Technology, Wuhan 430063, China; 2.School of Energy and Power Engineering, Wuhan University of Technology, Wuhan 430063, China)
Abstract:
Because random forest algorithms under differential privacy achieve unsatisfactory accuracy when classifying high-dimensional data, the out-of-bag estimate under differential privacy was introduced to calculate the weights of decision trees and features, and a random forest algorithm under differential privacy based on the out-of-bag estimate (RFDP_OOB) was proposed. First, the algorithm generates part of the random forest under differential privacy and uses the differentially private out-of-bag estimate to evaluate the importance of the decision trees and the features, from which the decision-tree weights and feature weights are calculated. Then, the features are partitioned according to their weights to obtain a set of non-important features. Next, while the remaining part of the random forest is generated, every node whose best splitting feature is non-important is pre-pruned into a leaf node; this reduces the added noise, improves the classification accuracy of the decision trees, and keeps execution efficient. Finally, at prediction time, the class label with the largest corresponding decision-tree weight is taken as the output of the random forest, which improves its classification accuracy. The effectiveness and privacy of the algorithm were also analyzed theoretically, and experimental results verified its effectiveness: the proposed algorithm improves classification accuracy while protecting data privacy.
Key words: differential privacy; random forest; out-of-bag estimate; high-dimensional data; data mining
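The procedure summarized in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not specify the noise mechanism, the sensitivities, the threshold for "non-important" features, or the exact voting rule. The sketch therefore assumes: the Laplace mechanism applied to per-tree counts of correct OOB predictions (sensitivity 1, OOB sample size treated as public); feature importance measured as the drop in correct OOB predictions after permuting a feature, with the sensitivity supplied by the caller; features with below-average weight forming the non-important set; and weighted voting as one reading of "the result with the largest decision-tree weight". All function names are hypothetical. Each noisy query spends its own epsilon, and under sequential composition these budgets add up.

```python
import random
from collections import defaultdict


def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale).

    The difference of two i.i.d. exponentials with mean `scale` is Laplace-distributed.
    Plain floating-point sampling is fine for illustration but is not hardened DP code.
    """
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def normalize(values):
    """Scale non-negative values to sum to 1; fall back to uniform weights if all are 0."""
    total = sum(values)
    if total <= 0:
        return [1.0 / len(values)] * len(values)
    return [v / total for v in values]


def noisy_oob_accuracy(n_correct, n_oob, epsilon, rng):
    """Laplace mechanism on a tree's count of correct OOB predictions (sensitivity 1).

    Assumption: n_oob is treated as public. Clamping to [0, 1] is post-processing,
    so it does not weaken the privacy guarantee.
    """
    noisy = n_correct + laplace_noise(1.0 / epsilon, rng)
    return min(max(noisy / n_oob, 0.0), 1.0)


def dp_tree_weights(oob_stats, epsilon, rng):
    """oob_stats: one (n_correct, n_oob) pair per tree in the first part of the forest."""
    return normalize([noisy_oob_accuracy(c, n, epsilon, rng) for c, n in oob_stats])


def dp_feature_weights(importance_drops, sensitivity, epsilon, rng):
    """importance_drops: feature -> drop in correct OOB predictions after permuting it.

    The caller supplies `sensitivity`; deriving it depends on how the permutation is done.
    """
    noisy = {f: max(d + laplace_noise(sensitivity / epsilon, rng), 0.0)
             for f, d in importance_drops.items()}
    names = list(noisy)
    return dict(zip(names, normalize([noisy[f] for f in names])))


def split_features(feature_weights, threshold=None):
    """Return the non-important set: features whose weight is below `threshold`.

    Default threshold is the mean weight (1 / number of features, since weights sum to 1).
    """
    if threshold is None:
        threshold = 1.0 / len(feature_weights)
    return {f for f, w in feature_weights.items() if w < threshold}


def should_preprune(best_feature, non_important):
    """While growing the remaining trees, make a node a leaf if its best feature is non-important."""
    return best_feature in non_important


def weighted_vote(tree_predictions, tree_weights):
    """Sum the weights of the trees predicting each class and return the heaviest class."""
    score = defaultdict(float)
    for label, w in zip(tree_predictions, tree_weights):
        score[label] += w
    return max(score, key=score.get)
```

For example, `weighted_vote(["x", "y", "y"], [0.6, 0.25, 0.15])` returns `"x"`: one high-weight tree outvotes two lower-weight trees, which is where weighting departs from plain majority voting.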
