Cite this article: | XING Xue, YU Dexin, ZHOU Huxing, TIAN Xiujuan. A method of abnormal data recognition of multi-source traffic with non-equilibrium feature[J]. Journal of Harbin Institute of Technology, 2019, 51(9): 165. DOI: 10.11918/j.issn.0367-6234.201803092 |
|
|
Anomaly recognition method for multi-source imbalanced traffic detection data |
XING Xue1,2, YU Dexin2,3, ZHOU Huxing2,3, TIAN Xiujuan2
|
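(1. College of Information and Control Engineering, Jilin Institute of Chemical Technology, Jilin 132022, Jilin, China; 2. Transportation College, Jilin University, Changchun 130022, China; 3. Jilin Engineering Research Center for Intelligent Transportation System, Changchun 13002, China)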
|
|
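Abstract: |
To ensure the accuracy of traffic detection data and to support real-time traffic state identification and prediction, traffic big data are processed jointly across multiple detection sources and screened for anomalies with machine learning. Abnormal detection data are identified mainly with the AdaBoost method. In the training stage, to remove the outliers that arise in single-source data, the training data are drawn from datasets supplied by multiple detection sources on the same road section. In the decision stage, a cost-sensitive method is used to improve the decisions made by AdaBoost. Experimental results show that the AdaBoost model improved for the imbalanced setting forces the classifier to pay more attention to the abnormal samples to be identified, makes the decision tree rules trained within AdaBoost more representative, and raises the classification accuracy for abnormal samples. A detection dataset from an expressway case was used to compare the improved algorithm with related classical algorithms on detection accuracy, false detection rate, false alarm rate, and other indicators; compared with the original model, the improved model raised accuracy by 5.547% and lowered the false detection rate by 6.792%. A comparison of the ROC curves of several algorithms shows that the improved AdaBoost method screens traffic detection samples more reliably and can effectively correct the classification error caused by imbalanced data. |
Key words: AdaBoost; abnormal data recognition; multi-source traffic data; imbalanced detection data; machine learning |
DOI: 10.11918/j.issn.0367-6234.201803092 |
CLC number: U491.1 |
Document code: A |
Funding: National Key Technology Research and Development Program of China (2014BAG03B03) |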
|
A method of abnormal data recognition of multi-source traffic with non-equilibrium feature |
XING Xue1,2,YU Dexin2,3,ZHOU Huxing2,3,TIAN Xiujuan2
|
(1. College of Information and Control Engineering, Jilin Institute of Chemical Technology, Jilin 132022, Jilin, China; 2. Transportation College, Jilin University, Changchun 130022, China; 3. Jilin Engineering Research Center for Intelligent Transportation System, Changchun 132002, China)
|
Abstract: |
Identifying and predicting real-time traffic conditions depends on accurate detection data. To ensure that accuracy, abnormal data in traffic big data are recognized with machine learning applied jointly to data from multiple detection sources. The recognition is based on the AdaBoost method. To eliminate the outliers that arise in single-source data, the training data are drawn from datasets provided by multiple detection sources on the same road section. A cost-sensitive method is used to improve the decision-making process of AdaBoost. Experimental results show that the improved AdaBoost model forces the classifier to pay more attention to abnormal samples, which makes the decision tree rules trained within AdaBoost more representative and improves the classification accuracy for abnormal samples. A highway detection dataset was used to compare the improved algorithm with related classical algorithms on detection accuracy, false detection rate, false alarm rate, and other indicators. Compared with the original model, the improved algorithm increased accuracy by 5.547% and reduced the false detection rate by 6.792%. A comparison of ROC curves shows that the improved AdaBoost method identifies abnormal traffic detection samples more reliably and can effectively correct the classification error caused by non-equilibrium data. |
Key words: AdaBoost; abnormal data recognition; multi-source traffic data; non-equilibrium detection data; machine learning |
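
The abstract does not spell out the cost scheme used to improve AdaBoost, so the following is only a minimal sketch of one common cost-sensitive variant, not the authors' implementation. It assumes that the initial sample weights are made proportional to a misclassification cost, that one-level decision stumps stand in for the decision trees, and that abnormal samples are labeled +1 and normal samples -1. The function names, the cost values, and the toy data are all hypothetical.

```python
import math

def stump_predict(stump, row):
    """Predict +1 (abnormal) when pol * (x_j - thr) >= 0, otherwise -1 (normal)."""
    j, thr, pol, _ = stump
    return 1 if pol * (row[j] - thr) >= 0 else -1

def fit_stump(X, y, w):
    """Exhaustively search for the stump with the lowest weighted error."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for pol in (1, -1):
                cand = (j, thr, pol, 0.0)
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if stump_predict(cand, row) != yi)
                if best is None or err < best[3]:
                    best = (j, thr, pol, err)
    return best

def fit_cost_sensitive_adaboost(X, y, cost_abnormal=5.0, cost_normal=1.0, rounds=10):
    """AdaBoost whose initial weights are proportional to class costs (assumed scheme)."""
    w = [cost_abnormal if yi == 1 else cost_normal for yi in y]
    total = sum(w)
    w = [wi / total for wi in w]
    ensemble = []
    for _ in range(rounds):
        stump = fit_stump(X, y, w)
        err = stump[3]
        if err >= 0.5:
            break  # no stump beats chance under the current weights
        err = max(err, 1e-10)  # avoid division by zero for a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Standard AdaBoost update: raise the weight of misclassified samples.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, row):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score > 0 else -1

# Hypothetical toy data: one feature, a single abnormal record (x = 6) among normals.
X_toy = [[v] for v in range(1, 9)]
y_toy = [-1, -1, -1, -1, -1, 1, -1, -1]
plain = fit_cost_sensitive_adaboost(X_toy, y_toy, cost_abnormal=1.0, rounds=1)
weighted = fit_cost_sensitive_adaboost(X_toy, y_toy, cost_abnormal=5.0, rounds=1)
print(predict(plain, [6]), predict(weighted, [6]))
```

On this toy set, equal costs let the first stump trade away the lone abnormal record, while the higher abnormal cost makes missing it more expensive than mislabeling two normal records. That is the sense in which a cost-sensitive scheme pushes the classifier toward the minority class. Other cost-sensitive variants instead put the cost inside the weight update itself.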