Improved outlier detection and interpretation method for DPC clustering algorithm

ZHOU Yu; XIA Hao; PEI Zexuan


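Supervising authority: Ministry of Industry and Information Technology of the People's Republic of China. Sponsor: Harbin Institute of Technology. Editor-in-chief: LI Longqiu. ISSN 0367-6234. CN 23-1235/T.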


Cite this article: ZHOU Yu, XIA Hao, PEI Zexuan. Improved outlier detection and interpretation method for DPC clustering algorithm[J]. Journal of Harbin Institute of Technology, 2024, 56(8): 68. DOI: 10.11918/202305067

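Improved outlier detection and interpretation method for DPC clustering algorithm
ZHOU Yu, XIA Hao, PEI Zexuan
(School of Electrical Engineering, North China University of Water Resources and Electric Power, Zhengzhou 450045)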

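Abstract:

To address the inability of global outlier detection methods to detect local outliers, and the performance degradation of the local outlier factor (LOF) when many local outliers are present, this paper uses k-nearest neighbors (KNN) and kernel density estimation (KDE) to propose an outlier detection and interpretation method based on an improved clustering by fast search and find of density peaks algorithm (KDPC). The method analyzes each data point both globally and locally. First, the local density of each data point is computed with k-nearest neighbors and kernel density estimation, replacing the local density that the traditional DPC algorithm computes from a cutoff distance. Second, the sum of a data point's k-nearest-neighbor distances is taken as its global outlier value, and the KDPC clustering algorithm is used to compute the cluster densities and the local outlier value of each data point. Finally, the product of a data point's global and local outlier values is taken as its final anomaly score, the Top-n points with the highest anomaly scores are selected as outliers, and a global-local outlier value decision graph is constructed to interpret global and local outliers. Experiments were carried out on synthetic datasets and UCI datasets, with comparisons against 10 commonly used outlier detection methods. The results show that the method achieves high detection accuracy and detection performance for both global and local outliers, and that its AUC is only slightly affected by the value of k. The method was also applied to NBA player data, further demonstrating its practicality and effectiveness.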
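For context, the cutoff-distance local density of the traditional DPC algorithm, which the abstract says is replaced, can be sketched as follows. This is a minimal illustration of the standard DPC quantities (density as a neighbor count within a cutoff, and the distance to the nearest point of higher density); the function name `dpc_quantities`, the parameter name `d_c`, and the toy data are assumptions for illustration and are not taken from the paper.

```python
import math

def dpc_quantities(points, d_c):
    """Return (rho, delta) as used in traditional DPC.

    rho[i]   : number of other points closer than the cutoff distance d_c
    delta[i] : distance to the nearest point with a higher rho; for a point
               with no higher-density point, the largest distance from it
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    rho = [
        sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
        for i in range(n)
    ]
    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(higher) if higher else max(dist[i]))
    return rho, delta

# Toy data (hypothetical): a small square cluster with a center point,
# plus one isolated point far away.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5), (5.0, 5.0)]
rho, delta = dpc_quantities(points, d_c=1.0)
print(rho)
print([round(d, 3) for d in delta])
```

The isolated point ends up with zero density and a large distance to any denser point, which is the combination DPC treats as outlier-like. Because `rho` depends entirely on the choice of `d_c`, the result is sensitive to that single parameter, which is a common motivation for kNN-based density estimates.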

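Keywords: outlier detection; clustering; density peaks; k-nearest neighbors; kernel density estimation

DOI: 10.11918/202305067

CLC number: TP181

Document code: A

Funding: National Natural Science Foundation of China (U2,0), Young Backbone Teacher Training Program of Higher Education Institutions of Henan Province (2018GGJS079)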

Improved outlier detection and interpretation method for DPC clustering algorithm

ZHOU Yu,XIA Hao,PEI Zexuan

(School of Electrical Engineering, North China University of Water Resources and Electric Power, Zhengzhou 450045, China)

Abstract:

To address the limitations of global outlier detection methods in detecting local outliers, and the performance degradation of the local outlier factor (LOF) in the presence of a large number of local outliers, this paper proposes an outlier detection and interpretation method based on an improved clustering by fast search and find of density peaks algorithm (KDPC), utilizing k-nearest neighbor (KNN) and kernel density estimation (KDE) methods. This method analyzes each data point both globally and locally. Firstly, the local density of data points is calculated using the k-nearest neighbor and kernel density estimation methods, instead of the local density based on the cutoff distance in the traditional DPC algorithm. Secondly, the sum of the k-nearest neighbor distances of a data point is used as its global outlier value, and the cluster density as well as the local outlier values of the data points are calculated by the KDPC clustering algorithm. Finally, the global and local outlier values of each data point are multiplied to give the final anomaly score. The Top-n data points with the highest anomaly scores are selected as outliers, and the global and local outliers are interpreted by constructing a global-local outlier value decision graph. Experiments were conducted using both artificial and UCI datasets, and our method was compared with 10 commonly used outlier detection methods. The results show that our method achieves high detection accuracy and performance for both global and local outliers. Moreover, its AUC is only minimally affected by the value of k. Our method is also used to analyze NBA player data, further demonstrating its practicality and effectiveness.
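As a rough illustration of the scoring steps in the abstract, the following Python sketch computes a Gaussian kernel density over each point's k nearest neighbors, uses the sum of kNN distances as the global value, multiplies global and local values, and selects the Top-n points. The abstract does not give the formula for the local outlier value or the KDPC clustering step, so the local value here is a stand-in assumption (mean neighbor density divided by the point's own density), not the authors' definition. The bandwidth `h`, the constant `eps`, the toy data, and all function names are likewise hypothetical.

```python
import math

def knn(points, k):
    """For each point, return its k nearest neighbors as sorted (distance, index) pairs."""
    result = []
    for i, p in enumerate(points):
        d = [(math.dist(p, q), j) for j, q in enumerate(points) if j != i]
        d.sort()
        result.append(d[:k])
    return result

def kde_density(neighbors, h):
    """Gaussian-kernel density estimate restricted to each point's k nearest neighbors."""
    return [
        sum(math.exp(-(d * d) / (2.0 * h * h)) for d, _ in nb) / len(nb)
        for nb in neighbors
    ]

def outlier_scores(points, k=5, h=1.0, eps=1e-12):
    """Return (final, global_value, local_value) lists, one entry per point."""
    nbrs = knn(points, k)
    rho = kde_density(nbrs, h)
    # Global value: sum of distances to the k nearest neighbors (as in the abstract).
    global_value = [sum(d for d, _ in nb) for nb in nbrs]
    # Local value: ASSUMED stand-in, mean density of the neighbors divided by
    # the point's own density. The paper derives it from KDPC cluster density.
    local_value = [
        (sum(rho[j] for _, j in nb) / len(nb)) / (rho[i] + eps)
        for i, nb in enumerate(nbrs)
    ]
    # Final anomaly score: product of global and local values (as in the abstract).
    final = [g * l for g, l in zip(global_value, local_value)]
    return final, global_value, local_value

def top_n(scores, n):
    """Indices of the n highest scores, highest first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:n]

# Toy data (hypothetical): a 5 x 5 unit grid plus one isolated point at index 25.
points = [(float(x), float(y)) for x in range(5) for y in range(5)] + [(10.0, 10.0)]
final, global_value, local_value = outlier_scores(points, k=4, h=1.0)
print(top_n(final, 1))
```

On this toy data the isolated point at index 25 ranks first, since both its kNN distance sum and its density ratio are far larger than those of the grid points. Plotting `global_value` against `local_value` for every point would give a simple analogue of the global-local decision graph the abstract describes.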

Key words: outlier detection; clustering; density peaks; k-nearest neighbors; kernel density estimation
