Cite this article: WANG Ruoyu, ZHAO Qianchuan, YANG Wen. Isolation-based data extracting LOF[J]. Journal of Harbin Institute of Technology, 2023, 55(10): 1. DOI:10.11918/202205060
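Abstract:
To address the limitations of the local outlier factor (LOF) anomaly detection algorithm, namely its high time and space complexity and its insensitivity to cross anomalies and to anomalies around low-density clusters, an LOF anomaly detection algorithm based on nearest-neighbor search space extraction (isolation-based data extracting LOF, iDELOF) is proposed. It places isolation-based KNN search space extraction (iKSSE) before the LOF algorithm to efficiently prune large amounts of useless and interfering data and obtain a more precise search space. On this basis, a theoretical analysis and four groups of experiments were completed; each group compares iDELOF with several typical algorithms, including LOF, iForest, and iNNE. The results show that iDELOF strengthens the identification of cross anomalies and of anomalies around low-density clusters by widening the gap between the local outlier factors of normal and anomalous points, improving on the detection performance of LOF; that iDELOF, like LOF, has a clear advantage in identifying axis-parallel anomalies; and that the data subsets iDELOF obtains through iKSSE are significantly smaller than the original dataset, with most subsets containing less than 1% of the original data. The time and space complexity of iDELOF is therefore significantly reduced, and the larger the original dataset, the more pronounced the advantage; when the dataset is large enough, the running time of iDELOF falls below that of the IF algorithm.
Keywords: anomaly detection; iDELOF; iKSSE; local outlier factor; experimental analysis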
DOI:10.11918/202205060
CLC number: TP301.6
Document code: A
Funding:
Isolation-based data extracting LOF |
WANG Ruoyu1, ZHAO Qianchuan1, YANG Wen1,2
(1. Center for Intelligent and Networked Systems, Department of Automation, Tsinghua University, Beijing 100084, China; 2. Key Laboratory of Space Launching Site Reliability Technology, Haikou 570100, China)
Abstract:
To address the limitations of the local outlier factor (LOF) anomaly detection algorithm, namely high time and space complexity and insensitivity to cross anomalies and to outliers around low-density clusters, this paper proposes the isolation-based data extracting LOF (iDELOF) anomaly detection algorithm. iDELOF places isolation-based K-nearest-neighbor search space extraction (iKSSE) in front of LOF to efficiently cut out large amounts of useless and interfering data and obtain a more accurate search space. A theoretical analysis and four groups of experiments are presented; in each group, iDELOF is compared with several typical algorithms, such as LOF, iForest, and iNNE. The results show that iDELOF improves the detection capability of LOF by widening the gap between the local outlier factors of normal and abnormal points, which enhances its ability to identify cross anomalies and abnormal points around low-density clusters. Additionally, iDELOF shares LOF's clear advantage in identifying axis-parallel anomalies. The data subsets that iDELOF obtains through iKSSE are significantly smaller than the original dataset, and most subsets contain less than 1% of the original data. The time and space complexity of iDELOF is therefore significantly reduced, and the larger the original dataset, the more pronounced the advantage. When the amount of data is large enough, the running time of iDELOF falls below that of the IF algorithm.
Key words: anomaly detection; iDELOF; iKSSE; local outlier factor; experimental analysis
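
For readers unfamiliar with the baseline that the abstract builds on, the following is a minimal sketch of the standard LOF score (in its commonly cited form: k-distance, reachability distance, local reachability density), computed by brute force with NumPy. It is not the paper's iDELOF and does not reproduce the iKSSE step, whose details the abstract does not give. The function name `lof_scores`, the parameter `k`, and the synthetic data are assumptions made for illustration. The full pairwise distance matrix is what makes this version quadratic in time and memory, which is the cost a smaller search space would reduce.

```python
import numpy as np

def lof_scores(X, k):
    """Standard LOF by brute force (illustrative helper, not the paper's code).

    Assumes no duplicate points; duplicates can make a reachability
    distance zero and the density infinite.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Pairwise Euclidean distances; a point is not its own neighbor.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)
    # Indices of the k nearest neighbors (ties beyond k are ignored here).
    knn = np.argsort(dist, axis=1)[:, :k]
    # k-distance: distance to the k-th nearest neighbor.
    k_dist = dist[np.arange(n), knn[:, -1]]
    # reach-dist_k(p, o) = max(k-distance(o), d(p, o)) for each neighbor o of p.
    reach = np.maximum(k_dist[knn], dist[np.arange(n)[:, None], knn])
    # Local reachability density: inverse of the mean reachability distance.
    lrd = 1.0 / reach.mean(axis=1)
    # LOF: mean ratio of the neighbors' density to the point's own density.
    return lrd[knn].mean(axis=1) / lrd

# Synthetic example: one Gaussian cluster plus a single distant point.
rng = np.random.default_rng(0)
cluster = rng.normal(0.0, 1.0, size=(200, 2))
X = np.vstack([cluster, [[8.0, 8.0]]])
scores = lof_scores(X, k=10)
outlier_index = int(np.argmax(scores))  # index of the most anomalous point
```

Scores near 1 indicate a point whose local density matches that of its neighbors; the distant point appended last should receive the largest score.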