Cite this article: | XIONG Xin, XU Lisheng, WANG Chunwu, KANG Yan. Polyp detection in CT colonography based on imbalanced data sets[J]. Journal of Harbin Institute of Technology, 2013, 45(11): 112. DOI:10.11918/j.issn.0367-6234.2013.11.019 |
|
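Abstract (translated from the Chinese): |
At present, polyp-detection classifiers for CT colonography face an imbalanced data set problem: the number of positive samples (polyps) in the data set is far smaller than the number of negative samples. To address this, the polyp-detection classifier adopts SMOTEBoost, which combines SMOTE (Synthetic Minority Over-Sampling Technique) with Boosting. At the data level, the over-sampling technique SMOTE synthesizes minority-class samples to reduce the imbalance between the two classes; at the algorithm level, Boosting is used to improve the performance of weak classifiers. Combining the two both improves prediction on minority-class samples and preserves classification accuracy over the whole data set. To meet the real-time requirements of polyp detection, the MRMR (Minimum Redundancy Maximum Relevance) method is used to select simple features with maximum relevance and minimum redundancy to form the first-stage strong classifier of the cascade, which rejects most negative samples and greatly increases the classifier's processing speed. Experimental results show that the designed classifier achieves a sensitivity of 90% for polyps larger than 5 mm in diameter, with 6 false positives per data volume. |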
Key words: imbalanced data sets; CT colonography; colonic polyp detection; re-sampling; Boosting; Cascade AdaBoost |
DOI:10.11918/j.issn.0367-6234.2013.11.019 |
Classification number: |
Funding: National Natural Science Foundation of China (61071213). |
|
Polyp detection in CT colonography based on imbalanced data sets |
XIONG Xin, XU Lisheng, WANG Chunwu, KANG Yan
|
(School of Sino-Dutch Biomedical & Information Engineering, Northeastern University, Shenyang 110819, China)
|
Abstract: |
Polyp detection in CT colonography suffers from imbalanced data sets in which negative samples (non-polyps) are dominant. At the data level, SMOTE (Synthetic Minority Over-Sampling Technique) was applied to alleviate the degree of imbalance by synthesizing minority samples. At the algorithm level, a Boosting approach was employed to improve classification performance. By combining Boosting with SMOTE (SMOTEBoost), the proposed classifier not only improved the prediction of minority samples but also maintained accuracy over the entire data set. To satisfy the real-time requirements of polyp detection, MRMR (Minimum Redundancy Maximum Relevance) was used to select low-cost simple features for training the first stage of the cascade, which rejected the great majority of negative samples and sped up processing. The experimental results showed that the classifier achieved an overall per-polyp sensitivity of 90% for polyps with a diameter equal to or greater than 5 mm, with an average of 6 false positives per volume. |
Key words: imbalanced data sets; CT colonography; colonic polyp detection; re-sampling; Boosting; Cascade AdaBoost
|
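The data-level step named in the abstract, SMOTE, creates new minority-class samples by interpolating between an existing minority sample and one of its nearest minority-class neighbours. The following is a minimal sketch of that idea only, not the authors' implementation: the function name `smote`, its parameters, and the two-dimensional toy feature vectors are all hypothetical, and the base sample is drawn at random each time, which simplifies the usual fixed number of synthetic samples per minority sample.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic points interpolated between minority samples.

    Hypothetical helper for illustration; `minority` is a list of
    equal-length feature vectors belonging to the minority class.
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    if k < 1:
        raise ValueError("need at least two minority samples")

    synthetic = []
    for _ in range(n_synthetic):
        # Pick a base minority sample at random (a simplification).
        i = rng.randrange(len(minority))
        base = minority[i]

        # Find its k nearest minority neighbours by Euclidean distance.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]

        # Interpolate a random fraction of the way towards one neighbour.
        neighbour = rng.choice(neighbours)
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority-class (polyp) feature vectors, invented for this example.
polyps = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_points = smote(polyps, n_synthetic=8, k=2)
```

Because every synthetic point lies on a line segment between two real minority samples, the new points stay inside the region already occupied by the minority class. In SMOTEBoost, a step of this kind would be applied within the boosting rounds; that integration is not shown here.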