Related citation: | Yanbei Liu, Kaihua Liu, Xiao Wang, Changqing Zhang, Xianchao Tang. Unsupervised Feature Selection Using Structured Self-Representation[J]. Journal of Harbin Institute of Technology (New Series), 2018, 25(3): 62-73. DOI: 10.11916/j.issn.1005-9113.16194. |
|
Author Name | Affiliation |
Yanbei Liu | School of Electronic Information Engineering, Tianjin University, Tianjin 300072, China |
Kaihua Liu | School of Electronic Information Engineering, Tianjin University, Tianjin 300072, China |
Xiao Wang | Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China |
Changqing Zhang | School of Computer Science and Technology, Tianjin University, Tianjin 300072, China |
Xianchao Tang | School of Computer Science and Technology, Tianjin University, Tianjin 300072, China |
|
Abstract: |
Unsupervised feature selection has become an important and challenging problem in machine learning, where vast amounts of unlabeled, high-dimensional data must be handled. We propose a novel unsupervised feature selection method, Structured Self-Representation (SSR), which simultaneously takes into account the self-representation property and the local geometrical structure of features. Concretely, the inherent self-representation property of features is used to select the most representative features. Meanwhile, to obtain more accurate results, we exploit the local geometrical structure to constrain the representation coefficients of features that are close to each other to be close as well. Furthermore, an efficient algorithm is presented for optimizing the objective function. Finally, experiments on a synthetic dataset and six real-world benchmark datasets, including biomedical data, letter and digit recognition data, and face image data, demonstrate the encouraging performance of the proposed algorithm compared with state-of-the-art algorithms. |
Key words: unsupervised feature selection; local geometrical structure; self-representation property; high-dimension data |
DOI:10.11916/j.issn.1005-9113.16194 |
CLC Number: TP181 |
Fund: |
|
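Description (translated from Chinese): |
Unsupervised Feature Selection Method Based on Structured Self-Representation (基于结构化自表达的无监督特征选择方法) |
Yanbei Liu 1, Kaihua Liu 1, Xiao Wang 2, Changqing Zhang 3, Xianchao Tang 3 (1. School of Electronic and Information Engineering, Tianjin University, Tianjin 300072; 2. Department of Computer Science and Technology, Tsinghua University, Beijing 100084; 3. School of Computer Science and Technology, Tianjin University, Tianjin 300072) |
Innovations: 1) Local structure information of the data features is introduced into the feature self-representation model, so that the selected features are more representative; 2) a simple yet effective algorithm is proposed to optimize the resulting objective function. |
Abstract: In machine learning, unsupervised feature selection has become an important and challenging problem in the face of large amounts of high-dimensional unlabeled data. Considering both the self-representation property and the local structure information of the data features, a novel structured self-representation method for unsupervised feature selection is proposed. Specifically, the inherent self-representation property of the data features is used to select representative features. In addition, to improve selection accuracy, a local structure constraint is explored so that features that are close to each other also have similar representation coefficients. An effective algorithm is further proposed to optimize the resulting objective function. Finally, experimental results on one synthetic dataset and six real-world datasets (including biomedical data, digit and letter recognition data, and image data) show that the proposed algorithm outperforms current mainstream algorithms. |
Keywords: unsupervised feature selection; local structure; self-representation property; high-dimensional data |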
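The abstract describes the method only in words and does not state the objective function. As a rough illustration, the sketch below assumes one common way such a model could be written: each feature (column of `X`) is reconstructed from all features through a coefficient matrix `W`, a graph Laplacian `L` built over the features keeps the coefficient rows of nearby features similar, and an l2,1 penalty makes `W` row-sparse so that row norms can rank features. The objective `||X - XW||_F^2 + alpha * tr(W^T L W) + beta * ||W||_{2,1}`, the kNN graph with Gaussian weights, the iteratively reweighted solver, and all names and parameters (`ssr_sketch`, `feature_graph_laplacian`, `alpha`, `beta`, `k`) are assumptions made for this example, not the authors' formulation or algorithm.

```python
import numpy as np

def feature_graph_laplacian(X, k=5):
    """Assumed graph: kNN over features (columns of X), Gaussian weights. Returns L = D - S."""
    F = X.T                                    # d x n, one row per feature
    sq = np.sum(F ** 2, axis=1)
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * F @ F.T, 0.0)
    d = F.shape[0]
    sigma2 = np.mean(dist2) + 1e-12            # heuristic bandwidth (assumption)
    S = np.zeros((d, d))
    for i in range(d):
        order = np.argsort(dist2[i])
        nbrs = [j for j in order if j != i][:k]
        S[i, nbrs] = np.exp(-dist2[i, nbrs] / sigma2)
    S = np.maximum(S, S.T)                     # symmetrize the neighbor graph
    return np.diag(S.sum(axis=1)) - S

def ssr_sketch(X, alpha=1.0, beta=1.0, k=5, n_iter=30, eps=1e-8):
    """Hypothetical solver for the assumed objective via iterative reweighting."""
    n, d = X.shape
    L = feature_graph_laplacian(X, k)
    G = X.T @ X
    W = np.eye(d)
    for _ in range(n_iter):
        # Reweighting matrix for the l2,1 term: D_ii = 1 / (2 * ||w_i||_2)
        row_norms = np.sqrt(np.sum(W ** 2, axis=1) + eps)
        D = np.diag(1.0 / (2.0 * row_norms))
        # Stationarity condition: (X^T X + alpha L + beta D) W = X^T X
        W = np.linalg.solve(G + alpha * L + beta * D, G)
    scores = np.linalg.norm(W, axis=1)         # larger row norm = more representative
    return W, scores
```

Under these assumptions, the indices of the `m` highest-scoring features would be `np.argsort(scores)[::-1][:m]`. The values of `alpha` and `beta` trade off the structure and sparsity terms and would need tuning per dataset.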