| Author Name | Affiliation |
|---|---|
| CONG Shuai | School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China |
| ZHANG Ji-bin | School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China |
| XU Zhi-ming | School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China |
| WANG Yu-ying | School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China |
Abstract:
To address the poor text classification performance obtained with the traditional mutual information (MI) formula, a feature selection algorithm based on improved mutual information was proposed. Building on earlier improved MI methods, which increase the MI value of negative features and take feature frequency into account, the improved algorithm introduces the concepts of concentration degree and dispersion degree. Formulas embodying these two concepts were constructed, and the improved mutual information was computed from them. In this paper, the proposed feature selection algorithm was applied to a text classifier based on Biomimetic Pattern Recognition and compared with several other feature selection methods. The experimental results showed that the improved mutual information method performs substantially better than traditional mutual information feature selection and also outperforms information gain. By introducing the concepts of concentration degree and dispersion degree, the improved mutual information feature selection method markedly improves the performance of the text classification system.
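The abstract does not give the improved formulas, so they are not reproduced here. For orientation only, the sketch below illustrates the traditional MI baseline that the method starts from, using the common text-categorization estimate MI(t, c) ≈ log(A·N / ((A + C)(A + B))), where A is the number of documents in class c containing term t, B the number outside c containing t, C the number in c without t, and N the total number of documents. The per-class scores are combined by taking the maximum over classes (averaging weighted by P(c) is the usual alternative). The function names `traditional_mi` and `select_top_k` and the toy corpus are hypothetical and illustrative, not the authors' code.

```python
import math
from collections import Counter


def traditional_mi(docs, labels):
    """Score terms with the traditional MI estimate log(A*N / ((A+C)*(A+B))).

    docs: list of token lists; labels: list of class labels of the same length.
    Returns {term: max over classes of MI(term, class)} (hypothetical helper).
    """
    n = len(docs)                         # N: total number of documents
    class_counts = Counter(labels)        # A + C: documents in each class
    term_df = Counter()                   # A + B: documents containing each term
    term_class_df = Counter()             # A: documents of a class containing a term
    for tokens, label in zip(docs, labels):
        for term in set(tokens):          # document frequency, not raw term counts
            term_df[term] += 1
            term_class_df[(term, label)] += 1

    scores = {}
    for term, df in term_df.items():
        best = -math.inf
        for cls, n_cls in class_counts.items():
            a = term_class_df[(term, cls)]
            if a == 0:
                continue                  # log(0) is undefined; no co-occurrence with cls
            best = max(best, math.log(a * n / (n_cls * df)))
        scores[term] = best               # finite: every term occurs in some class
    return scores


def select_top_k(scores, k):
    """Keep the k highest-scoring terms (ties broken alphabetically)."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


if __name__ == "__main__":
    docs = [["ball", "goal", "team"], ["goal", "match"],
            ["vote", "party"], ["party", "election", "team"]]
    labels = ["sport", "sport", "politics", "politics"]
    scores = traditional_mi(docs, labels)
    print(select_top_k(scores, 3))        # ['ball', 'election', 'goal']
```

In this toy run, "ball" (seen in one sport document) scores exactly as high as "goal" (seen in every sport document), which illustrates the insensitivity to feature frequency that improved MI methods set out to correct.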
Key words: text classification; feature selection; improved mutual information; Biomimetic Pattern Recognition
DOI: 10.11916/j.issn.1005-9113.2011.03.027
CLC Number: TP391.1