Abstract: Conventional kNN algorithms ignore label correlations when applied to multi-label text categorization. To address this shortcoming, an improved multi-label kNN approach for text categorization is proposed. A distance metric based on KL divergence is derived to measure the similarity between individual documents. Based on statistical information gathered from the label sets of neighboring documents, a fuzzy maximum a posteriori principle is used to infer the label sets of unlabeled documents. Unlike ML-kNN, the proposed approach exploits label correlations to improve classification performance. Experiments on three benchmark datasets using five popular multi-label evaluation metrics suggest that the proposed approach outperforms some well-established multi-label learning algorithms, such as ML-kNN, Rank-SVM, and BoosTexter.
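As a rough illustration of the neighbor-search step, the sketch below ranks documents by a KL-divergence-based distance. The abstract does not state the exact form of the derived metric, so the symmetrized KL divergence over additively smoothed term distributions used here is an assumption, and the names `term_distribution`, `symmetric_kl`, `k_nearest`, and the smoothing parameter `alpha` are hypothetical. The fuzzy maximum a posteriori labeling step is not shown.

```python
import math
from collections import Counter


def term_distribution(tokens, vocabulary, alpha=1.0):
    """Additively smoothed term distribution over a fixed, ordered vocabulary.

    Smoothing (an assumption, not taken from the paper) keeps every
    probability positive so that the KL divergence stays finite.
    """
    counts = Counter(tokens)
    total = sum(counts[t] for t in vocabulary) + alpha * len(vocabulary)
    return [(counts[t] + alpha) / total for t in vocabulary]


def kl(p, q):
    """KL divergence D(p || q) for two strictly positive distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def symmetric_kl(p, q):
    """Symmetrized KL divergence, used here as a stand-in document distance."""
    return kl(p, q) + kl(q, p)


def k_nearest(query, corpus, k):
    """Indices of the k corpus distributions closest to the query."""
    order = sorted(range(len(corpus)), key=lambda i: symmetric_kl(query, corpus[i]))
    return order[:k]
```

In a multi-label kNN setting, the label sets of the documents returned by `k_nearest` would then supply the neighborhood statistics on which the labeling rule operates.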