Abstract
To address the fact that the Multi-Label K Nearest Neighbor (ML-KNN) classification algorithm ignores the correlation between labels, a multi-label K nearest neighbor classification algorithm that exploits label correlation, named CML-KNN, is proposed. First, the conditional probability between each pair of labels in the label set is calculated. Second, for the label about to be predicted, the conditional probabilities between it and the labels that have already been predicted are ranked, and the maximum is taken. Finally, this maximum is multiplied by the value of its corresponding label, and the product is combined with Maximum A Posteriori (MAP) estimation to build the multi-label classification model used to predict the new label. Experimental results show that on the Emotions dataset CML-KNN outperforms all four of the other algorithms, namely ML-KNN, AdaBoost.MH, RAkEL and BP-MLL, while on the Yeast and Enron datasets it falls below ML-KNN and RAkEL on only one or two evaluation metrics. The experimental analysis shows that the proposed algorithm achieves good classification results.
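The first two steps described in the abstract (pairwise label conditional probabilities, then ranking them against the already-predicted labels and taking the maximum times that label's value) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the dict-based `predicted` interface and the Laplace `smoothing` parameter are hypothetical, and the conditional probability is assumed to be estimated by co-occurrence counting as P(y_j = 1 | y_i = 1). The abstract does not specify exactly how the product is combined with the ML-KNN MAP rule, so the sketch stops at computing that product term.

```python
import numpy as np


def label_conditional_probs(Y, smoothing=1.0):
    """Estimate C[i, j] = P(y_j = 1 | y_i = 1) from a 0/1 label matrix Y.

    Y has shape (n_samples, n_labels). The Laplace `smoothing` term is an
    assumption (not taken from the paper) that keeps C defined for labels
    that never occur in the training set; smoothing=0 gives raw frequencies.
    """
    Y = np.asarray(Y, dtype=float)
    co = Y.T @ Y              # co[i, j] = number of samples with y_i = 1 and y_j = 1
    count = np.diag(co)       # count[i] = number of samples with y_i = 1
    return (co + smoothing) / (count[:, None] + 2 * smoothing)


def correlation_term(C, j, predicted):
    """Rank P(y_j = 1 | y_i = 1) over the already-predicted labels i,
    take the maximum, and multiply it by that label's predicted value.

    `predicted` is a hypothetical interface: a dict {label_index: 0 or 1}
    holding the labels already predicted for the current instance (it is
    assumed not to contain j itself). Returns (best_label, term), or
    (None, 0.0) when no label has been predicted yet.
    """
    if not predicted:
        return None, 0.0
    # Sort candidate labels by conditional probability, highest first.
    ranked = sorted(predicted, key=lambda i: C[i, j], reverse=True)
    best = ranked[0]
    return best, C[best, j] * predicted[best]


if __name__ == "__main__":
    # Toy training labels: 4 samples, 3 labels (illustrative data only).
    Y = np.array([[1, 1, 0], [1, 0, 1], [1, 1, 0], [0, 1, 1]])
    C = label_conditional_probs(Y, smoothing=0.0)
    # Label 1 is next; labels 0 and 2 were already predicted as positive.
    print(correlation_term(C, 1, {0: 1, 2: 1}))
```

Multiplying by the predicted value means the correlation term vanishes when the most strongly associated label was itself predicted as absent (value 0), so only positively predicted labels can contribute evidence.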
Source
Journal of Computer Applications (《计算机应用》)
CSCD
Peking University Core Journal
2015, Issue 10, pp. 2761-2765 (5 pages)
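Funding
Anhui Provincial Science and Technology Research Program (1301b042020)
Specialized Research Fund for the Doctoral Program of Higher Education (20133401110009)
Anhui University Graduate Academic Innovation Project (Ygh100166)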
Keywords
label correlation
Multi-Label K Nearest Neighbor (ML-KNN)
conditional probability
multi-label classification