
A kNN Algorithm Based on Vector Angle for Multi-label Text Categorization

Cited by: 6
Abstract: In multi-label learning, each instance can be associated with several concept labels. Given a training set of multi-label examples, the learning system must predict, as accurately as possible, the label set of each unseen instance, whose size is not known in advance. The k-nearest-neighbor (kNN) algorithm has recently been applied to multi-label learning: each instance is transformed into a multi-dimensional vector, and the label vector of a test instance is determined from the label vectors of its k nearest neighbors. Traditional kNN selects these neighbors by the spatial (Euclidean) distance between vectors, whereas in natural language processing the similarity between two texts is usually expressed by the angle between their text vectors. This paper therefore presents θ-MLkNN, a multi-label lazy learning approach derived from the traditional kNN algorithm that uses the angle between two vectors, instead of their distance, as the criterion for selecting the k nearest neighbors. Experiments on a real-world text data set show that θ-MLkNN achieves higher classification precision than the traditional ML-kNN algorithm.
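The abstract describes θ-MLkNN only at a high level: neighbors are ranked by the angle θ(u, v) = arccos(u·v / (‖u‖‖v‖)) between document vectors rather than by Euclidean distance. The Python sketch below is a minimal illustration under stated assumptions, not the authors' implementation. It assumes the label-assignment step is the maximum a posteriori rule of Zhang and Zhou's ML-kNN (a smoothed label prior plus smoothed likelihoods of seeing j labelled neighbors, with smoothing s = 1), with only the neighbor selection changed. The function names, the toy term-count vectors `X_TOY`/`Y_TOY`, and the convention that a zero vector lies at angle π/2 from every other vector are all hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence

Vector = Sequence[float]


def vector_angle(u: Vector, v: Vector) -> float:
    """Angle in radians between u and v (0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return math.pi / 2  # assumption: an empty document is orthogonal to everything
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    return math.acos(max(-1.0, min(1.0, dot / (norm_u * norm_v))))


def angle_neighbors(x: Vector, X: Sequence[Vector], k: int,
                    exclude: Optional[int] = None) -> List[int]:
    """Indices of the k vectors in X with the smallest angle to x (ties broken by index)."""
    candidates = [i for i in range(len(X)) if i != exclude]
    candidates.sort(key=lambda i: (vector_angle(x, X[i]), i))
    return candidates[:k]


def train_theta_mlknn(X, Y, k: int, s: float = 1.0) -> Dict:
    """Hypothetical helper: ML-kNN statistics with neighbors chosen by angle."""
    m, q = len(X), len(Y[0])
    # Smoothed prior P(H1_l) that a document carries label l.
    prior1 = [(s + sum(y[l] for y in Y)) / (2 * s + m) for l in range(q)]
    # c1[l][j] / c0[l][j]: training docs with / without label l whose k
    # angle-nearest neighbors (the doc itself excluded) include exactly j docs with label l.
    c1 = [[0] * (k + 1) for _ in range(q)]
    c0 = [[0] * (k + 1) for _ in range(q)]
    for i in range(m):
        nbrs = angle_neighbors(X[i], X, k, exclude=i)
        for l in range(q):
            j = sum(Y[n][l] for n in nbrs)
            if Y[i][l]:
                c1[l][j] += 1
            else:
                c0[l][j] += 1

    def likelihood(c: List[int]) -> List[float]:
        # Smoothed P(E_j | H): probability of observing j labelled neighbors.
        return [(s + c[j]) / (s * (k + 1) + sum(c)) for j in range(k + 1)]

    return {"X": X, "Y": Y, "k": k, "prior1": prior1,
            "post1": [likelihood(c) for c in c1],
            "post0": [likelihood(c) for c in c0]}


def predict(model: Dict, x: Vector) -> List[int]:
    """MAP decision per label: 1 if P(H1)P(E_j|H1) > P(H0)P(E_j|H0), else 0."""
    nbrs = angle_neighbors(x, model["X"], model["k"])
    labels = []
    for l, p in enumerate(model["prior1"]):
        j = sum(model["Y"][n][l] for n in nbrs)
        labels.append(int(p * model["post1"][l][j] > (1 - p) * model["post0"][l][j]))
    return labels


# Hypothetical toy corpus: 2-term count vectors, labels [label A, label B].
X_TOY = [[4, 0], [5, 1], [3, 0], [6, 1],   # mostly term 1 -> label A
         [0, 4], [1, 5], [0, 3], [1, 6],   # mostly term 2 -> label B
         [2, 2], [3, 3]]                   # balanced -> both labels
Y_TOY = [[1, 0]] * 4 + [[0, 1]] * 4 + [[1, 1]] * 2

if __name__ == "__main__":
    model = train_theta_mlknn(X_TOY, Y_TOY, k=3)
    for doc in ([7, 1], [1, 7], [4, 5]):
        print(doc, predict(model, doc))
    # prints: [7, 1] [1, 0] / [1, 7] [0, 1] / [4, 5] [1, 1]
```

Ranking by angle makes neighbor selection insensitive to document length: `angle_neighbors([1, 1], [[10, 10], [1, 0]], 1)` returns `[0]`, picking the scaled copy `[10, 10]` even though `[1, 0]` is closer in Euclidean distance. Because arccos is monotonically decreasing on [-1, 1], sorting by smallest angle gives the same order as sorting by largest cosine similarity, so an implementation could skip the `math.acos` call entirely.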
Authors: Guang Kai (广凯), Pan Jingui (潘金贵)
Source: Computer Science (《计算机科学》), 2008, No. 4, pp. 205-206 and F0003 (3 pages in total); indexed in CSCD and the Peking University core journal list
Keywords: machine learning; multi-label learning; text categorization

引证文献6

二级引证文献21

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部