TFIDF_-NB Co-Operative Training Algorithm

Cited by: 1

Abstract: Using a small set of labeled documents together with a large pool of unlabeled documents for text classification has become an important research trend. Building on an analysis of two families of algorithms, EM and Co-training, this paper proposes a new co-operative training algorithm. The algorithm uses two classifiers, naive Bayes (NB) and TFIDF, which are trained co-operatively and incrementally on the small labeled set and the large unlabeled pool. Experimental results show that the co-operative training algorithm achieves higher accuracy and a lower average error rate than both EM and Co-training.
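The abstract does not spell out the training procedure, so the following is only a minimal sketch of what co-operative incremental training with an NB classifier and a TFIDF classifier might look like. Everything specific here is an assumption made for illustration and is not taken from the paper: the agreement rule (an unlabeled document is added to the labeled pool only when both classifiers predict the same class), the `batch` and `max_rounds` parameters, the centroid-based TFIDF classifier, and all function names.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels):
    """Multinomial naive Bayes with Laplace smoothing; returns a predict function."""
    vocab = {w for d in docs for w in d}
    class_docs = Counter(labels)
    word_counts = defaultdict(Counter)
    for d, y in zip(docs, labels):
        word_counts[y].update(d)

    def predict(doc):
        best, best_score = None, -math.inf
        for y in class_docs:
            total = sum(word_counts[y].values())
            score = math.log(class_docs[y] / len(docs))  # log prior
            for w in doc:
                if w in vocab:  # words never seen in training are ignored
                    score += math.log((word_counts[y][w] + 1) / (total + len(vocab)))
            if score > best_score:
                best, best_score = y, score
        return best
    return predict

def train_tfidf(docs, labels):
    """TFIDF class centroids compared by cosine similarity (Rocchio-style)."""
    n = len(docs)
    df = Counter(w for d in docs for w in set(d))
    idf = {w: math.log((1 + n) / (1 + c)) + 1 for w, c in df.items()}

    def vec(doc):
        tf = Counter(w for w in doc if w in idf)
        return {w: c * idf[w] for w, c in tf.items()}

    centroids = defaultdict(lambda: defaultdict(float))
    for d, y in zip(docs, labels):
        for w, v in vec(d).items():
            centroids[y][w] += v

    def cosine(a, b):
        dot = sum(v * b.get(w, 0.0) for w, v in a.items())
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    def predict(doc):
        v = vec(doc)
        return max(centroids, key=lambda y: cosine(v, centroids[y]))
    return predict

def cooperative_train(labeled, unlabeled, batch=2, max_rounds=10):
    """Assumed rule: each round, move up to `batch` unlabeled documents on which
    both classifiers agree into the labeled pool, then retrain both."""
    docs = [d for d, _ in labeled]
    labels = [y for _, y in labeled]
    pool = list(unlabeled)
    for _ in range(max_rounds):
        nb, tfidf = train_nb(docs, labels), train_tfidf(docs, labels)
        agreed = [(d, nb(d)) for d in pool if nb(d) == tfidf(d)][:batch]
        if not agreed:
            break
        for d, y in agreed:
            docs.append(d)
            labels.append(y)
            pool.remove(d)
    return train_nb(docs, labels), train_tfidf(docs, labels)

# Toy data (invented for illustration only).
labeled = [("ball goal team".split(), "sport"),
           ("vote election party".split(), "politics")]
unlabeled = [s.split() for s in ("team goal win", "party vote law",
                                 "goal ball match", "election law vote")]
nb_clf, tfidf_clf = cooperative_train(labeled, unlabeled)
print(nb_clf("goal team".split()), tfidf_clf("vote party".split()))
```

Requiring agreement between the two classifiers is one simple way to limit the label noise that self-labeled documents introduce; a confidence threshold per classifier would be an equally plausible alternative.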

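Authors: Peng Ya, Lin Yaping, Chen Zhiping

Affiliation: School of Computer and Communication, Hunan University

Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), CSCD, Peking University Core Journal, 2004, No. 12, pp. 2243-2246 (4 pages)

Funding: Supported by the National Natural Science Foundation of China (60272051)

Keywords: text classification; semi-supervised algorithm; Co-training algorithm; EM algorithm; co-operative incremental training

Classification number: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]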

References (11)

1. Yang Y. An evaluation of statistical approaches to text categorization[J]. Information Retrieval, 1999, 1(1/2): 67-88.
2. Nigam K, McCallum A, Thrun S, et al. Learning to classify text from labeled and unlabeled documents[Z]. AAAI-98, 1998, 792-799.
3. Blum A and Mitchell T. Combining labeled and unlabeled data with co-training[C]. In: Proceedings of the Eleventh Annual Conference on Computational Learning Theory, 1998, 92-100.
4. Nigam K, McCallum A, Thrun S, et al. Text classification from labeled and unlabeled documents using EM[J]. Machine Learning, 2000, 39(2/3): 103-134.
5. McCallum A and Nigam K. A comparison of event models for naive Bayes text classification[C]. In: AAAI-98 Workshop on Learning for Text Categorization of the Fifteenth International Conference (ICML'98), 359-367.
6. Nigam K and Ghani R. Understanding the behavior of co-training[C]. In: Proceedings of KDD-2000 Workshop on Text Mining, 2000.
7. Nigam K and Ghani R. Analyzing the effectiveness and applicability of co-training[C]. In: Ninth International Conference on Information and Knowledge Management (CIKM-2000), 2000, 86-93.
8. Pierce D and Cardie C. Limitations of co-training for natural language learning from large datasets[C]. In: Proceedings of 2001 Conference on Empirical Methods in Natural Language Processing, 2001.
9. McCallum A and Nigam K. A comparison of event models for naive Bayes text classification[C]. In: AAAI-98 Workshop on Learning for Text Categorization of the Fifteenth International Conference (ICML'98), 359-367.
10. Joachims T. A probabilistic analysis of the Rocchio algorithm with TFIDF for text categorization[C]. Machine Learning: Proceedings of the Fourteenth International Conference, 1997, 143-151.

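Co-cited literature (5)

1. Xu Haixia. Application of cluster analysis in Web text mining[J]. Journal of Intelligence, 2004, 23(12): 99-101. Cited by: 4
2. Liu Yuanchao, Wang Xiaolong, Liu Bingquan. An improved initial-value selection algorithm for k-means document clustering[J]. Chinese High Technology Letters, 2006, 16(1): 11-15. Cited by: 23
3. Zhang Yufang, Peng Shiming, Lü Jia. Improvement and application of the TFIDF method based on text classification[J]. Computer Engineering, 2006, 32(19): 76-78. Cited by: 120
4. NIST. The 2003 Topic Detection and Tracking Evaluation[EB/OL]. (2007-08-21). http://www.nist.gov/speech/tests/tdt/.
5. Li Rong, Ye Shiwei, Shi Zhongzhi. SVM-KNN classifier: a new method for improving the accuracy of SVM classification[J]. Acta Electronica Sinica, 2002, 30(5): 745-748. Cited by: 133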

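Citing literature (1)

1. Zheng Jun, Wang Wei, Yang Wu, Yang Yongtian. A text clustering evaluation method based on inter-class distance parameter estimation[J]. Computer Engineering, 2009, 35(9): 37-39. Cited by: 6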

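Secondary citing literature (6)

1. Zeng Lijun, Li Zejun, Liu Jiagang. Interval fuzzy C-means clustering based on matrix-weighted association rules[J]. Computer Engineering, 2010, 36(22): 52-54. Cited by: 1
2. Wang Xin, Liu Xiaoxia. Research on a vertical meta-search engine based on association rule mining[J]. Computer Engineering, 2011, 37(4): 76-77. Cited by: 4
3. Wang Heng, Wang Shaoshan, Gao Yuzhuo. Research and implementation of a topic-oriented intra-domain vertical search engine system[J]. Journal of Ningxia University (Natural Science Edition), 2013, 34(1): 54-57.
4. Ren Jincheng, Zhang Lingling, Xiao Yunkui, Zhu Zhongkui. Diesel engine fault diagnosis based on symmetric polar coordinates and image processing[J]. Vehicle Engine, 2013(6): 80-85. Cited by: 3
5. Yuan Hongfang, Zhang Ren, Wang Huaqing. Gearbox fault diagnosis based on HMM and an improved distance measure method[J]. Journal of Vibration and Shock, 2014, 33(14): 89-94. Cited by: 5
6. Niu Fenggao, Zhang Rongjie. A text clustering evaluation method based on intra-class distance parameter estimation[J]. Journal of Shanxi University (Natural Science Edition), 2018, 41(2): 256-266. Cited by: 1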

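Related literature

1. Wang Jiao, Luo Siwei, Wang Li. A semi-supervised co-training algorithm for multi-relational data[J]. Computer Science, 2012, 39(B06): 536-539.
2. Rong Hua. An XML + JavaScript solution for Web reports[J]. Programmer (CSDN开发高手), 2004(8): 76-78.
3. Sheng Xiaochun, Yue Xiaodong. A co-training algorithm based on rough set theory[J]. Application Research of Computers, 2013, 30(12): 3546-3550. Cited by: 1
4. Guo Xiangyu, Wang Wei. An improved co-training algorithm: Compatible Co-training[J]. Journal of Nanjing University (Natural Sciences), 2016, 52(4): 662-671. Cited by: 11
5. Feng Shaorong, Xiao Wenjun. An improved decision tree algorithm based on sample selection[J]. Journal of Southwest Jiaotong University, 2009, 44(5): 643-647. Cited by: 18
6. Yu Lei, Man Jiaju, Liu Ligang. Image denoising based on joint dictionary learning[J]. Journal of Natural Science of Hunan Normal University, 2013, 36(6): 11-16. Cited by: 1
7. Wu Yongcheng. A co-training algorithm based on differences in classification confidence[J]. Journal of Hubei University for Nationalities (Natural Science Edition), 2013, 31(1): 74-77.
8. Guo Tao, Li Guiyang, Lan Xia. A graph-based semi-supervised co-training algorithm[J]. Computer Engineering, 2012, 38(13): 163-165. Cited by: 5
9. Chen Wen, Zhang Enyang, Zhao Yong. A convolutional neural network training algorithm based on multi-classifier co-learning[J]. Computer Science, 2016, 43(9): 223-226. Cited by: 5
10. Jiang Yuan, She Qiaoqiao, Li Ming, Zhou Zhihua. A transductive multi-label document classification method[J]. Journal of Computer Research and Development, 2008, 45(11): 1817-1823. Cited by: 10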
