
A Method of Illegal and Harmful Text Fast Filter Based on Multi-Centroid Vector
Abstract: The Rocchio method is sensitive to the distribution of class samples and to noise, which can cause it to extend the range of a class incorrectly. To address this, the paper proposes clustering the training samples and using the centroid vectors of the resulting clusters, rather than a single centroid vector, as the group of decision vectors for filtering. The method preserves filtering efficiency while achieving higher recall and accuracy than single-centroid Rocchio filtering.
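The idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not give the decision rule, so the cosine-similarity threshold, the farthest-first seeding, the positive-only single centroid standing in for Rocchio, and all function names and sample vectors below are assumptions made for illustration.

```python
import math

def normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)

def cosine(a, b):
    """Cosine similarity of two vectors that are already unit length."""
    return sum(x * y for x, y in zip(a, b))

def mean(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def single_centroid(samples):
    """Simplified single-centroid prototype: the mean of the class samples only."""
    return [normalize(mean([normalize(s) for s in samples]))]

def kmeans_centroids(samples, k, iterations=20):
    """Plain k-means under cosine similarity; returns k unit-length centroids."""
    samples = [normalize(s) for s in samples]
    # Assumed seeding: start from the first sample, then repeatedly add the
    # sample least similar to the centroids chosen so far (deterministic).
    centroids = [samples[0]]
    while len(centroids) < k:
        centroids.append(
            min(samples, key=lambda s: max(cosine(s, c) for c in centroids)))
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for s in samples:
            best = max(range(len(centroids)),
                       key=lambda i: cosine(s, centroids[i]))
            clusters[best].append(s)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [normalize(mean(c)) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids

def is_harmful(doc, centroids, threshold):
    """Assumed rule: flag the document if it is close enough to ANY centroid."""
    d = normalize(doc)
    return any(cosine(d, c) >= threshold for c in centroids)

# Hypothetical term-weight vectors: the harmful class has two distinct sub-topics.
harmful = [[1.0, 0.1, 0.0], [0.9, 0.0, 0.1], [1.0, 0.0, 0.0],
           [0.0, 0.1, 1.0], [0.1, 0.0, 0.9], [0.0, 0.0, 1.0]]
multi = kmeans_centroids(harmful, k=2)
single = single_centroid(harmful)

in_between = [1.0, 0.0, 1.0]   # mixes both sub-topics, resembles neither cluster
typical = [0.8, 0.1, 0.0]      # resembles the first sub-topic

for name, cents in (("single", single), ("multi", multi)):
    print(name, is_harmful(typical, cents, 0.8), is_harmful(in_between, cents, 0.8))
```

On this toy data, with the threshold set to 0.8, the single centroid falls between the two sub-topics: it flags the in-between document and misses the typical one, whereas the two cluster centroids flag only the typical one. Filtering cost stays at one similarity computation per centroid, which is consistent with the abstract's claim that efficiency is preserved.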
Source: Journal of Guangxi Academy of Sciences (《广西科学院学报》), 2010, No. 4, pp. 436-438 (3 pages)
Keywords: illegal and harmful text; fast filter; multi-centroid vector; Rocchio; K-means
