A Fast Noise Revision Algorithm in Text Categorization
Abstract Noisy texts in the training data degrade the accuracy of text categorization. This paper proposes a fast algorithm for revising such noisy texts. The previous algorithm, NNRA, revises only one document per iteration, so the number of iterations is approximately equal to the number of noisy texts in the document set and the algorithm runs inefficiently. By analyzing how changing the category of a document affects the evaluation measure, modularity, we propose identifying noisy texts by the change in modularity, so that several documents can be revised within a single iteration. Experimental results show that the proposed algorithm revises the noise in coarsely categorized data faster: the computational complexity drops from O(Tnm^2) for the previous algorithm to O(Tnm). The algorithm can therefore be applied to large document collections, which makes it more practical.
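The idea in the abstract, scoring each document by the modularity change that relabeling it would cause and revising several documents in one iteration, can be sketched roughly as follows. This is only an illustration under stated assumptions, not the paper's procedure: it assumes the documents form a weighted, undirected similarity graph `W` with a zero diagonal, it uses the standard Newman modularity, and the function names (`modularity`, `candidate_moves`, `revise_labels`) and the fallback to a single move are hypothetical choices made here.

```python
def modularity(W, labels):
    """Newman modularity of a weighted, undirected graph (zero diagonal assumed)."""
    n = len(W)
    two_m = sum(sum(row) for row in W)
    k = [sum(row) for row in W]
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                q += W[i][j] - k[i] * k[j] / two_m
    return q / two_m


def candidate_moves(W, labels):
    """Map each document to (new_label, gain) for its best positive modularity gain."""
    n = len(W)
    two_m = sum(sum(row) for row in W)
    k = [sum(row) for row in W]
    classes = sorted(set(labels))
    total = {c: 0.0 for c in classes}  # summed degree of each class
    for i in range(n):
        total[labels[i]] += k[i]
    moves = {}
    for i in range(n):
        link = {c: 0.0 for c in classes}  # similarity of document i to each class
        for j in range(n):
            if j != i:
                link[labels[j]] += W[i][j]
        a = labels[i]
        best_label, best_gain = None, 1e-12
        for b in classes:
            if b == a:
                continue
            # Closed-form modularity change for moving document i from class a to b.
            gain = (2.0 * (link[b] - link[a]) / two_m
                    - 2.0 * k[i] * (total[b] - total[a] + k[i]) / two_m ** 2)
            if gain > best_gain:
                best_label, best_gain = b, gain
        if best_label is not None:
            moves[i] = (best_label, best_gain)
    return moves


def revise_labels(W, labels, max_iter=100):
    """Relabel every document with a positive gain in each iteration.

    Gains are computed independently, so a batch is not guaranteed to raise
    modularity; as an assumption of this sketch, the loop then falls back to
    applying only the single best move, which always raises it.
    """
    labels = list(labels)
    for _ in range(max_iter):
        moves = candidate_moves(W, labels)
        if not moves:
            break
        batch = list(labels)
        for i, (target, _gain) in moves.items():
            batch[i] = target
        if modularity(W, batch) > modularity(W, labels):
            labels = batch
        else:
            i = max(moves, key=lambda d: moves[d][1])
            labels[i] = moves[i][0]
    return labels


# Toy data: two groups of three similar documents joined by one weak link.
W = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0.1, 0, 0],
    [0, 0, 0.1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
noisy = ["A", "A", "B", "A", "B", "B"]  # documents 2 and 3 are mislabeled
revised = revise_labels(W, noisy)
```

On this toy graph both mislabeled documents receive a positive gain in the same iteration, so they are corrected together instead of one per iteration.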
Source 《情报学报》 (Journal of the China Society for Scientific and Technical Information), CSSCI, Peking University Core Journal (北大核心), 2009, No. 5, pp. 700-705 (6 pages)
Funding National Natural Science Foundation of China (70431001, 70620140115, 70771019).
Keywords text categorization, community structure, modularity optimization, noisy text