A Weight Adjustment Technique with Feature Weight Function Named TEF-WA in Text Categorization
Cited by: 26
Abstract: Text categorization (TC) is an important research direction in text mining. It aims to assign one or more predefined category labels to a text document, and it provides efficient methods for document management and information retrieval. A major problem in automatic text categorization is how to select, from the original high-dimensional feature space, the features that are effective for classification, so that the categorization algorithm works efficiently and precision improves. This paper analyzes and compares feature selection and weight adjustment techniques and points out their influence on classification precision and efficiency. It then proposes TEF-WA (term evaluation function-weight adjustment), a weight adjustment technique that incorporates a feature evaluation function. A new weight function is designed that embeds the feature evaluation function in the weight function and adjusts each feature term's contribution to the classifier according to the term's ability to discriminate between categories. To evaluate TEF-WA, experiments were carried out on training document collections of several different sizes, using various term evaluation functions: document frequency, information gain, expected cross entropy, CHI, the weight of evidence for text, and term frequency or document frequency formulas. The experimental results show that TEF-WA is effective both in improving classification precision and in reducing the time complexity of the algorithm.
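The abstract does not state the TEF-WA weight function itself. As a minimal sketch, assuming the adjusted weight is a TF-IDF weight multiplied by a term evaluation score (here CHI, one of the evaluation functions the abstract lists), the idea could look like the code below. The function names, the multiplicative combination, and the max-over-categories aggregation are illustrative assumptions, not the authors' definition.

```python
import math
from collections import Counter


def chi_square(doc_sets, labels, term, category):
    """CHI statistic between `term` and `category` from a 2x2 contingency table."""
    n = len(doc_sets)
    a = b = c = d = 0
    for tokens, label in zip(doc_sets, labels):
        present, in_cat = term in tokens, label == category
        if present and in_cat:
            a += 1      # term present, document in category
        elif present:
            b += 1      # term present, document in another category
        elif in_cat:
            c += 1      # term absent, document in category
        else:
            d += 1      # term absent, document in another category
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0


def term_scores(docs, labels):
    """Score each term by its maximum CHI value over all categories (assumption)."""
    doc_sets = [set(doc) for doc in docs]
    vocab = set().union(*doc_sets)
    categories = set(labels)
    return {t: max(chi_square(doc_sets, labels, t, c) for c in categories)
            for t in vocab}


def adjusted_weights(doc, docs, scores):
    """TF-IDF weight of each term in `doc`, scaled by its evaluation score.

    The multiplicative form is an assumption made for illustration.
    """
    n = len(docs)
    weights = {}
    for term, tf in Counter(doc).items():
        df = sum(1 for other in docs if term in other)
        idf = math.log(n / df) if df else 0.0
        weights[term] = tf * idf * scores.get(term, 0.0)
    return weights


# Toy corpus, made up for illustration only.
docs = [
    ["ball", "team", "win"],
    ["team", "score", "ball"],
    ["stock", "market", "win"],
    ["market", "price", "stock"],
]
labels = ["sports", "sports", "finance", "finance"]

scores = term_scores(docs, labels)
print(adjusted_weights(docs[0], docs, scores))
```

In this toy corpus, "win" occurs equally often in both categories, so its CHI score is zero and its adjusted weight drops to zero, while "ball" appears only in sports documents and keeps a positive weight. This mirrors the abstract's description of scaling a term's contribution by its discriminating ability.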
Source: Journal of Computer Research and Development (《计算机研究与发展》), indexed in EI, CSCD, and the Peking University Core Journals list; 2005, No. 1, pp. 47-53 (7 pages)
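Funding: Major Program of the National Natural Science Foundation of China (79990584); National Key Basic Research Development Program of China ("973" Program) (G1998030414)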
Keywords: vector space model (VSM); feature selection; weight adjustment; feature evaluation function; text categorization