Abstract
TF-IDF is a widely used method for weighting text features. It is simple to apply, but it ignores the effect of a term's position on its weight. By modifying the TF-IDF factors with position information and analyzing how the form of a text differs under different conditions, we propose three position-based term weighting methods. Text classification experiments show that all three weighting models give better text labeling results, with higher precision, than conventional TF-IDF.
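The abstract does not spell out the three weighting schemes, so the following is only a minimal sketch of the general idea of folding position into TF-IDF, not the authors' formulas. It assumes each document is split into named zones (here `title` and `body`) and that an occurrence in a zone counts with a fixed multiplier; the zone names, the `ZONE_WEIGHTS` values, and all function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical zone multipliers; these values are NOT taken from the paper.
ZONE_WEIGHTS = {"title": 3.0, "body": 1.0}


def position_weighted_tf(doc):
    """Sum occurrences of each term, scaled by the weight of the zone it appears in.

    `doc` maps a zone name to the list of tokens found in that zone.
    """
    tf = Counter()
    for zone, tokens in doc.items():
        weight = ZONE_WEIGHTS.get(zone, 1.0)  # unknown zones count as plain body text
        for token in tokens:
            tf[token] += weight
    return tf


def inverse_document_frequency(docs):
    """Plain IDF, log(N / df), computed over all zones of each document."""
    df = Counter()
    for doc in docs:
        df.update({token for tokens in doc.values() for token in tokens})
    n_docs = len(docs)
    return {term: math.log(n_docs / count) for term, count in df.items()}


def position_tfidf(docs):
    """Return one {term: weight} dict per document."""
    idf = inverse_document_frequency(docs)
    return [
        {term: freq * idf[term] for term, freq in position_weighted_tf(doc).items()}
        for doc in docs
    ]


docs = [
    {"title": ["text", "classification"], "body": ["tfidf", "weights", "terms"]},
    {"title": ["image", "retrieval"], "body": ["text", "features"]},
]
weights = position_tfidf(docs)
```

In this toy corpus a title-only term such as `classification` ends up with three times the weight of a body-only term with the same document frequency, while `text`, which occurs in both documents, gets an IDF of zero regardless of where it appears.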
Source
《微电子学与计算机》
CSCD
Peking University Core Journals
2009, No. 2, pp. 188-192 (5 pages)
Microelectronics & Computer
Funding
National Natural Science Foundation of China (70571087)
Keywords
feature weighting
position weighting
modified TF-IDF
text classification