Abstract
In text categorization, the weights assigned to feature terms strongly affect classification performance, and Term Frequency-Inverse Document Frequency (TFIDF) is one of the most important term-weighting algorithms. This paper reviews the development of the TFIDF algorithm, examines its inherent defects, and summarizes the improvements that various researchers have proposed. It also surveys new application areas of TFIDF and reports experiments that evaluate several of the improved algorithms and their effect on classification results, with the aim of giving readers a reference for applying TFIDF more effectively.
Source
Journal of Computer Applications (《计算机应用》)
Indexed in: CSCD; Peking University Core Journals (北大核心)
2009, Issue B06, pp. 167-170, 180 (5 pages)
Keywords
Term Frequency-Inverse Document Frequency (TFIDF)
text categorization
VSM