Abstract
This paper analyzes the shortcomings of the χ² statistic (Chi-square, CHI): it is unreliable for feature terms with low document frequency, and it does not indicate the correlation between a term and a category. To address these shortcomings, an improved algorithm, Statistical Frequency (SF), is proposed. Comparative experiments show that the SF algorithm compensates for these deficiencies and achieves good performance in text classification.
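For readers unfamiliar with CHI, the following is a minimal sketch of the textbook χ² score for a term and a category, computed from a 2x2 table of document counts. It is not the paper's SF algorithm, which this record does not define. The function name `chi_square` and the count names `a`, `b`, `c`, `d` are illustrative assumptions. The example shows one common reading of the second shortcoming: because the difference is squared, a term that is positively associated with a category and a term that is negatively associated with it can receive the same score.

```python
def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Textbook chi-square score of a term for a category.

    Hypothetical argument names (not from the paper):
      a: documents in the category that contain the term
      b: documents outside the category that contain the term
      c: documents in the category that do not contain the term
      d: documents outside the category that do not contain the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # A marginal total is zero, so the score is undefined; return 0.0 by convention.
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


# Term concentrated inside the category (positive association).
positive = chi_square(40, 10, 10, 40)
# Term concentrated outside the category (negative association).
negative = chi_square(10, 40, 40, 10)

# The squared difference discards the sign, so both scores are equal.
print(positive, negative)
```

The score alone therefore ranks both terms equally, even though only the first is evidence for the category.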
Source
《现代图书情报技术》
CSSCI
PKU Core Journal (北大核心)
2008, No. 11, pp. 44-48 (5 pages)
New Technology of Library and Information Service
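Funding
One of the research outcomes of the Jiangsu Provincial Department of Education Philosophy and Social Science Fund for Universities project "Performance Evaluation and Development Strategy of Resources Acquired by Jiangsu University Digital Libraries" (Project No. 08SJB8700004).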