Abstract
Feature item weighting is the core of keyword extraction in text mining, and the quality of the weighting method has an important impact on text mining results. This paper analyzes the traditional TFIDF algorithm for feature item weighting in keyword extraction and, to reduce that algorithm's over-reliance on word frequency when computing feature weights, proposes a feature weighting method based on synonym replacement and adjacent-word merging (KSRAM). To test the algorithm's performance, a comparative keyword extraction experiment was run between KSRAM and the traditional TFIDF algorithm. The results show that KSRAM clearly improves on the traditional TFIDF algorithm in both precision and recall.
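For orientation, the following is a minimal sketch of the classic TF-IDF weighting that the abstract takes as its baseline, plus a simple synonym-normalization step run before counting. It is not the paper's KSRAM implementation: the function names `tfidf` and `normalize_synonyms`, the synonym map, and the toy documents are all hypothetical, and adjacent-word merging is not shown.

```python
import math
from collections import Counter


def tfidf(docs):
    """Classic TF-IDF: weight(t, d) = tf(t, d) * log(N / df(t)).

    docs is a list of token lists; returns one {term: weight} dict per document.
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return weights


def normalize_synonyms(doc, synonyms):
    """Map each token to a canonical form (assumed, illustrative synonym map)."""
    return [synonyms.get(t, t) for t in doc]


# Toy corpus (hypothetical data, for illustration only).
docs = [
    ["car", "automobile", "engine"],
    ["engine", "oil"],
    ["road", "car"],
]
synonyms = {"automobile": "car"}

plain = tfidf(docs)
merged = tfidf([normalize_synonyms(d, synonyms) for d in docs])
print(plain[0]["car"], merged[0]["car"])
```

In this toy corpus, folding "automobile" into "car" doubles the term's count in the first document without changing its document frequency, so its weight rises. This shows one way synonym handling can change a weighting driven purely by raw word frequency.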
Source
《计算机与现代化》
2010, No. 4, pp. 115-117, 121 (4 pages)
Computer and Modernization