Abstract
Keyword extraction underpins text classification, text clustering, and information retrieval, and is widely used in natural language processing. This paper proposes a Chinese keyword extraction method based on both TF-IDF and word correlation. It draws on the characteristics of TF-IDF keyword extraction and on the way words in Chinese text relate to one another. Introducing word correlation lets the method avoid the bias that comes from using TF-IDF alone. Experimental results show that its average recall is markedly higher than that of traditional methods.
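The abstract does not say how word correlation is measured or how it is combined with TF-IDF. The sketch below is only an illustration and is not the authors' method. It rests on several assumptions:

- The text has already been segmented into words.
- TF-IDF uses a smoothed form.
- Word correlation is replaced by a hypothetical measure: the share of other distinct words that co-occur with a term inside a sliding window.
- The two scores are combined by multiplication, weighted by a hypothetical parameter `alpha`.
- Average recall uses its standard per-document definition.

```python
import math
from collections import Counter


def tfidf_scores(doc, corpus):
    """Smoothed TF-IDF for every term in `doc`; `corpus` is a list of token lists containing `doc`."""
    n_docs = len(corpus)
    df = Counter()
    for d in corpus:
        df.update(set(d))  # document frequency: count each term once per document
    tf = Counter(doc)
    total = len(doc)
    # idf = log(N / (1 + df)) + 1 stays positive for every df <= N
    return {w: (c / total) * (math.log(n_docs / (1 + df[w])) + 1) for w, c in tf.items()}


def correlation(doc, window=2):
    """HYPOTHETICAL correlation: fraction of the other distinct words in `doc`
    that appear within `window` tokens of w at least once."""
    neighbors = {w: set() for w in doc}
    for i, w in enumerate(doc):
        for j in range(max(0, i - window), min(len(doc), i + window + 1)):
            if doc[j] != w:
                neighbors[w].add(doc[j])
    others = len(neighbors) - 1
    return {w: (len(n) / others if others else 0.0) for w, n in neighbors.items()}


def extract_keywords(doc, corpus, k=3, alpha=0.5, window=2):
    """Rank terms by TF-IDF scaled up by correlation; `alpha` is a hypothetical weight."""
    base = tfidf_scores(doc, corpus)
    corr = correlation(doc, window)
    combined = {w: base[w] * (1 + alpha * corr[w]) for w in base}
    # Sort by descending score, then by the term itself for deterministic ties
    return sorted(combined, key=lambda w: (-combined[w], w))[:k]


def average_recall(predicted, gold):
    """Mean per-document recall |predicted ∩ gold| / |gold| (standard definition, assumed)."""
    return sum(len(set(p) & set(g)) / len(g) for p, g in zip(predicted, gold)) / len(gold)


if __name__ == "__main__":
    # Pre-segmented toy documents; real Chinese text would first need word segmentation
    doc = ["中文", "关键词", "提取", "方法", "研究"]
    corpus = [doc, ["文本", "分类"], ["信息", "检索"]]
    print(extract_keywords(doc, corpus, k=3, window=1))
```

In this toy document, with `window=1`, all five terms tie on TF-IDF. The correlation factor breaks the tie in favour of the three interior words, because each of them has two neighbours. A faithful implementation would need the paper's own definition of word correlation and its own weighting scheme.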
Source
Information Science (《情报科学》)
Indexed in: CSSCI; Peking University Core Journals (北大核心)
2012, No. 10, pp. 1542-1544, 1555 (4 pages)