Abstract
Building on the 海量 (Hailiang) intelligent word segmentation system, this paper proposes a Chinese keyword extraction algorithm based on the vector space model and the TF-IDF method. After automatically segmenting the text, the algorithm uses TF-IDF to compute a weight for every word in the document space and then extracts the keywords of scientific and technical documents according to those weights. Experiments run with self-developed software indicate that the algorithm is markedly effective for automatic keyword extraction from Chinese scientific and technical literature.
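The weighting step described in the abstract can be illustrated with a minimal sketch. It assumes the text has already been segmented into word lists (the segmenter itself is not reproduced here) and uses the textbook formulation tf × log(N / df); the abstract does not state the paper's exact weighting formula, so the function name `tfidf_keywords`, the `top_k` parameter, and the sample documents are all hypothetical.

```python
import math
from collections import Counter


def tfidf_keywords(docs, top_k=3):
    """Return the top_k highest-weighted words for each document.

    docs is a list of token lists, i.e. text that has already been
    segmented into words. The weight is the textbook TF-IDF
    (an assumption, not necessarily the paper's exact formula):
        weight = (count / doc_length) * log(n_docs / doc_frequency)
    """
    n_docs = len(docs)

    # Document frequency: in how many documents each word appears.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))

    results = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        weights = {
            word: (count / total) * math.log(n_docs / df[word])
            for word, count in counts.items()
        }
        # Highest weight first; ties are broken by the word itself
        # so that the output is deterministic.
        ranked = sorted(weights, key=lambda w: (-weights[w], w))
        results.append(ranked[:top_k])
    return results


# Hypothetical, already-segmented sample documents.
docs = [
    ["关键词", "抽取", "算法", "关键词", "文献"],
    ["分词", "算法", "文献", "分词"],
    ["向量", "空间", "模型", "文献"],
]

for keywords in tfidf_keywords(docs, top_k=2):
    print(keywords)
```

In this sketch a word that occurs in every document (here 文献) gets a weight of zero, since log(N / df) = log(1) = 0, which is why TF-IDF favors words that are frequent in one document but rare across the collection.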
Source
《情报理论与实践》
CSSCI
PKU Core Journal (北大核心)
2008, No. 2, pp. 298-302 (5 pages)
Information Studies: Theory & Application