Abstract
In text feature-word extraction, TF-IDF is the most common method for computing feature weights. Building on the traditional TF-IDF algorithm, this paper proposes a new keyword extraction algorithm based on the length of the words in a text. Chinese phrase segmentation is used to identify long words and ordinary words in the text; the proposed TF-IDF-WL method then recomputes the weights of words of different lengths, and the keywords are obtained by ranking the words by weight. Comparative experiments show that the new algorithm reflects the length of feature words more accurately and, compared with the traditional TF-IDF algorithm, substantially improves both precision and recall.
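The abstract does not state the TF-IDF-WL weighting formula, so the following is only a sketch of the general idea under explicit assumptions: documents are already segmented into words (the segmentation step is not shown), IDF uses the smoothed form ln((1 + N) / (1 + df)) + 1, and the length adjustment is a hypothetical factor len(w) ** alpha, where alpha = 0 reduces it to plain TF-IDF. The function names `idf_table`, `tf_idf_wl`, `top_keywords` and `precision_recall` are illustrative and are not taken from the paper.

```python
import math
from collections import Counter


def idf_table(docs):
    """Smoothed IDF: idf(w) = ln((1 + N) / (1 + df(w))) + 1."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each word at most once per document
    return {w: math.log((1 + n) / (1 + d)) + 1 for w, d in df.items()}


def tf_idf_wl(doc, idf, alpha=1.0):
    """TF-IDF multiplied by a HYPOTHETICAL word-length factor len(w) ** alpha.

    alpha = 0 gives plain TF-IDF. This factor is an illustrative
    assumption, not the TF-IDF-WL formula from the paper.
    """
    counts = Counter(doc)
    total = len(doc)
    return {
        w: (c / total) * idf[w] * (len(w) ** alpha)  # len() counts characters
        for w, c in counts.items()
    }


def top_keywords(weights, k):
    """Rank words by weight (descending), breaking ties by the word itself."""
    ranked = sorted(weights.items(), key=lambda item: (-item[1], item[0]))
    return [w for w, _ in ranked[:k]]


def precision_recall(extracted, gold):
    """Precision and recall of extracted keywords against a gold keyword set."""
    hits = len(set(extracted) & set(gold))
    precision = hits / len(extracted) if extracted else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy corpus of pre-segmented documents (made up for illustration).
    docs = [
        ["算法", "算法", "特征词提取", "文本"],
        ["文本", "分词", "权重"],
        ["权重", "计算", "文本"],
    ]
    idf = idf_table(docs)
    plain = tf_idf_wl(docs[0], idf, alpha=0.0)
    with_len = tf_idf_wl(docs[0], idf, alpha=1.0)
    print(top_keywords(plain, 2))     # ['算法', '特征词提取']
    print(top_keywords(with_len, 2))  # ['特征词提取', '算法']
```

With alpha = 1, the five-character phrase 特征词提取 (one occurrence) overtakes the two-character word 算法 (two occurrences). This is the kind of reordering a length-aware weight is meant to produce; whether the paper's actual weighting behaves this way cannot be determined from the abstract alone.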
Source
Journal of Liaoning Petrochemical University (《辽宁石油化工大学学报》)
CAS
2017, No. 4, pp. 61-64, 69 (5 pages)
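Funding
Liaoning Provincial Education Science "13th Five-Year Plan" Project (JG16DB253)
Education and Teaching Reform Research Project of Liaoning Petrochemical University (20165230060003)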