Abstract
This paper computes the semantic similarity of words using HowNet and computes text similarity by extracting keywords. After the text is segmented and stop words are filtered out, the weight of each word is calculated from its part of speech, term frequency, and paragraph frequency, and these weights are used to extract the keywords of the text. The similarity between two texts is then computed from the similarity between their keywords. A significance analysis of the differences between the experimental results and the reference values shows that, compared with the traditional semantic algorithm and the vector space model algorithm, the proposed method further improves accuracy.
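The pipeline described in the abstract can be sketched in Python. This is a minimal sketch under assumptions that are not taken from the paper. The input is assumed to be already segmented and part-of-speech tagged. The POS coefficients in `POS_WEIGHT` and the multiplicative combination of POS coefficient, term frequency, and paragraph frequency are hypothetical. `toy_word_sim` is a hypothetical stand-in for the HowNet-based word similarity, which would need the HowNet knowledge base. The keyword-set aggregation (weighted best match in each direction, then averaged) is one common choice, not necessarily the paper's formula.

```python
from collections import Counter
from typing import Callable, Dict, Sequence, Set, Tuple

Token = Tuple[str, str]  # (word, part-of-speech tag) produced by a segmenter

# Hypothetical POS coefficients (assumption): nouns count most, then verbs, adjectives.
POS_WEIGHT = {"n": 1.0, "v": 0.8, "a": 0.5}
DEFAULT_POS_WEIGHT = 0.3

# Hypothetical similarity table standing in for HowNet (assumption).
TOY_SIM = {frozenset(("car", "automobile")): 0.9}


def word_weights(paragraphs: Sequence[Sequence[Token]], stop_words: Set[str]) -> Dict[str, float]:
    """Weight = POS coefficient x term frequency x paragraph frequency (assumed form)."""
    tf = Counter()  # occurrences of each word in the whole text
    pf = Counter()  # number of paragraphs that contain each word
    pos_of: Dict[str, str] = {}
    for para in paragraphs:
        seen = set()
        for word, pos in para:
            if word in stop_words:
                continue  # stop-word filtering
            tf[word] += 1
            pos_of.setdefault(word, pos)  # keep the first POS tag seen
            seen.add(word)
        pf.update(seen)
    total = sum(tf.values())
    if total == 0:
        return {}  # nothing left after stop-word filtering
    n_paras = len(paragraphs)
    return {
        w: POS_WEIGHT.get(pos_of[w], DEFAULT_POS_WEIGHT) * (c / total) * (pf[w] / n_paras)
        for w, c in tf.items()
    }


def top_keywords(weights: Dict[str, float], k: int) -> Dict[str, float]:
    """Keep the k highest-weighted words as keywords (ties broken alphabetically)."""
    ranked = sorted(weights, key=lambda w: (-weights[w], w))[:k]
    return {w: weights[w] for w in ranked}


def toy_word_sim(a: str, b: str) -> float:
    """Hypothetical stand-in for HowNet word similarity, returning a value in [0, 1]."""
    if a == b:
        return 1.0
    return TOY_SIM.get(frozenset((a, b)), 0.0)


def text_similarity(kw_a: Dict[str, float], kw_b: Dict[str, float],
                    word_sim: Callable[[str, str], float]) -> float:
    """Weighted best-match keyword similarity, averaged over both directions."""
    def directed(src: Dict[str, float], dst: Dict[str, float]) -> float:
        total_w = sum(src.values())
        if total_w == 0 or not dst:
            return 0.0
        # Each source keyword is matched to its most similar keyword in the other text.
        return sum(w * max(word_sim(a, b) for b in dst) for a, w in src.items()) / total_w
    return (directed(kw_a, kw_b) + directed(kw_b, kw_a)) / 2


stop = {"the", "is"}
doc_a = [[("the", "x"), ("car", "n"), ("is", "v"), ("fast", "a")],
         [("car", "n"), ("engine", "n")]]
doc_b = [[("automobile", "n"), ("engine", "n")],
         [("engine", "n"), ("noise", "n")]]
kw_a = top_keywords(word_weights(doc_a, stop), k=2)
kw_b = top_keywords(word_weights(doc_b, stop), k=2)
print(kw_a)  # {'car': 0.5, 'engine': 0.125}
print(round(text_similarity(kw_a, kw_b, toy_word_sim), 4))  # 0.95
```

Replacing `toy_word_sim` with a real HowNet-based measure leaves the rest of the sketch unchanged. The number of keywords `k` and the POS coefficients are tuning parameters whose values the abstract does not specify.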
Source
Computer and Modernization (《计算机与现代化》)
2015, No. 4, pp. 6-9 (4 pages)
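Funding
Hunan Provincial Natural Science Foundation Project (12JJ3066)
Hunan Provincial Cultivation Project for the Industrialization of University Scientific and Technological Achievements (11CY018)
Hunan Provincial "12th Five-Year Plan" Key Discipline Project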
Keywords
text similarity
semantic
HowNet
keywords
paragraph frequency