
基于词法、句法和语义的句子相似度计算方法 (cited by: 4)

Sentence similarity calculation method based on lexical, syntactic and semantic
Abstract To address the problem that existing sentence similarity algorithms do not consider the semantic information of sentences, a sentence similarity calculation method based on lexical, syntactic, and semantic analysis is proposed. Sentence similarity is divided into three levels: the lexical layer, the syntactic layer, and the semantic layer. In the lexical layer, lexical similarity is calculated by constructing a word similarity matrix and a digital sequence similarity matrix for the sentences. In the syntactic layer, syntactic similarity is calculated from the similarity of resource description framework (RDF) triples converted from conceptual vocabulary. In the semantic layer, semantic similarity is calculated from the semantic distance represented by the shortest path in the tree structure of the ontology. A sentence semantic similarity calculation model is then proposed; sentence pairs from the book domain are collected as the test set, and a book-domain ontology is constructed as the knowledge source. Experimental results show that the proposed method achieves higher accuracy and recall, with an F-measure of 0.6499, which is about 12%, 17%, and 16% higher than that of the cosine similarity algorithm, the edit-distance-based (Levenshtein) algorithm, and the TF-IDF (term frequency-inverse document frequency) algorithm, respectively.
Authors Zhai Sheping (翟社平); Li Zhaozhao (李兆兆); Duan Hongyu (段宏宇); Li Jing (李婧); Dong Didi (董迪迪) (School of Computer Science and Technology, Xi'an University of Posts and Telecommunications, Xi'an 710121, China; Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing, Xi'an University of Posts and Telecommunications, Xi'an 710121, China)
Source Journal of Southeast University: Natural Science Edition (《东南大学学报(自然科学版)》), indexed in EI, CAS, CSCD, and the Peking University Core Journals list; 2019, No. 6, pp. 1094-1100 (7 pages)
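Funding Shaanxi Provincial Social Science Foundation (2016N008); Xi'an Social Science Planning Fund (17X63); Natural Science Foundation of Shaanxi Province (2012JM8044); Scientific Research Program of the Shaanxi Provincial Department of Education (12JK0733)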
Keywords sentence similarity; lexical layer; syntactic layer; semantic layer; ontology
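
The semantic-layer idea in the abstract, semantic distance taken as the shortest path between two concepts in a tree-shaped ontology, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy book-domain ontology `PARENT`, the helper names, and the conversion from distance to similarity as 1 / (1 + d). The paper's actual ontology and scoring formula are not given in this record.

```python
from collections import deque

# Hypothetical toy book-domain ontology as child -> parent edges.
# Illustrative only; the paper's real ontology is not shown in this record.
PARENT = {
    "Book": "Thing",
    "Fiction": "Book",
    "NonFiction": "Book",
    "Novel": "Fiction",
    "ShortStory": "Fiction",
    "Textbook": "NonFiction",
}


def build_graph(parent):
    """Turn child -> parent edges into an undirected adjacency map."""
    graph = {}
    for child, par in parent.items():
        graph.setdefault(child, set()).add(par)
        graph.setdefault(par, set()).add(child)
    return graph


def shortest_path_length(graph, source, target):
    """Breadth-first search; returns the edge count or None if unreachable."""
    if source not in graph or target not in graph:
        return None
    queue = deque([(source, 0)])
    seen = {source}
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def path_similarity(graph, a, b):
    """Assumed mapping from distance to similarity: 1 / (1 + d).

    Unknown or unreachable concepts score 0.0. This mapping is a common
    convention chosen for the sketch, not the paper's formula.
    """
    dist = shortest_path_length(graph, a, b)
    if dist is None:
        return 0.0
    return 1.0 / (1.0 + dist)


GRAPH = build_graph(PARENT)

if __name__ == "__main__":
    for pair in [("Novel", "ShortStory"), ("Novel", "Textbook"), ("Novel", "Novel")]:
        print(pair, path_similarity(GRAPH, *pair))
```

In this toy tree, sibling concepts such as `Novel` and `ShortStory` are two edges apart, while `Novel` and `Textbook` are four edges apart, so the former pair scores higher under the assumed mapping.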
