Abstract
Existing text similarity methods compare only the core words of long phrases, yet the modifiers of those core words also affect meaning, so the computed similarity is not accurate enough. To address this, a long-phrase text similarity method based on a multi-predicate semantic frame is proposed. The text content is filled into the multi-predicate semantic frame; for phrase similarity, dependency parsing is used to build each long phrase into a phrase tree, the analytic hierarchy process determines the weight of each tree layer, and the node similarities at the different layers are combined to give the long-phrase similarity. Experiments on similarity calculation for sentences, short texts, and long texts show that the method achieves high accuracy, and that accuracy increases as the number of texts grows.
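The layer-weighting step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The pairwise comparison matrix, the row geometric-mean approximation of the analytic hierarchy process weights, the Jaccard overlap used as node similarity, and all function names are assumptions made for illustration. The phrase trees are assumed to be already built by a dependency parser and flattened into lists of words per layer, with layer 0 holding the core word.

```python
import math


def ahp_weights(pairwise):
    """Approximate AHP priority weights with the row geometric-mean method.

    pairwise[i][j] states how much more important layer i is than layer j
    (hypothetical values chosen by the analyst).
    """
    n = len(pairwise)
    geo_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def layer_similarity(nodes_a, nodes_b):
    """Stand-in node similarity: Jaccard overlap of the words in one layer.

    A real system would likely use a semantic word similarity instead.
    """
    set_a, set_b = set(nodes_a), set(nodes_b)
    if not set_a and not set_b:
        return 1.0  # both trees lack this layer, so treat it as matching
    return len(set_a & set_b) / len(set_a | set_b)


def phrase_similarity(layers_a, layers_b, weights):
    """Combine per-layer similarities using the layer weights."""
    score = 0.0
    for level, weight in enumerate(weights):
        a = layers_a[level] if level < len(layers_a) else []
        b = layers_b[level] if level < len(layers_b) else []
        score += weight * layer_similarity(a, b)
    return score


if __name__ == "__main__":
    # Assumed judgement: the core-word layer matters most, deeper layers less.
    comparison = [
        [1.0, 3.0, 5.0],
        [1 / 3, 1.0, 3.0],
        [1 / 5, 1 / 3, 1.0],
    ]
    weights = ahp_weights(comparison)

    # Hypothetical phrase trees, listed layer by layer from the core word down.
    tree_a = [["method"], ["calculation"], ["similarity", "text"]]
    tree_b = [["method"], ["detection"], ["similarity"]]

    print(weights)
    print(phrase_similarity(tree_a, tree_b, weights))
```

Because the weights sum to 1 and each layer similarity lies in [0, 1], the combined score also lies in [0, 1], and a mismatch in a modifier layer lowers the score without cancelling agreement on the core word.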
Authors
王景中
杨彬彬
何云华
WANG Jing-zhong; YANG Bin-bin; HE Yun-hua (College of Computer, North China University of Technology, Beijing 100144, China)
Source
《计算机工程与设计》
Peking University Core Journal (北大核心)
2018, Issue 4, pp. 1022-1028, 1052 (8 pages)
Computer Engineering and Design
Funding
Beijing Municipal Education Commission Science and Technology Innovation Service Capacity Building Fund Project (pxm2017-014212-000002)
Keywords
text similarity
semantic frame
multiple predicates
dependency parsing
analytic hierarchy process