Abstract
Long sentences are common in written Chinese, and their complex structure makes sentence similarity difficult to compute. Taking the key elements of dependency relations into account, this paper studies and analyzes Chinese dependency syntax trees and proposes a multi-feature similarity computation method based on fine-grained dependency relations. The method considers several features of the dependency syntax tree: the word and part of speech at each node, the dependency relations between nodes, and the importance weights of those relations. From these features it derives a similarity measure between two dependency syntax trees, and similarity computation for long Chinese sentences is implemented on top of that measure. Experimental results show that the method achieves higher accuracy on long Chinese sentences than the other algorithms it was compared with.
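The general idea of comparing two dependency trees by words, parts of speech, relation labels, and relation weights can be illustrated with a minimal sketch. This is not the paper's algorithm, whose formulas are not given in the abstract. The `Arc` type, the `REL_WEIGHT` table, the 0.5/0.25 scores, and the best-match averaging are all assumptions chosen for illustration, and the relation labels (HED, SBV, VOB, ATT, ADV) are assumed to follow a common Chinese dependency label set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arc:
    """One dependency arc: head word/POS, dependent word/POS, relation label."""
    head: str
    head_pos: str
    dep: str
    dep_pos: str
    rel: str


# Hypothetical importance weights per relation label (not taken from the paper).
REL_WEIGHT = {"HED": 1.0, "SBV": 0.8, "VOB": 0.8, "ATT": 0.5, "ADV": 0.5}
DEFAULT_WEIGHT = 0.3


def arc_similarity(a: Arc, b: Arc) -> float:
    """Score two arcs in [0, 1]; arcs with different relation labels score 0."""
    if a.rel != b.rel:
        return 0.0
    score = 0.0
    pairs = (
        (a.head, a.head_pos, b.head, b.head_pos),
        (a.dep, a.dep_pos, b.dep, b.dep_pos),
    )
    for word_a, pos_a, word_b, pos_b in pairs:
        if word_a == word_b:
            score += 0.5   # identical word at this end of the arc
        elif pos_a == pos_b:
            score += 0.25  # different word, same part of speech
    return score


def directed_similarity(src: list, dst: list) -> float:
    """Weighted average of each src arc's best match among the dst arcs."""
    total = sum(REL_WEIGHT.get(a.rel, DEFAULT_WEIGHT) for a in src)
    if total == 0:
        return 0.0
    matched = sum(
        REL_WEIGHT.get(a.rel, DEFAULT_WEIGHT)
        * max((arc_similarity(a, b) for b in dst), default=0.0)
        for a in src
    )
    return matched / total


def tree_similarity(t1: list, t2: list) -> float:
    """Symmetric similarity: mean of the two directed scores."""
    return 0.5 * (directed_similarity(t1, t2) + directed_similarity(t2, t1))


# Two small hand-written trees that differ only in the object noun.
tree_a = [
    Arc("ROOT", "ROOT", "喜欢", "v", "HED"),
    Arc("喜欢", "v", "我", "r", "SBV"),
    Arc("喜欢", "v", "苹果", "n", "VOB"),
]
tree_b = [
    Arc("ROOT", "ROOT", "喜欢", "v", "HED"),
    Arc("喜欢", "v", "我", "r", "SBV"),
    Arc("喜欢", "v", "香蕉", "n", "VOB"),
]
print(tree_similarity(tree_a, tree_b))
```

In practice the arcs would come from a dependency parser rather than being written by hand, and exact word matching would likely be replaced by a lexical similarity measure. Both choices are left open here.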
Source
Science Technology and Engineering (《科学技术与工程》)
Peking University Core Journal
2017, Issue 11, pp. 277-281 (5 pages)
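Funding
Supported by:
National Natural Science Foundation of China (U1504612)
Henan Province University Innovation Talents Program (15HASTIT023)
Key Scientific Research Project of Henan Province Higher Education Institutions (17A520002)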