Abstract
Traditional algorithms for computing Chinese sentence similarity have low accuracy on text that contains many specialized terms. To address this, the paper proposes a Chinese sentence similarity algorithm based on dynamic programming. The algorithm builds the set of common substrings of two sentences, applies a linked-list de-duplication mechanism, extracts all of the longest common substrings from that set, and computes the similarity from them. Experimental results show that, on a question set containing many proper nouns, the algorithm reaches a test accuracy of 93.6% with high computational efficiency.
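The idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation. The function names `longest_common_substrings` and `similarity` are hypothetical, an insertion-ordered dict stands in for the de-duplicating linked list, and the scoring formula (twice the longest common substring length divided by the combined sentence length) is an assumption, since the abstract does not state the formula used.

```python
def longest_common_substrings(a: str, b: str) -> list[str]:
    """Return every distinct longest common substring of a and b.

    Classic dynamic programming: curr[j] holds the length of the longest
    common suffix of a[:i] and b[:j]. Only two rows are kept in memory.
    """
    best = 0
    # Insertion-ordered dict used as a de-duplicating container
    # (assumption: stands in for the paper's linked list).
    found: dict[str, None] = {}
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                if curr[j] > best:
                    # A longer match invalidates all shorter candidates.
                    best = curr[j]
                    found = {}
                if curr[j] == best:
                    found[a[i - best:i]] = None
        prev = curr
    return list(found)


def similarity(a: str, b: str) -> float:
    """Hypothetical score: 2 * |LCS| / (|a| + |b|), in the range [0, 1]."""
    if not a or not b:
        return 0.0
    substrings = longest_common_substrings(a, b)
    longest = len(substrings[0]) if substrings else 0
    return 2 * longest / (len(a) + len(b))


if __name__ == "__main__":
    print(longest_common_substrings("abcxyz", "xyzabc"))
    print(similarity("什么是动态规划算法", "动态规划算法是什么"))
```

Working at the character level avoids word segmentation, which is one plausible reason a substring-based method would cope well with proper nouns and specialized terms that segmenters often split incorrectly.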
Source
Computer Engineering (《计算机工程》)
CAS
CSCD
2013, Issue 2, pp. 220-224 (5 pages)
Funding
National Natural Science Foundation of China (61103101)
Humanities and Social Sciences Research Fund of the Ministry of Education (12YJCZH201)
Keywords
sentence similarity
dynamic programming
automatic question answering
longest common substring
duplicate-elimination linked list