Abstract
This paper analyzes methods for computing sentence similarity in Chinese natural language processing. It introduces three sentence similarity models: TF-IDF based on the vector space model (VSM), a model based on sentence semantics, and a model based on sentence dependency relations. For each model it examines the underlying principle and the computation method, and it gives the advantages and disadvantages. The VSM-based model is relatively mature and generally produces good results. However, because the traditional TF-IDF method does not take semantic information into account, it has certain limitations; computing similarity from sentence semantics or from the dependency structure of the sentence can achieve better results.
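The VSM-based approach named in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function names `tfidf_vectors` and `cosine`, the smoothed IDF formula, and the pre-segmented example sentences are all assumptions made for illustration. It shows why TF-IDF alone ignores semantics: synonyms such as 电影 and 影片 are treated as unrelated terms.

```python
import math
from collections import Counter


def tfidf_vectors(sentences):
    """Return one {term: weight} dict per tokenized sentence."""
    n = len(sentences)
    # Document frequency: in how many sentences each term occurs.
    df = Counter(term for sent in sentences for term in set(sent))
    vectors = []
    for sent in sentences:
        tf = Counter(sent)
        # Smoothed IDF (an assumption, not taken from the paper), so that
        # terms occurring in every sentence still get a nonzero weight.
        vectors.append({
            term: (count / len(sent)) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical, already word-segmented sentences (segmentation is assumed).
corpus = [
    ["我", "喜欢", "看", "电影"],
    ["我", "爱", "看", "影片"],   # same meaning, different words
    ["我", "喜欢", "看", "电影"],
]
vecs = tfidf_vectors(corpus)
print(cosine(vecs[0], vecs[2]))  # identical sentences
print(cosine(vecs[0], vecs[1]))  # paraphrase scores lower: synonyms do not match
```

The paraphrase pair shares only 我 and 看, so its score comes entirely from those two terms; a semantic or dependency-based model would be needed to credit the synonym pairs.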
Source
Journal of Lanzhou Higher Polytechnical College (《兰州工业高等专科学校学报》)
2009, No. 4, pp. 1-3, 24 (4 pages)
Keywords
TF-IDF
semantics
dependency structure
similarity computation
model