Abstract
Aim: The commonly used vector space model ignores the word order and structural information in a text, which reduces the accuracy of text similarity computation. This paper proposes a new textual case similarity method to address that shortcoming. Methods: The granularity of text representation is raised from the word to the sentence, and word-order information is incorporated. Results: A sentence vector space model is proposed, together with a textual case similarity algorithm based on it. Conclusion: The method is more consistent with the way humans understand text and improves the accuracy of textual case similarity computation. Its application to textual case classification shows that the method is feasible.
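The abstract does not give the paper's formulas, so the following is only a minimal sketch of the general idea (compare texts sentence by sentence and let word order contribute to the score), not the authors' algorithm. Every name here (`split_sentences`, `order_similarity`, `text_similarity`), the pairwise-order measure, the best-match averaging, and the weight `alpha = 0.7` are assumptions made for illustration.

```python
import math
from collections import Counter


def split_sentences(text):
    """Split on '.', '!' and '?', then tokenise each sentence on whitespace."""
    for mark in "!?":
        text = text.replace(mark, ".")
    return [s.lower().split() for s in text.split(".") if s.strip()]


def cosine(a, b):
    """Bag-of-words cosine similarity between two token lists."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[w] * cb[w] for w in ca)
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def order_similarity(a, b):
    """Fraction of shared-word pairs that keep the same relative order.

    Assumed measure, not taken from the paper. Positions use the first
    occurrence of each shared word.
    """
    shared = [w for w in dict.fromkeys(a) if w in b]
    if len(shared) < 2:
        # With fewer than two shared words there is no order to compare.
        return 1.0 if shared else 0.0
    pos = [b.index(w) for w in shared]
    pairs = [(i, j) for i in range(len(pos)) for j in range(i + 1, len(pos))]
    kept = sum(1 for i, j in pairs if pos[i] < pos[j])
    return kept / len(pairs)


def sentence_similarity(a, b, alpha=0.7):
    """Weighted mix of word overlap and word order (alpha is an assumed weight)."""
    return alpha * cosine(a, b) + (1 - alpha) * order_similarity(a, b)


def text_similarity(doc_a, doc_b, alpha=0.7):
    """Average best-match sentence similarity, symmetrised over both texts."""
    sa, sb = split_sentences(doc_a), split_sentences(doc_b)
    if not sa or not sb:
        return 0.0

    def directed(xs, ys):
        return sum(max(sentence_similarity(x, y, alpha) for y in ys) for x in xs) / len(xs)

    return 0.5 * (directed(sa, sb) + directed(sb, sa))
```

Under these assumptions, "the dog bites the man." and "the man bites the dog." have a bag-of-words cosine of 1, yet `text_similarity` scores them below 1 because half of the shared-word pairs change their relative order. This is the kind of distinction a word-level vector space model cannot make.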
Source
《西北大学学报(自然科学版)》
CAS
CSCD
Peking University Core Journal
2010, No. 6, pp. 991-994 (4 pages)
Journal of Northwest University (Natural Science Edition)
Funding
Northwest University Research Start-up Fund (PR08067)
Northwest University Graduate Independent Innovation Fund (08YZZ35)
Keywords
sentence vector space model
word order
similarity
textual case classification
satisfaction degree