Abstract
This paper reviews several traditional, representative methods for computing document similarity and analyzes the limitations of each in application. Targeting the structured description of scientific and technical papers, it proposes a new similarity algorithm that combines document feature information, contextual domain knowledge, and citation relations, and discusses its effectiveness using a prototype system.
Source
《计算机工程与应用》
CSCD
Peking University Core Journals (北大核心)
2006, No. 30, pp. 160-163 (4 pages)
Computer Engineering and Applications
Keywords
object similarity, citation graph, structural context similarity, hierarchical domain structure