Improved VSM algorithm for judging paragraph logic label
Cited by: 2
Abstract: In VSM-based document layout format checking algorithms, a paragraph cannot be compared with several logical labels at the same time to find the most similar one, and the precision and recall of paragraph logical label judgment are low. Building on the prior VSM algorithm, membership tables for quantizing qualitative components are designed according to the membership-degree principle of fuzzy pattern recognition. The variation range and the degree of difference of each component after standardization (removal of dimensions) are analyzed to find a standardization method that suits the format vector. Standardization methods are then combined with similarity measures and their label-judging results are analyzed to find the similarity measure that matches the chosen standardization method. Experiments show that, compared with prior algorithms, the improved algorithm can compare a paragraph with any logical label and reduces the information lost during standardization and similarity computation. It significantly improves the precision and recall of logical label judgment, and it is suitable for vector similarity problems in which one vector contains several different kinds of components.
Authors: Peng Xin (彭欣), Li Ning (李宁)
Source: Journal of Beijing Information Science and Technology University (《北京信息科技大学学报(自然科学版)》), 2014, No. 6, pp. 19-24 (6 pages)
Funding: Innovation Team Building and Teacher Career Development Program for Beijing Municipal Colleges and Universities (IDHT20130519)
Keywords: document layout checking; Vector Space Model; standardization; vector similarity measurement; document understanding
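
The pipeline described in the abstract (quantize qualitative components with a membership table, standardize the quantitative ones, then pick the most similar logical label) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the membership values, the value ranges, the label definitions, the choice of min-max scaling, and the distance-based similarity are hypothetical and are not taken from the paper, which does not state in its abstract which standardization method or similarity measure it selects.

```python
import math

# Hypothetical membership table: maps a qualitative component (alignment)
# to a membership degree in [0, 1]. The values are illustrative only.
ALIGNMENT_MEMBERSHIP = {"left": 0.0, "justify": 0.25, "center": 0.5, "right": 1.0}

# Hypothetical ranges used to standardize the quantitative components.
FONT_SIZE_RANGE = (9.0, 22.0)      # points
LINE_SPACING_RANGE = (1.0, 2.0)    # multiples of single spacing


def min_max(value, low, high):
    """Scale value into [0, 1], clamping anything outside [low, high]."""
    scaled = (value - low) / (high - low)
    return min(1.0, max(0.0, scaled))


def to_vector(fmt):
    """Turn a format description into a dimensionless format vector."""
    return [
        min_max(fmt["font_size"], *FONT_SIZE_RANGE),
        min_max(fmt["line_spacing"], *LINE_SPACING_RANGE),
        ALIGNMENT_MEMBERSHIP[fmt["alignment"]],
        1.0 if fmt["bold"] else 0.0,
    ]


def similarity(u, v):
    """Distance-based similarity in (0, 1]; 1.0 means identical vectors."""
    return 1.0 / (1.0 + math.dist(u, v))


def best_label(paragraph, labels):
    """Compare one paragraph with every logical label and return the best."""
    pv = to_vector(paragraph)
    scores = {name: similarity(pv, to_vector(fmt)) for name, fmt in labels.items()}
    return max(scores, key=scores.get), scores


# Hypothetical logical labels and their expected formats.
LABELS = {
    "heading1": {"font_size": 16.0, "line_spacing": 1.5, "alignment": "center", "bold": True},
    "body": {"font_size": 10.5, "line_spacing": 1.0, "alignment": "justify", "bold": False},
    "caption": {"font_size": 9.0, "line_spacing": 1.0, "alignment": "center", "bold": False},
}

paragraph = {"font_size": 15.0, "line_spacing": 1.5, "alignment": "center", "bold": True}
label, scores = best_label(paragraph, LABELS)
print(label, scores)
```

With these made-up numbers the paragraph is closest to the hypothetical `heading1` label, because only its font size differs slightly. Swapping `min_max` or `similarity` for another standardization method or similarity measure is the kind of pairing the abstract says the authors evaluate.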