Abstract
Building on a study of the traditional RoadRunner algorithm, this paper proposes a wrapper generation algorithm based on a tree structure. When matching the training samples, the algorithm compares them as tree structures, which markedly improves its running efficiency and makes the identification of iterated (repeated) items and optional items more accurate.
Source
Electronic Test (《电子测试》)
2017, No. 12X, pp. 135-136 (2 pages)
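Funding
2018 Scientific Research Development Fund project of Shenyang Urban Construction University, "Research on Information Extraction Algorithms for Deep Web Pages" (XKJ2018006)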
Keywords
Web information extraction
wrapper
tree structure