Abstract
Based on three research hypotheses about Web data integration, this paper discusses a rule-tree-based wrapper generation model. The model comprises seven stages: preprocessing, generating the HTML tree, generating the pattern tree, acquiring mapping rules, generating the rule tree, repairing the rule tree, and executing the wrapper. The implementation of the mapping rules and the rule-tree generation algorithm are described in detail. Experimental tests show that the method is suitable for extracting Web data.
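The abstract names the stages but does not give the algorithm, so the following is only a minimal sketch of the general idea, not the paper's method. It assumes already-preprocessed, well-formed HTML, and it models a mapping rule as a hypothetical list of (tag, index) steps from a record node to a field. The names `Node`, `build_html_tree`, `select`, and `run_wrapper` are illustrative and do not come from the paper; the pattern-tree and rule-tree repair stages are not modeled.

```python
from html.parser import HTMLParser


class Node:
    """One element of the HTML tree."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        self.text = ""


class TreeBuilder(HTMLParser):
    """Builds a simple tree; assumes every start tag has a matching end tag."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        self.current = node

    def handle_endtag(self, tag):
        if self.current.parent is not None:
            self.current = self.current.parent

    def handle_data(self, data):
        self.current.text += data.strip()


def build_html_tree(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def select(node, path):
    """Follow (tag, index) steps downward; return the node reached, or None."""
    for tag, index in path:
        matches = [child for child in node.children if child.tag == tag]
        if index >= len(matches):
            return None
        node = matches[index]
    return node


def run_wrapper(root, record_parent_path, record_tag, rules):
    """Apply each mapping rule (field name -> relative path) to every record."""
    parent = select(root, record_parent_path)
    if parent is None:
        return []
    rows = []
    for record in parent.children:
        if record.tag != record_tag:
            continue
        row = {}
        for field, path in rules.items():
            target = select(record, path)
            row[field] = target.text if target is not None else None
        rows.append(row)
    return rows


# Hypothetical page and rules, used only for illustration.
PAGE = (
    "<html><body><table>"
    "<tr><td>Widget</td><td>9.50</td></tr>"
    "<tr><td>Gadget</td><td>12.00</td></tr>"
    "</table></body></html>"
)
RULES = {"name": [("td", 0)], "price": [("td", 1)]}

root = build_html_tree(PAGE)
rows = run_wrapper(root, [("html", 0), ("body", 0), ("table", 0)], "tr", RULES)
print(rows)
```

In this sketch each table row is treated as one record, and each rule selects the nth cell within it. A rule that no longer matches returns None for its field, which is the kind of failure a rule-tree repair stage would presumably need to detect.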
Source
Computer Technology and Development (《计算机技术与发展》)
2006, Issue 6, pp. 242-244 (3 pages)