Abstract
Comprehensive and accurate annotation of Deep Web query results is a key problem in Deep Web data integration, but existing Web database annotation methods do not yet solve it well. To address this, a Deep Web data annotation method based on the result schema is proposed. The method first preprocesses the data by parsing result pages and extracting structured data, then establishes a correct semantic mapping between the integrated result schema and the data to be annotated, and from that mapping determines the annotation information for the Deep Web data. Experiments on Web databases from four domains show that the proposed method can effectively annotate Deep Web query result data.
Source
《计算机应用》
CSCD
Peking University Core Journals (北大核心)
2011, No. 7, pp. 1733-1736 (4 pages)
Journal of Computer Applications
Funding
Supported by the Natural Science Foundation of Gansu Province (0809RJZA018)