Automatic Extraction of Deep Web Form Attributes Based on Ontology Instance Information

Ontology Instance-based Attributes Extracting for Deep Web
Abstract: The Deep Web lies behind the Surface Web and holds a far larger volume of information. Search engines and web crawlers cannot access the Deep Web directly; at present, the only workable way to reach these hidden databases is through the query interfaces they provide. Automatically extracting the attributes of a query interface and generating correct query conditions is therefore an effective way to improve access to Deep Web data sources. Attributes in a query interface are subject to semantic constraints: some attributes co-occur, while others are mutually exclusive. To generate a valid query, the semantic relations among the key attributes must be discovered and reconciled. To address these problems, we propose an ontology-based method that automatically extracts form attributes by taking full advantage of instance information, using WordNet to enrich the extracted key attributes and to discover the semantic relations among the attributes in a form. During extraction, each attribute is expanded into a candidate attribute set stored as a tree, and this candidate attribute tree describes the semantic relations among attributes. Experiments in a real-world domain show that the framework can automatically extract Deep Web form attributes and generate valid query conditions.
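Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), CSCD, Peking University Core Journal, 2009, No. 5, pp. 883-886 (4 pages)
Funding: Supported by Natural Science Foundation projects (60373099, 60873235); the Specialized Research Fund for the Doctoral Program of Higher Education, Ministry of Education of China (200801830021); the Jilin Province Science and Technology Development Fund (20070533, 20080318); and the New Century fund for outstanding young university scholars (NCET-06-0300).
Keywords: Deep Web; Surface Web; query interface; WordNet; ontology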
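
The abstract describes two ideas that a small example may make more concrete: expanding each form attribute into a candidate attribute tree, and checking a query against co-occurrence and mutual-exclusion constraints. The following is only a minimal sketch under stated assumptions, not the authors' implementation. The synonym table `TOY_SYNONYMS` is a hypothetical stand-in for WordNet lookups, and the attribute names (`author`, `title`, `keyword`, `from`, `to`) and all function names are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical stand-in for WordNet: a real system would query WordNet
# for related terms instead of using a hard-coded table.
TOY_SYNONYMS = {
    "author": ["writer"],
    "title": ["name"],
}


@dataclass
class AttributeNode:
    """One node of a candidate attribute tree."""
    label: str
    children: List["AttributeNode"] = field(default_factory=list)

    def labels(self) -> List[str]:
        """Return this label followed by all descendant labels, depth-first."""
        found = [self.label]
        for child in self.children:
            found.extend(child.labels())
        return found


def build_candidate_tree(attribute: str) -> AttributeNode:
    """Expand one extracted attribute into a tree of candidate labels."""
    root = AttributeNode(attribute)
    for synonym in TOY_SYNONYMS.get(attribute, []):
        root.children.append(AttributeNode(synonym))
    return root


def is_valid_query(filled, exclusive=(), cooccurring=()) -> bool:
    """Check a set of filled-in attributes against pairwise constraints.

    exclusive:   pairs that must not both be filled in.
    cooccurring: pairs that must be filled in together or not at all.
    """
    filled = set(filled)
    for a, b in exclusive:
        if a in filled and b in filled:
            return False
    for a, b in cooccurring:
        if (a in filled) != (b in filled):
            return False
    return True
```

For example, `build_candidate_tree("author").labels()` yields the attribute together with its candidate labels, and `is_valid_query({"from"}, cooccurring=[("from", "to")])` rejects a query that fills in only one half of a co-occurring pair.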
