
Attributes extraction of Deep Web query interface based on DOM
Abstract: Query interface schema extraction is a precondition of Deep Web data integration. A query interface schema generally consists of a set of domain-related attributes, and each attribute is formed by a single element or by a combination of multiple elements. Current research on attribute extraction mostly addresses single-element attributes, and work on multi-element attributes is scarce. Targeting attributes composed of multiple elements, this paper proposes an attribute extraction method based on the DOM structure of the query interface. The method parses the query interface into a DOM and extracts the form elements of the interface from the corresponding DOM nodes. It then applies two-phase clustering to the extracted set of elements, mines the combination relationships among them, and finally combines the elements into attributes. The method extracts both single-element and multi-element attributes from the interface, and the experimental results show that it is effective.
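The pipeline described in the abstract (parse the interface, collect form elements from DOM nodes, group elements into attributes) can be illustrated with a small Python sketch. The abstract does not state which features or algorithms the two-phase clustering uses, so this sketch does not reproduce it: it swaps in a much simpler stand-in that groups form elements sharing the same row-like container. The class `FormElementCollector`, the function `extract_attributes`, the `ROW_TAGS` set, and the sample form are all hypothetical names chosen for illustration, and the code assumes well-formed markup.

```python
from html.parser import HTMLParser

FORM_TAGS = {"input", "select", "textarea"}
ROW_TAGS = {"tr", "div", "p", "li"}  # assumption: containers treated as one "row"


class FormElementCollector(HTMLParser):
    """Records each visible form element with its row container and the text before it."""

    def __init__(self):
        super().__init__()
        self.row_stack = []    # ids of the row-like containers currently open
        self.next_row_id = 0
        self.last_text = ""    # most recent non-empty text seen in the current row
        self.in_select = False
        self.elements = []     # tuples of (row_id, label_text, element_name)

    def handle_starttag(self, tag, attrs):
        if tag in ROW_TAGS:
            self.row_stack.append(self.next_row_id)
            self.next_row_id += 1
            self.last_text = ""
        elif tag in FORM_TAGS:
            attr = dict(attrs)
            if attr.get("type") in ("hidden", "submit"):
                return  # not part of the query schema
            row = self.row_stack[-1] if self.row_stack else -1
            self.elements.append((row, self.last_text, attr.get("name", "")))
            if tag == "select":
                self.in_select = True

    def handle_endtag(self, tag):
        if tag == "select":
            self.in_select = False
        elif tag in ROW_TAGS and self.row_stack:
            self.row_stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and not self.in_select:  # skip option text inside <select>
            self.last_text = text


def extract_attributes(html):
    """Group elements by row; each group is one (label, element names) attribute."""
    collector = FormElementCollector()
    collector.feed(html)
    groups = {}
    for row, label, name in collector.elements:
        groups.setdefault(row, (label, []))[1].append(name)
    return list(groups.values())


SAMPLE = """
<form>
  <table>
    <tr><td>Title</td><td><input type="text" name="title"></td></tr>
    <tr><td>Price</td><td><input type="text" name="min"> to
        <input type="text" name="max"></td></tr>
    <tr><td><input type="submit" value="Search"></td></tr>
  </table>
</form>
"""

for label, names in extract_attributes(SAMPLE):
    print(label, names)
```

On the sample form, "Title" comes out as a single-element attribute and "Price" as a multi-element attribute made of the `min` and `max` inputs. Real query interfaces rarely lay out one attribute per row, which is presumably why the paper clusters elements instead of relying on a fixed layout rule like this one.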
Source: Journal of Guilin University of Electronic Technology, 2012, No. 6, pp. 468-472 (5 pages)
Funding: National Natural Science Foundation of China (61163057)
Keywords: attribute extraction; Deep Web; query interface; DOM node; form element

