
The Hidden Web Data Extraction Algorithm Based on Numerical Attributes
Abstract: When users retrieve data from a back-end database through a web query interface, the number of result tuples returned is limited, so only part of the hidden database can be obtained. Existing search engine techniques also struggle to crawl all of the data in a hidden database. To address this, the paper targets the numerical attributes of the back-end hidden database and proposes a rank-based partitioning (rank-shrink) algorithm. The algorithm retrieves all data tuples of the hidden database using relatively few queries. The paper gives a theoretical analysis of the algorithm's query cost and verifies its effectiveness experimentally.
Authors: SUN Yang, LI Gui, HAN Zi-yang, LI Zheng-yu, SUN Ping (Faculty of Information & Control Engineering, Shenyang Jianzhu University, Shenyang 110168, China)
Source: Scientific Journal of Information Engineering (Chinese-English edition), 2016, Issue 1, pp. 1-8 (8 pages)
Keywords: hidden database; numerical attribute; binary-shrink algorithm; rank-shrink algorithm
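The abstract describes crawling a hidden database through an interface that returns only a limited number of tuples per query. The keywords contrast binary-shrink with rank-shrink partitioning. The following is a minimal Python sketch of that general idea, not the authors' exact procedure or cost analysis. It rests on several assumptions:

- The numerical attribute is a single integer-valued field.
- `TopKInterface` is a hypothetical stand-in for the web form. It returns at most k rows ranked by a hidden score, plus an overflow flag.
- No more than k rows share any one value.

When a range's query overflows, binary-shrink splits that range at its midpoint. Rank-shrink instead splits at the median value of the rows actually returned, and isolates that value in its own point range.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Row:
    tid: int      # tuple identifier
    value: int    # the numerical attribute the crawler partitions on
    score: float  # hidden ranking score; the crawler only sees its effect on order


class TopKInterface:
    """Hypothetical web query form: returns at most k rows with lo <= value <= hi,
    ranked by the hidden score, plus a flag telling whether more rows matched."""

    def __init__(self, rows: List[Row], k: int) -> None:
        self._rows = rows
        self.k = k
        self.queries = 0  # queries issued so far, i.e. the crawl cost

    def query(self, lo: int, hi: int) -> Tuple[List[Row], bool]:
        self.queries += 1
        hits = sorted((r for r in self._rows if lo <= r.value <= hi),
                      key=lambda r: r.score, reverse=True)
        return hits[: self.k], len(hits) > self.k


Range = Tuple[int, int]
Splitter = Callable[[int, int, List[Row]], List[Range]]


def binary_shrink_split(lo: int, hi: int, returned: List[Row]) -> List[Range]:
    # Binary-shrink style: halve the value range, ignoring the returned rows.
    mid = (lo + hi) // 2
    return [(lo, mid), (mid + 1, hi)]


def rank_shrink_split(lo: int, hi: int, returned: List[Row]) -> List[Range]:
    # Rank-shrink style: split at the median value of the rows actually returned,
    # isolating that value in its own point range [x, x].
    values = sorted(r.value for r in returned)
    x = values[len(values) // 2]
    return [(lo, x - 1), (x, x), (x + 1, hi)]


def crawl(db: TopKInterface, lo: int, hi: int, split: Splitter) -> Dict[int, Row]:
    """Retrieve every row with lo <= value <= hi by repeatedly splitting
    ranges whose query overflows (matches more than k rows)."""
    found: Dict[int, Row] = {}
    pending: List[Range] = [(lo, hi)]
    while pending:
        a, b = pending.pop()
        if a > b:
            continue  # empty range: no query needed
        rows, overflow = db.query(a, b)
        for r in rows:
            found[r.tid] = r
        if not overflow:
            continue  # all matching rows were returned, range is resolved
        if a == b:
            raise ValueError(f"more than {db.k} rows share value {a}; "
                             "another attribute would be needed to split them")
        pending.extend(split(a, b, rows))
    return found


if __name__ == "__main__":
    rng = random.Random(0)
    rows = [Row(i, rng.randint(0, 10_000), rng.random()) for i in range(2_000)]
    for name, split in [("binary-shrink", binary_shrink_split),
                        ("rank-shrink", rank_shrink_split)]:
        db = TopKInterface(rows, k=50)
        found = crawl(db, 0, 10_000, split)
        print(f"{name}: {len(found)} of {len(rows)} rows, {db.queries} queries")
```

Both splitters cover each overflowing range exactly, so they recover the same rows. They differ only in how many queries they spend. Midpoint splitting is driven by the width of the value domain. Splitting on returned values adapts to where the data actually lies, which is the intuition behind ranking-based partitioning. The printed query counts depend on the data distribution and on k, and no particular figures are claimed here.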