
An Entity Extraction Method for Deep Web Query Results (一种Deep Web查询结果的实体抽取方法) Cited by: 4

Research on entity extraction method of Deep Web data integration
Abstract: The Deep Web contains a wealth of high-quality information, and result pages containing that information can be obtained through an integrated Deep Web query interface. Extracting data from Deep Web query result pages is therefore key to Deep Web data integration. The paper proposes a method that combines an index with edit similarity to extract data from Deep Web query result pages. Extensive experiments show that the method is feasible and improves the precision and recall of Deep Web entity extraction.
Source: Computer Engineering and Applications (《计算机工程与应用》), CSCD, 2012, Issue 36, pp. 160-163 (4 pages)
Funding: National Natural Science Foundation of China (No. 70671035)
Keywords: Deep Web; data extraction; Document Object Model (DOM) tree; index; similarity
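
The abstract names two ingredients, an index and edit similarity, without giving details. The sketch below is only an illustration of how such a combination might look, not the authors' algorithm. It assumes each candidate record on a result page has already been flattened into a list of DOM tag names. The index keyed by the first tag, the threshold value, and all function names are hypothetical choices made for this example.

```python
from collections import defaultdict


def edit_distance(a, b):
    """Levenshtein distance between two sequences (strings or lists of tags)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def edit_similarity(a, b):
    """Normalize the distance to [0, 1]; 1.0 means identical sequences."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def build_index(tag_sequences):
    """Hypothetical index: bucket record positions by their first tag name."""
    index = defaultdict(list)
    for pos, seq in enumerate(tag_sequences):
        index[seq[0] if seq else ""].append(pos)
    return index


def similar_pairs(tag_sequences, threshold=0.7):
    """Compare only records in the same index bucket (assumed threshold)."""
    pairs = []
    for positions in build_index(tag_sequences).values():
        for i, p in enumerate(positions):
            for q in positions[i + 1:]:
                if edit_similarity(tag_sequences[p], tag_sequences[q]) >= threshold:
                    pairs.append((p, q))
    return pairs


# Made-up tag sequences standing in for sibling subtrees of a result page.
records = [
    ["li", "a", "span", "span"],
    ["li", "a", "span", "b"],
    ["div", "img"],
]
print(similar_pairs(records))
```

In this sketch the index only narrows down which records are compared, and edit similarity decides whether two records share a template. The paper may define both steps differently.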