Research on Information Extraction and Multilayer Vector Space Based on XML Technology (基于XML的信息抽取和多层向量空间技术研究) Cited by: 4
Abstract Starting from an analysis of the shortcomings of traditional indexing techniques, this paper proposes a multilayer vector space model built on information extraction over an XML architecture. It focuses on how to build an XML-based Web information extraction platform, which is studied from three aspects: knowledge base construction, web page optimization, and information extraction. It also describes the XML-based multilayer vector space model and how that model is formed. The technique allows web page content to be analyzed and extracted clearly, and it substantially improves both the efficiency and the accuracy of retrieving Web documents. The aim is to find a more efficient and concise retrieval method.
Authors Zhong Hua (仲华), Cui Zhiming (崔志明)
Source Computer Technology and Development (《计算机技术与发展》), 2007, No. 7, pp. 49-52 (4 pages)
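Funding Jiangsu Province High-Tech Research Project (BG2005019); Ministry of Education Research Fund for Doctoral Programs in Higher Education (20040285016); Ministry of Education Key Research Project (205059)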
Keywords XML architecture; information extraction; N-layer vector space model