Research and Implementation of Structure Extraction of Semi-structured Document

Original title: 半结构化文档集的结构模式提取的研究与实现 (cited by: 5)
Abstract: This paper proposes recovering and reconstructing missing information at the information source by extracting structural patterns: semantic structure information is derived using rules that relate it to style information. The paper presents a structure extraction model for semi-structured document sets and discusses in detail the key steps and algorithms for implementing it. Finally, it summarizes the extraction method and its applications with reference to a system built on the model. The approach has been applied successfully in a real-world system.
Source: Computer Engineering (《计算机工程》), indexed in CAS, CSCD, and the Peking University Core Journals list; 2001, No. 10, pp. 19-21, 113 (4 pages)
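Funding: National Key Project, China Encyclopedia Terminology Database Project, Press and Publication Administration (a key project reported to the State Planning Commission)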
Keywords: semi-structured document set; structural pattern extraction; Web; Internet; structure extraction; semi-structure; XML; markup language; Web publishing