
Web Information Extraction Based on 2D Correlative-chain Conditional Random Fields
Abstract: To address the "item disorder" problem in Web information extraction (information items appearing in no fixed order), this paper proposes an extraction method based on two-dimensional correlative-chain conditional random fields (2D-CRFs). The method parses a Web document into a part-of-speech sequence, maps the information items to be extracted to states, and treats those items as sequence parameters of the 2D-CRFs. The 2D-CRFs model is then constructed with an induction algorithm. Experimental results show that the method achieves better extraction performance.
Author: Deng Zhen (邓箴)
Source: Value Engineering (《价值工程》), 2010, No. 34, p. 186 (1 page)
Keywords: conditional random fields; Web information extraction; induction algorithm
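
The labeling step described in the abstract can be illustrated with a small sketch. This is not the paper's 2D correlative-chain model or its induction algorithm; it swaps in plain Viterbi decoding over a linear-chain scoring model, which is the simplest relative of a CRF. The label set, the `emission` and `transition` scores, and the use of raw tokens instead of a part-of-speech sequence are all hypothetical and hand-set for illustration.

```python
from typing import List

# Hypothetical label set: items to extract, plus "O" for everything else.
LABELS = ["TITLE", "PRICE", "O"]


def emission(token: str, label: str) -> float:
    """Toy score for assigning `label` to `token` (hand-set, not learned)."""
    if label == "PRICE":
        return 2.0 if token.startswith("$") else -1.0
    if label == "TITLE":
        return 1.0 if token[:1].isupper() else -0.5
    return 0.5  # "O"


def transition(prev: str, cur: str) -> float:
    """Toy score for adjacent labels: a small bonus for staying in one item."""
    return 0.5 if prev == cur else 0.0


def viterbi(tokens: List[str]) -> List[str]:
    """Return the highest-scoring label sequence for `tokens`."""
    if not tokens:
        return []
    # best[i][y] = (score of the best path ending in label y at i, previous label)
    best = [{y: (emission(tokens[0], y), None) for y in LABELS}]
    for tok in tokens[1:]:
        column = {}
        for y in LABELS:
            prev, score = max(
                ((p, best[-1][p][0] + transition(p, y)) for p in LABELS),
                key=lambda item: item[1],
            )
            column[y] = (score + emission(tok, y), prev)
        best.append(column)
    # Backtrack from the best final label to recover the full path.
    label = max(LABELS, key=lambda y: best[-1][y][0])
    path = [label]
    for column in reversed(best[1:]):
        label = column[label][1]
        path.append(label)
    return path[::-1]


if __name__ == "__main__":
    # The same items in two different orders.
    print(viterbi(["Laptop", "Pro", "$999"]))
    print(viterbi(["$999", "Laptop", "Pro"]))
```

Because labels are scored per position rather than by a fixed template, the two inputs above receive matching labels even though the price appears last in one and first in the other. A trained CRF would learn these scores from annotated pages instead of using hand-set values.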

References (3)

  • 1 Wiederhold G. Mediators in the Architecture of Future Information Systems. IEEE Computer, 1992, 25(3): 38-49.
  • 2 Hammer J, Garcia-Molina H, Cho J, et al. Extracting semi-structured information from the Web [C]. In: Proceedings of the Workshop on Management of Semi-structured Data, Tucson, Arizona, 1997.
  • 3 Li Xiaodong, Gu Yuqing. DOM-based Web information extraction [J]. Chinese Journal of Computers, 2002, 25(5): 526-533.

