Research on Heterogeneous Web Data Extraction Algorithm Based on Hidden Conditional Random Fields
Abstract: This paper proposes a data extraction algorithm for heterogeneous Web data sources based on an improved hidden conditional random fields (HCRF) model. The improvement allows the hidden variables to be computed more accurately and overcomes the model's strong dependence on the choice of initial parameters. Training the model also does not require a large amount of manually labeled sample data. Experimental results show that, compared with existing methods, the proposed algorithm achieves satisfactory recall, precision, and F1 when extracting data from websites with missing (default) attributes and from websites with multi-attribute features.
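The abstract reports recall, precision, and F1 but does not say how they were computed. The following is a minimal sketch of one common way to compute them for attribute extraction, not the paper's evaluation code. It assumes results are compared as `(record_id, attribute, value)` triples. It also assumes that a "missing attribute" means a record lacks a field and that a "multi-attribute" record carries several values for one field. The function name `precision_recall_f1` and the toy data are hypothetical.

```python
from collections import Counter


def precision_recall_f1(extracted, gold):
    """Micro-averaged precision, recall and F1 over (record_id, attribute, value) triples."""
    extracted_counts = Counter(extracted)
    gold_counts = Counter(gold)
    # Multiset intersection: a triple counts as correct at most as often as it occurs in gold.
    true_positives = sum((extracted_counts & gold_counts).values())
    n_extracted = sum(extracted_counts.values())
    n_gold = sum(gold_counts.values())
    precision = true_positives / n_extracted if n_extracted else 0.0
    recall = true_positives / n_gold if n_gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical toy data (not from the paper):
# record 1 has two "author" values, record 2 has no "author" at all.
gold = [
    (1, "title", "Book A"), (1, "author", "Li"), (1, "author", "Wang"),
    (2, "title", "Book B"),
]
extracted = [
    (1, "title", "Book A"), (1, "author", "Li"),   # one author value missed
    (2, "title", "Book B"), (2, "author", "Zhang"),  # spurious value for a missing attribute
]

p, r, f1 = precision_recall_f1(extracted, gold)
print(f"precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
# prints: precision=0.75 recall=0.75 F1=0.75
```

In this toy case the extractor misses one value of a multi-valued attribute, which lowers recall, and invents a value for an attribute the record does not have, which lowers precision. These are the two situations the abstract singles out.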
Author: 於实
Source: Bulletin of Science and Technology (《科技通报》), PKU Core Journal, 2012, No. 8, pp. 168-170 (3 pages)
Keywords: conditional random fields; hidden conditional random fields; Web data extraction; discriminative model