
Simulation Study on Solving Information Extraction Problems with Hidden Markov Models (Times cited: 5)

Application Research of Hidden Markov Model on Web Information Extraction
Abstract: Web information extraction has become an important means of processing massive amounts of online information while keeping Web document services accurate and fast, yet extracting information effectively from large volumes of heterogeneous pages is very difficult. To improve the recall and precision of extraction from massive heterogeneous Web pages, this paper proposes a new information extraction algorithm based on the Hidden Markov Model (HMM). The algorithm exploits the HMM's strength in handling rule knowledge: it builds an HTML tree for each page, uses Shannon entropy to locate the data fields, and then constructs the HMM by Maximum Likelihood estimation to extract the Web information. Simulation results on extracting the header structure information of a large number of academic papers show that the algorithm clearly improves both recall and precision.
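The abstract says each page is turned into an HTML tree and Shannon entropy, H = -Σ p_i log2 p_i, is used to locate the data fields, but it does not say which distribution the entropy is taken over. The sketch below illustrates one assumption of ours: every node is scored by the entropy of its children's tag names, so a node whose children all repeat one tag scores 0 bits. The names `TreeBuilder`, `parse`, `child_tag_entropy` and `walk` are hypothetical, and whether the paper keeps low- or high-entropy nodes as data fields is not stated in the abstract.

```python
import math
from collections import Counter
from html.parser import HTMLParser

# Elements that never have children or an end tag in HTML.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class Node:
    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        self.text = []


class TreeBuilder(HTMLParser):
    """Builds a minimal HTML tree that keeps only tag names, nesting and text."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.cur = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.cur)
        self.cur.children.append(node)
        if tag not in VOID_TAGS:
            self.cur = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this name; ignore stray end tags.
        node = self.cur
        while node is not None and node.tag != tag:
            node = node.parent
        if node is not None and node.parent is not None:
            self.cur = node.parent

    def handle_data(self, data):
        if data.strip():
            self.cur.text.append(data.strip())


def parse(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def child_tag_entropy(node):
    """Shannon entropy, in bits, of the tag-name distribution of a node's children.

    Assumption: entropy over child tag names; the abstract does not give the
    exact feature the paper uses.
    """
    counts = Counter(child.tag for child in node.children)
    total = sum(counts.values())
    return sum((k / total) * math.log2(total / k) for k in counts.values())


def walk(node):
    yield node
    for child in node.children:
        yield from walk(child)


if __name__ == "__main__":
    page = ("<html><body><div><p>Title</p><p>Author</p><p>Abstract</p></div>"
            "<div><h1>x</h1><p>y</p><a>z</a></div></body></html>")
    for node in walk(parse(page)):
        if node.children:
            print(node.tag, round(child_tag_entropy(node), 3))
```

On the sample page, the `div` whose children are all `<p>` scores 0 bits, while the `div` with mixed `<h1>`, `<p>` and `<a>` children scores log2 3 ≈ 1.585 bits.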
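The abstract states that the HMM is constructed by Maximum Likelihood. If the training headers are fully labeled with their fields (an assumption; the abstract does not describe the training data), the ML estimates reduce to normalized counts of start states, transitions and emissions, which is what this sketch computes. With unlabeled data the likelihood would instead be maximized iteratively with Baum-Welch (EM). The function names and the example fields and tokens are hypothetical.

```python
from collections import Counter, defaultdict


def normalize(counts):
    """Turn a Counter into a probability distribution (relative frequencies)."""
    total = sum(counts.values())
    return {key: value / total for key, value in counts.items()} if total else {}


def estimate_hmm(sequences):
    """Maximum-likelihood HMM parameters from fully labeled training sequences.

    Each sequence is a list of (token, state) pairs. When every state is observed,
    the likelihood is maximized by normalized counts:
      pi[s]   = #non-empty sequences starting in s / #non-empty sequences
      A[r][s] = #transitions r -> s / #transitions leaving r
      B[s][o] = #times s emits o / #tokens labeled s
    """
    start = Counter()
    trans = defaultdict(Counter)
    emit = defaultdict(Counter)
    for seq in sequences:
        if not seq:
            continue
        start[seq[0][1]] += 1
        for (_, prev_state), (_, next_state) in zip(seq, seq[1:]):
            trans[prev_state][next_state] += 1
        for token, state in seq:
            emit[state][token] += 1
    pi = normalize(start)
    A = {state: normalize(c) for state, c in trans.items()}
    B = {state: normalize(c) for state, c in emit.items()}
    return pi, A, B


if __name__ == "__main__":
    # Hypothetical labeled paper headers (token, field), for illustration only.
    train = [
        [("hmm", "title"), ("extraction", "title"), ("zhang", "author"), ("xian", "affiliation")],
        [("web", "title"), ("zhang", "author"), ("li", "author"), ("xian", "affiliation")],
    ]
    pi, A, B = estimate_hmm(train)
    print(pi)
    print(A["title"])
    print(B["author"])
```

Pure ML estimates give zero probability to unseen tokens and transitions, so practical extractors usually add smoothing; the sketch omits it to remain a plain ML estimate.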
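The abstract does not name the decoding step that turns the trained HMM into extracted fields. A common choice for HMM-based extraction is Viterbi decoding, which finds the most probable state (field) sequence for a token sequence; this sketch works in log space to avoid numeric underflow. Treat it as an illustration rather than the paper's procedure: the function name `viterbi`, the unseen-token floor `unk` and the example parameters are hypothetical.

```python
import math


def _log(p):
    return math.log(p) if p > 0 else float("-inf")


def viterbi(tokens, states, pi, A, B, unk=1e-6):
    """Most likely state sequence for `tokens` under an HMM, computed in log space.

    pi[s], A[r][s], B[s][o] are probabilities stored in (nested) dicts; missing
    entries count as 0, except unseen emissions, which get the floor `unk`
    (a hypothetical choice for this sketch).
    """
    if not tokens:
        return []

    def emit(state, token):
        return _log(B.get(state, {}).get(token, unk))

    def trans(r, s):
        return _log(A.get(r, {}).get(s, 0.0))

    # delta[s]: log-probability of the best path that ends in state s
    delta = {s: _log(pi.get(s, 0.0)) + emit(s, tokens[0]) for s in states}
    backpointers = []
    for token in tokens[1:]:
        prev, delta, pointer = delta, {}, {}
        for s in states:
            best = max(states, key=lambda r: prev[r] + trans(r, s))
            delta[s] = prev[best] + trans(best, s) + emit(s, token)
            pointer[s] = best
        backpointers.append(pointer)

    # Trace back from the highest-scoring final state.
    path = [max(states, key=lambda s: delta[s])]
    for pointer in reversed(backpointers):
        path.append(pointer[path[-1]])
    return path[::-1]


if __name__ == "__main__":
    # Hypothetical parameters for three header fields.
    states = ["title", "author", "affiliation"]
    pi = {"title": 1.0}
    A = {"title": {"title": 1 / 3, "author": 2 / 3},
         "author": {"author": 1 / 3, "affiliation": 2 / 3}}
    B = {"title": {"hmm": 1 / 3, "extraction": 1 / 3, "web": 1 / 3},
         "author": {"zhang": 2 / 3, "li": 1 / 3},
         "affiliation": {"xian": 1.0}}
    print(viterbi(["web", "li", "xian"], states, pi, A, B))
```

Consecutive tokens decoded into the same state form one extracted field value.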
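Recall and precision, the two measures the abstract reports, can be computed per field as in the sketch below. The abstract does not say whether the paper counts at the token level or the whole-field level; this sketch assumes token-level counting, returns 0.0 when a ratio is undefined, and uses a hypothetical function name.

```python
def precision_recall(predicted, gold, field):
    """Token-level precision and recall for one field label (assumed token-level).

    precision = tokens correctly labeled `field` / tokens predicted as `field`
    recall    = tokens correctly labeled `field` / tokens whose true label is `field`
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold label sequences must have equal length")
    true_pos = sum(1 for p, g in zip(predicted, gold) if p == g == field)
    n_pred = predicted.count(field)
    n_gold = gold.count(field)
    precision = true_pos / n_pred if n_pred else 0.0
    recall = true_pos / n_gold if n_gold else 0.0
    return precision, recall


if __name__ == "__main__":
    predicted = ["title", "title", "author", "author"]
    gold = ["title", "author", "author", "affiliation"]
    print(precision_recall(predicted, gold, "title"))
    print(precision_recall(predicted, gold, "author"))
```

For the `title` field here, one of two predicted tokens is correct (precision 0.5) and the single true title token is found (recall 1.0).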
Source: Computer Simulation (《计算机仿真》), 2010, No. 5, pp. 132-135 (4 pages). Indexed in CSCD; Peking University core journal.
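Funding: Natural Science Foundation of Shaanxi Province (2007F25); Research Fund of Xi'an University of Finance and Economics (07XCK04); Special Research Program of the Shaanxi Provincial Department of Education (09JK440).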
Keywords: Hidden Markov model; Web information extraction; Maximum likelihood; Machine learning