
Research on Web Information Extraction Based on Improved Genetic Annealing and HMM (cited by: 3)
Abstract: To further improve the accuracy of Web information extraction, and to address the shortcomings of the hidden Markov model (HMM) and its hybrid methods in parameter optimisation, we present a Web extraction algorithm based on improved genetic annealing and HMM. First, the algorithm builds an HMM with a backward dependency assumption. Second, it optimises the HMM parameters with an improved genetic annealing algorithm. After the genetic operators and the simulated annealing (SA) parameters have been improved, the subpopulations are classified according to the adaptive crossover and mutation probabilities of the genetic algorithm (GA), which enables parallel search and information exchange among multiple populations, helping to avoid premature convergence and to accelerate convergence. SA is also used as a GA operator to strengthen local search. Finally, the bi-order Viterbi algorithm is used for decoding. Compared with existing HMM optimisation methods, the overall Fβ=1 value in the experiments increases by 6% on average, which shows that the improved algorithm effectively raises extraction accuracy and search performance.
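The optimisation step described in the abstract (a GA whose crossover and mutation rates adapt to individual fitness, with simulated annealing applied as an additional GA operator) can be illustrated with a minimal Python sketch. This is not the authors' algorithm. It assumes a real-valued one-dimensional toy objective in place of the HMM parameters, uses Srinivas-Patnaik-style adaptive probabilities in place of the paper's improved operators, and omits the multi-population classification and information exchange. The function names (`adaptive_prob`, `sa_operator`, `genetic_annealing`) and all parameter values are hypothetical.

```python
import math
import random


def adaptive_prob(f, f_max, f_avg, k):
    """Assumed Srinivas-Patnaik-style adaptive rate (not the paper's exact rule).

    Below-average individuals get the full rate k; above-average ones get a
    rate that shrinks linearly toward 0 as f approaches the population maximum.
    """
    if f < f_avg or f_max == f_avg:
        return k
    return max(0.0, k * (f_max - f) / (f_max - f_avg))


def sa_operator(x, fitness, temp, step, lo, hi, rng, n_steps=10):
    """Simulated annealing used as a GA operator: Metropolis local search around x."""
    cur, cur_f = x, fitness(x)
    for _ in range(n_steps):
        cand = min(hi, max(lo, cur + rng.gauss(0.0, step)))
        cand_f = fitness(cand)
        delta = cand_f - cur_f  # > 0 means improvement (we maximise)
        # Always accept improvements; accept worse moves with prob exp(delta / temp).
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            cur, cur_f = cand, cand_f
    return cur


def genetic_annealing(fitness, lo, hi, pop_size=30, generations=80,
                      t0=1.0, cooling=0.95, seed=0):
    """Maximise a 1-D fitness on [lo, hi] with a GA that includes an SA operator."""
    rng = random.Random(seed)
    pop = [rng.uniform(lo, hi) for _ in range(pop_size)]
    best = max(pop, key=fitness)
    temp = t0
    for _ in range(generations):
        fits = [fitness(x) for x in pop]
        f_max, f_avg = max(fits), sum(fits) / pop_size

        def select():
            # Binary tournament selection on the current population.
            a, b = rng.sample(range(pop_size), 2)
            return pop[a] if fits[a] >= fits[b] else pop[b]

        children = []
        while len(children) < pop_size:
            p1, p2 = select(), select()
            # Adaptive crossover rate based on the fitter parent.
            pc = adaptive_prob(max(fitness(p1), fitness(p2)), f_max, f_avg, k=1.0)
            if rng.random() < pc:
                a = rng.random()  # arithmetic crossover weight
                c1, c2 = a * p1 + (1 - a) * p2, a * p2 + (1 - a) * p1
            else:
                c1, c2 = p1, p2
            for c in (c1, c2):
                # Adaptive mutation rate based on the child's own fitness.
                pm = adaptive_prob(fitness(c), f_max, f_avg, k=0.5)
                if rng.random() < pm:
                    c = min(hi, max(lo, c + rng.gauss(0.0, 0.1 * (hi - lo))))
                children.append(c)
        pop = children[:pop_size]

        # SA as a GA operator: anneal the best child to strengthen local search.
        i_best = max(range(pop_size), key=lambda j: fitness(pop[j]))
        pop[i_best] = sa_operator(pop[i_best], fitness, temp,
                                  0.05 * (hi - lo), lo, hi, rng)

        # Elitism: the best solution found so far replaces the worst child.
        i_worst = min(range(pop_size), key=lambda j: fitness(pop[j]))
        pop[i_worst] = best
        best = max(pop, key=fitness)
        temp *= cooling  # geometric cooling schedule
    return best
```

For example, `genetic_annealing(lambda x: -(x - 3.0) ** 2, -10.0, 10.0, seed=1)` should return a value close to 3. In the paper's setting the fitness would instead score a candidate set of HMM parameters (assumed here to be something like the likelihood of labelled training pages), which requires encoding the transition and emission matrices as the chromosome.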
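Source: Computer Applications and Software, 2014, No. 4, pp. 40-44 (5 pages); indexed in CSCD and the Peking University Core Journals list
Funding: National Natural Science Foundation of China (601003247); Shanxi Province University Science and Technology Development Project (20101120 2013147); Xinzhou Teachers University Key Discipline Construction Project (ZDXK201204)
Keywords: Information extraction; Genetic annealing; Hidden Markov model; Viterbi algorithm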

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部