Abstract
The traditional hidden Markov model (HMM) is sensitive to its initial parameters and does not take historical states into account. To address both problems, the SA-HMM2 algorithm is proposed, which uses the simulated annealing (SA) algorithm to train the parameters of a second-order hidden Markov model (HMM2). In the Web information extraction method based on SA-HMM2, the vision-based page segmentation algorithm (VIPS) divides a Web page into blocks to obtain the state transition sequence, the proposed SA-HMM2 training algorithm is used to obtain the globally optimal HMM2 model parameters, and an improved Viterbi algorithm performs the Web information extraction. Experimental results show that, in terms of the average comprehensive value, the method improves on HMM and GA-HMM by about 21% and 7%, respectively.
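To make the decoding step concrete, the following is a minimal sketch of Viterbi decoding for a second-order HMM, in which each transition depends on the two preceding states. It is a generic textbook-style formulation, not the paper's improved Viterbi algorithm, whose details the abstract does not give. The names `viterbi2`, `path_prob`, `pi`, `a1`, `a2`, and `b` are hypothetical, and the sketch assumes that emissions depend only on the current state.

```python
from itertools import product


def path_prob(path, obs, pi, a1, a2, b):
    """Joint probability of a state path and an observation sequence."""
    p = pi[path[0]] * b[path[0]][obs[0]]
    if len(path) > 1:
        p *= a1[path[0]][path[1]] * b[path[1]][obs[1]]
    for t in range(2, len(path)):
        p *= a2[path[t - 2]][path[t - 1]][path[t]] * b[path[t]][obs[t]]
    return p


def viterbi2(obs, pi, a1, a2, b):
    """Most probable state path under a second-order HMM (illustrative).

    pi[i]       : initial probability of state i
    a1[i][j]    : first-order transition, used only for the second step
    a2[i][j][k] : P(state k | previous two states were i, then j)
    b[j][o]     : probability of emitting symbol o in state j (assumption:
                  emissions depend on the current state only)
    """
    n, T = len(pi), len(obs)
    if T == 1:
        best = max(range(n), key=lambda i: pi[i] * b[i][obs[0]])
        return [best], pi[best] * b[best][obs[0]]

    # delta[(i, j)]: best probability of any path whose last two states are i, j
    delta = {
        (i, j): pi[i] * b[i][obs[0]] * a1[i][j] * b[j][obs[1]]
        for i, j in product(range(n), repeat=2)
    }
    back = []  # one back-pointer table per time step t >= 2
    for t in range(2, T):
        new_delta, ptr = {}, {}
        for j, k in product(range(n), repeat=2):
            # Choose the state i two steps back that maximizes the path score.
            best_i = max(range(n), key=lambda i: delta[(i, j)] * a2[i][j][k])
            new_delta[(j, k)] = delta[(best_i, j)] * a2[best_i][j][k] * b[k][obs[t]]
            ptr[(j, k)] = best_i
        delta = new_delta
        back.append(ptr)

    # Pick the best final state pair, then follow back-pointers to the start.
    (i, j), prob = max(delta.items(), key=lambda kv: kv[1])
    path = [i, j]
    for ptr in reversed(back):
        path.insert(0, ptr[(path[0], path[1])])
    return path, prob
```

Tracking pairs of states rather than single states is what lets the recursion honor the second-order dependency. The cost is roughly N^3 operations per time step for N states, compared with N^2 for a first-order model.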
Source
《计算机工程与设计》
CSCD
PKU Core Journal (北大核心)
2014, No. 4, pp. 1264-1268 (5 pages)
Computer Engineering and Design
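Funding
National Science and Technology Support Program during the 12th Five-Year Plan (2011BAD21B05, 2013BAD15B02)
Fundamental Research Funds for the Central Universities (QN2011036)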