Abstract
The traditional hidden Markov model (HMM) training method for Web information extraction is sensitive to the initial parameter values and, in practice, readily converges to locally optimal model parameters. This paper proposes a hybrid algorithm for Web information extraction that uses a genetic algorithm (GA) to optimize the HMM parameters. The algorithm encodes each chromosome as a real-valued matrix, uses the likelihood value as the fitness, and combines the GA with the Baum-Welch algorithm to optimize the HMM parameters globally; the Baum-Welch parameters of the resulting GA-HMM are then adjusted to perform Web information extraction. Experimental results show that the new algorithm outperforms the traditional HMM in precision and recall.
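The abstract gives no implementation details, so the following is only a minimal sketch of how a GA could be combined with Baum-Welch for a discrete HMM. Everything in it is an assumption made for illustration rather than the authors' method: the function names, the row-wise crossover, the additive mutation, the binary tournament selection, the elitism, the smoothing constant `eps`, and all default hyperparameters. The log-likelihood is used as the fitness, which ranks chromosomes in the same order as the likelihood itself.

```python
import numpy as np

def normalize(m):
    """Rescale along the last axis so each row sums to 1."""
    return m / m.sum(axis=-1, keepdims=True)

def random_hmm(n_states, n_symbols, rng):
    """One chromosome: real-valued (pi, A, B) with stochastic rows."""
    return (normalize(rng.random(n_states)),
            normalize(rng.random((n_states, n_states))),
            normalize(rng.random((n_states, n_symbols))))

def forward_backward(hmm, obs):
    """Scaled forward-backward pass; returns alpha, beta and the scales c."""
    pi, A, B = hmm
    T, N = len(obs), len(pi)
    alpha, beta, c = np.zeros((T, N)), np.ones((T, N)), np.zeros(T)
    alpha[0] = pi * B[:, obs[0]]
    c[0] = alpha[0].sum()
    alpha[0] /= c[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        c[t] = alpha[t].sum()
        alpha[t] /= c[t]
    for t in range(T - 2, -1, -1):
        beta[t] = (A @ (B[:, obs[t + 1]] * beta[t + 1])) / c[t + 1]
    return alpha, beta, c

def log_likelihood(hmm, obs):
    """Fitness: log P(obs | hmm), the sum of the log scaling factors."""
    return np.log(forward_backward(hmm, obs)[2]).sum()

def baum_welch_step(hmm, obs, eps=1e-6):
    """One Baum-Welch re-estimation; eps smoothing (assumed) avoids zeros."""
    pi, A, B = hmm
    alpha, beta, c = forward_backward(hmm, obs)
    gamma = alpha * beta                       # state posteriors per time step
    emit_next = (B[:, obs[1:]].T * beta[1:] / c[1:, None])[:, None, :]
    xi = alpha[:-1, :, None] * A[None] * emit_next   # transition posteriors
    new_B = np.stack([gamma[obs == k].sum(axis=0)
                      for k in range(B.shape[1])], axis=1)
    return (normalize(gamma[0] + eps), normalize(xi.sum(axis=0) + eps),
            normalize(new_B + eps))

def crossover(p1, p2, rng):
    """Each row is copied whole from one parent, so rows stay normalized."""
    child = []
    for m1, m2 in zip(p1, p2):
        r1, r2 = np.atleast_2d(m1), np.atleast_2d(m2)
        mask = rng.random(r1.shape[0]) < 0.5
        child.append(np.where(mask[:, None], r1, r2).reshape(m1.shape))
    return tuple(child)

def mutate(hmm, rng, rate=0.1, scale=0.1):
    """Add small positive noise to a few entries, then renormalize."""
    return tuple(
        normalize(m + scale * rng.random(m.shape) * (rng.random(m.shape) < rate))
        for m in hmm)

def ga_hmm(obs, n_states, n_symbols, pop_size=20, generations=15,
           bw_steps=2, seed=0):
    """Hybrid search: GA operators explore, Baum-Welch refines each child."""
    obs = np.asarray(obs)
    rng = np.random.default_rng(seed)
    pop = [random_hmm(n_states, n_symbols, rng) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        fit = np.array([log_likelihood(h, obs) for h in pop])
        best = int(np.argmax(fit))
        history.append(fit[best])
        children = [pop[best]]                 # elitism: keep the best as is
        while len(children) < pop_size:
            # binary tournament selection of two parents
            a, b = (pop[max(rng.choice(pop_size, 2), key=lambda i: fit[i])]
                    for _ in range(2))
            child = mutate(crossover(a, b, rng), rng)
            for _ in range(bw_steps):
                child = baum_welch_step(child, obs)
            children.append(child)
        pop = children
    fit = np.array([log_likelihood(h, obs) for h in pop])
    best = int(np.argmax(fit))
    return pop[best], fit[best], history
```

Under these assumptions, a call such as `ga_hmm(obs, n_states=2, n_symbols=3)` on an integer-coded observation sequence returns the best `(pi, A, B)` found, its log-likelihood, and the best fitness per generation. Because the elite chromosome is carried over unchanged, that per-generation best never decreases. How page tokens would be mapped to observation symbols for Web information extraction is not stated in the abstract and is left out of the sketch.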
Source
Computer Engineering and Applications (《计算机工程与应用》)
CSCD
Peking University Core Journal (北大核心)
2008, Issue 18, pp. 132-135 (4 pages)
Funding
Natural Science Foundation of Hunan Province, China (Grant No. 04JJ40051)
Research Project of the Department of Education of Hunan Province, China (Grant No. 06c724)