Hybrid genetic algorithm and hidden Markov model for Web information extraction

Cited by: 4


Abstract: The traditional hidden Markov model (HMM) used for Web information extraction is highly sensitive to its initial parameter values, and in practice training easily converges to locally optimal model parameters. This paper proposes a hybrid algorithm for Web information extraction that uses a genetic algorithm (GA) to optimize the HMM parameters. The algorithm encodes each chromosome as a real-valued matrix, takes the likelihood value as the fitness, and combines the GA with the Baum-Welch algorithm to optimize the HMM parameters globally; the Baum-Welch algorithm parameters in the GA-HMM are then adjusted to perform Web information extraction. Experimental results show that the new algorithm outperforms the traditional HMM in precision and recall.
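The encoding and fitness described in the abstract can be sketched as follows. This is a minimal illustration in Python under stated assumptions, not the authors' implementation: each chromosome holds the HMM parameters (initial distribution, transition matrix, emission matrix) as real-valued rows, and the fitness is the log-likelihood of the training sequences computed with the scaled forward algorithm. The abstract does not specify the genetic operators, so the row-wise crossover, Gaussian mutation, and elitist selection below are assumptions, as are all function names and default parameters. The Baum-Welch refinement step that the paper combines with the GA is omitted here.

```python
import math
import random

def normalize(row):
    """Rescale a row of positive numbers so that it sums to 1."""
    total = sum(row)
    return [x / total for x in row]

def random_chromosome(n_states, n_symbols, rng):
    """Chromosome = (pi, A, B), each stored as real-valued stochastic rows."""
    def rand_row(k):
        return normalize([rng.random() + 1e-3 for _ in range(k)])
    pi = rand_row(n_states)
    A = [rand_row(n_states) for _ in range(n_states)]
    B = [rand_row(n_symbols) for _ in range(n_states)]
    return pi, A, B

def log_likelihood(chrom, obs):
    """log P(obs | model) via the scaled forward algorithm."""
    pi, A, B = chrom
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    scale = sum(alpha)
    alpha = [a / scale for a in alpha]
    ll = math.log(scale)
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
        scale = sum(alpha)
        alpha = [a / scale for a in alpha]
        ll += math.log(scale)
    return ll

def fitness(chrom, sequences):
    """Fitness of a chromosome: total log-likelihood of all sequences."""
    return sum(log_likelihood(chrom, s) for s in sequences)

def crossover(p1, p2, rng):
    """Assumed operator: copy each row whole from one parent,
    so every row of the child remains a probability distribution."""
    def pick(r1, r2):
        return list(r1 if rng.random() < 0.5 else r2)
    pi = pick(p1[0], p2[0])
    A = [pick(r1, r2) for r1, r2 in zip(p1[1], p2[1])]
    B = [pick(r1, r2) for r1, r2 in zip(p1[2], p2[2])]
    return pi, A, B

def mutate(chrom, rng, sigma=0.05):
    """Assumed operator: Gaussian noise on every entry,
    clamped to stay positive, then each row renormalized."""
    def jitter(row):
        return normalize([max(x + rng.gauss(0.0, sigma), 1e-6) for x in row])
    pi, A, B = chrom
    return jitter(pi), [jitter(r) for r in A], [jitter(r) for r in B]

def evolve(sequences, n_states, n_symbols,
           pop_size=20, generations=30, seed=0):
    """Elitist GA: keep the fitter half unchanged, refill with mutated children."""
    rng = random.Random(seed)
    pop = [random_chromosome(n_states, n_symbols, rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda c: fitness(c, sequences), reverse=True)
        elite = pop[: pop_size // 2]
        children = [
            mutate(crossover(rng.choice(elite), rng.choice(elite), rng), rng)
            for _ in range(pop_size - len(elite))
        ]
        pop = elite + children
    return max(pop, key=lambda c: fitness(c, sequences))
```

Calling `evolve(sequences, n_states, n_symbols)` on integer-coded observation sequences returns the fittest `(pi, A, B)` triple found. Because the fitter half of each generation is carried over unchanged, the best fitness never decreases between generations. In a hybrid of the kind the abstract describes, the returned parameters would then serve as the starting point for Baum-Welch re-estimation.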

Authors: Xiao Jiyi (肖基毅), Zou Lamei (邹腊梅), Li Chuanqi (李传琦)

Affiliation: School of Computer Science and Technology, University of South China (南华大学)

Source: Computer Engineering and Applications (《计算机工程与应用》), indexed in CSCD and the Peking University Core Journals list, 2008, No. 18, pp. 132-135 (4 pages)

Funding: Natural Science Foundation of Hunan Province of China (Grant No. 04JJ40051); Research Project of the Department of Education of Hunan Province, China (Grant No. 06c724)

Keywords: genetic algorithm; hidden Markov model; Web information extraction; Baum-Welch algorithm; maximum likelihood algorithm

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]


