Abstract
With the development of information technology, database technology, and network technology, every field now stores large volumes of text data, and extracting valuable information and knowledge from that text has become a pressing problem. This paper proposes a text mining method based on maximum likelihood and the hidden Markov model (HMM): the HMM is constructed by maximum likelihood estimation and then used to extract specific information from paper entries, and the "zero probability" problem encountered during the experiments is overcome. Experimental results show an average precision of 0.9 and an average recall of 0.85, demonstrating in both theory and practice that the method is effective.
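The abstract does not spell out how the model is estimated or how the zero-probability problem is handled, so the following is only a minimal sketch of one common approach: supervised maximum likelihood (relative-frequency) estimation of the HMM parameters from labeled token sequences, with additive smoothing so that unseen transitions and emissions never receive probability zero, followed by Viterbi decoding. The function names `train_hmm` and `viterbi`, the smoothing constant `alpha`, the `<UNK>` token, and the toy field labels are all assumptions made for illustration and are not taken from the paper.

```python
import math
from collections import Counter


def train_hmm(tagged_sequences, alpha=1.0):
    """Estimate HMM parameters from non-empty lists of (word, state) pairs.

    Counts are turned into relative frequencies (the maximum likelihood
    estimate) with additive smoothing `alpha`, an assumed remedy for
    zero probabilities.
    """
    states = sorted({s for seq in tagged_sequences for _, s in seq})
    vocab = sorted({w for seq in tagged_sequences for w, _ in seq}) + ["<UNK>"]
    start, trans, emit = Counter(), Counter(), Counter()
    from_count, state_count = Counter(), Counter()
    for seq in tagged_sequences:
        start[seq[0][1]] += 1
        for i, (word, state) in enumerate(seq):
            emit[(state, word)] += 1
            state_count[state] += 1
            if i + 1 < len(seq):
                trans[(state, seq[i + 1][1])] += 1
                from_count[state] += 1
    n, v = len(states), len(vocab)
    total = len(tagged_sequences)
    pi = {s: (start[s] + alpha) / (total + alpha * n) for s in states}
    A = {(s, t): (trans[(s, t)] + alpha) / (from_count[s] + alpha * n)
         for s in states for t in states}
    B = {(s, w): (emit[(s, w)] + alpha) / (state_count[s] + alpha * v)
         for s in states for w in vocab}
    return states, set(vocab), pi, A, B


def viterbi(words, states, vocab, pi, A, B):
    """Return the most likely state sequence for `words` (log-space Viterbi)."""
    # Words never seen in training are mapped to the smoothed <UNK> token.
    obs = [w if w in vocab else "<UNK>" for w in words]
    score = {s: math.log(pi[s]) + math.log(B[(s, obs[0])]) for s in states}
    back = []
    for w in obs[1:]:
        prev, score, ptr = score, {}, {}
        for t in states:
            best = max(states, key=lambda s: prev[s] + math.log(A[(s, t)]))
            score[t] = prev[best] + math.log(A[(best, t)]) + math.log(B[(t, w)])
            ptr[t] = best
        back.append(ptr)
    path = [max(states, key=lambda s: score[s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


# Toy labeled entries (invented for illustration only).
TOY_DATA = [
    [("smith", "author"), ("hidden", "title"), ("markov", "title"),
     ("models", "title"), ("2007", "year")],
    [("jones", "author"), ("text", "title"), ("mining", "title"),
     ("2005", "year")],
]
```

Because every count is incremented by `alpha`, all logarithms in `viterbi` are finite even for transitions or words absent from the training data; with `alpha = 0` the same code would reduce to the unsmoothed maximum likelihood estimate and could fail on unseen events.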
Source
Computer Technology and Development (《计算机技术与发展》)
2007, Issue 12, pp. 110-112 (3 pages)
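Funding
Supported by the Natural Science Foundation of Hunan Province (04JJ40051)
Supported by the Hunan Provincial Department of Education (06C724)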
Keywords
hidden Markov model
maximum likelihood
text mining
information extraction