Abstract
Traditional models for extracting key information from text suffer from poor extraction quality and long extraction times. To address these problems, a model for extracting key information from unstructured text in a knowledge database is proposed. First, the hidden Markov model is optimized with a six-tuple formulation to obtain the occurrence probability of the model, and incomplete training samples are smoothed. Second, initialization and termination operations are applied to the observation sequences emitted at different times to obtain the optimal state sequence. Third, the observation sequence is decoded and the forward-order and reverse-order decoded sequences are compared; states without decoding ambiguity are filtered out, which completes the ambiguity elimination. Finally, the key information to be extracted is determined from the resulting maximum-probability state sequence, completing the design of the extraction model. Experimental results show that the proposed model extracts key information from unstructured text well and in a short time.
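As a rough illustration of the decoding step the abstract describes (initialization, termination, and recovery of the maximum-probability state sequence), the sketch below implements standard Viterbi decoding for a small hidden Markov model, together with add-one smoothing for sparse counts. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the two-state toy model, and the choice of add-one smoothing are all hypothetical, and the paper's six-tuple formulation and forward/reverse comparison are not reproduced here.

```python
import math


def normalize_with_add_one(counts):
    """Turn raw counts into probabilities using add-one (Laplace) smoothing.

    Assumption: the abstract does not say which smoothing scheme is used;
    add-one is only a common stand-in.
    """
    total = sum(counts) + len(counts)
    return [(c + 1) / total for c in counts]


def viterbi(obs, start_p, trans_p, emit_p):
    """Return (best_path, best_log_prob) for a sequence of observation indices."""
    n_states = len(start_p)
    # Initialization: score every state at the first time step.
    score = [math.log(start_p[s]) + math.log(emit_p[s][obs[0]])
             for s in range(n_states)]
    back = []  # back[t][s] = best predecessor of state s at step t + 1
    # Recursion: extend the best partial path into each state.
    for o in obs[1:]:
        prev = score
        score, ptr = [], []
        for s in range(n_states):
            best = max(range(n_states),
                       key=lambda r: prev[r] + math.log(trans_p[r][s]))
            ptr.append(best)
            score.append(prev[best] + math.log(trans_p[best][s])
                         + math.log(emit_p[s][o]))
        back.append(ptr)
    # Termination: pick the best final state, then backtrack.
    last = max(range(n_states), key=lambda s: score[s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, score[last]


# Hypothetical toy model: state 0 = "other", state 1 = "key information";
# observation 0 = ordinary token, observation 1 = cue token.
start_p = [0.6, 0.4]
trans_p = [[0.7, 0.3], [0.4, 0.6]]
emit_p = [[0.9, 0.1], [0.2, 0.8]]

path, log_prob = viterbi([0, 1, 1, 0], start_p, trans_p, emit_p)
print(path)  # [0, 1, 1, 0]
```

In this toy run the tokens labelled with state 1 are the ones that would be extracted as key information. Working in log space avoids numerical underflow on long sequences, which is why smoothing matters: a zero probability would make the logarithm undefined.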
Authors
郭炜杰 (GUO Wei-jie)
包晓安 (BAO Xiao-an)
Zhejiang Sci-Tech University, Hangzhou, Zhejiang 310018, China
Source
Computer Simulation (《计算机仿真》)
PKU Core Journal (北大核心)
2021, No. 9, pp. 357-360, 394 (5 pages)
Funding
Key Research and Development Program of Zhejiang Province (2020C03094).
Keywords
Knowledge database
Unstructured
Key text information
Information extraction
Hidden Markov model
Maximum-probability state sequence