Abstract
Local chronicles record the rich and long history, culture, and thought of a region. Taking the Xiong'an County chronicles as an example, this paper automatically identifies and extracts seven categories of entities from the crop and local-product names and related information recorded in them, laying a foundation for the later construction of knowledge bases and intelligent question-answering systems for ancient local chronicles. After preprocessing the data from the local products section of the Xiong'an County chronicles, entity recognition experiments are carried out on this corpus using conditional random fields (CRF), Bi-RNN, and Bi-LSTM-CRF, and the final results are compared and analyzed. The Bi-LSTM-CRF model trained on the full corpus reached an accuracy of 82.27% and a recall of 88.12%, showing that a model combining machine learning and deep learning performs better on the entity recognition task than a single learning model, and offering a reference for the intelligent processing and deep mining of large-scale ancient texts.
Author
REN Changqing (任常青), School of Management, Hebei University, Baoding, Hebei 071002, China
Source
Information & Computer (《信息与电脑》)
2022, Issue 1, pp. 74-76 (3 pages)
Keywords
local chronicles
machine learning
deep learning
entity recognition