Abstract
To efficiently identify key information in unstructured process planning text, a named entity recognition model based on a machining domain dictionary and neural networks is established. First, the machining domain dictionary is combined with jieba word segmentation to annotate the dataset automatically; when process parameters are annotated, each number and its identifying letters are treated as a single token, which strengthens subsequent feature extraction. Second, on top of word2vec word embeddings, a bidirectional long short-term memory (BiLSTM) network extracts features from the text. Finally, a conditional random field (CRF) integrates contextual logic to improve the recognition accuracy of key process information. The model is evaluated on a dataset of 431 work-step descriptions. The results show precision, recall, and F1 score of 90.20%, 93.88%, and 92.00%, respectively, an advantage over traditional models in the field. Tests on three datasets from different process plans further confirm the robustness of the model.
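The automatic annotation step can be illustrated with a minimal sketch. It is not the authors' implementation: a forward maximum-matching loop stands in for jieba, and the dictionary entries, the entity type names (OPERATION, FEATURE, PARAM), the parameter pattern, and the sample work step are all assumptions made for illustration. What it shows is how a number and its identifying letters (for example Φ50 or Ra3.2) stay together as one token before tagging.

```python
import re

# Hypothetical mini-dictionary (term -> entity type); the paper's real
# machining dictionary and label set are not given in the abstract.
DOMAIN_DICT = {
    "粗车": "OPERATION",
    "精车": "OPERATION",
    "外圆": "FEATURE",
    "端面": "FEATURE",
}

# Assumed shape of a process parameter: optional identifying letters,
# a number, and optional trailing unit letters (Φ50, Ra3.2, M8, 20mm).
PARAM = re.compile(r"[A-Za-zΦφ]*\d+(?:\.\d+)?[A-Za-z]*")

MAX_TERM_LEN = max(len(term) for term in DOMAIN_DICT)


def segment(text):
    """Split text into tokens, keeping each parameter as one unit."""
    tokens = []
    i = 0
    while i < len(text):
        # Parameters take priority so letters and digits are not split.
        match = PARAM.match(text, i)
        if match:
            tokens.append(match.group())
            i = match.end()
            continue
        # Forward maximum matching against the domain dictionary.
        for size in range(min(MAX_TERM_LEN, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in DOMAIN_DICT:
                tokens.append(piece)
                i += size
                break
        else:
            # Unknown character: emit it as a single-character token.
            tokens.append(text[i])
            i += 1
    return tokens


def annotate(text):
    """Attach an entity type (or "O" for none) to every token."""
    tagged = []
    for token in segment(text):
        if token in DOMAIN_DICT:
            tagged.append((token, DOMAIN_DICT[token]))
        elif PARAM.fullmatch(token):
            tagged.append((token, "PARAM"))
        else:
            tagged.append((token, "O"))
    return tagged


if __name__ == "__main__":
    # Hypothetical work step: "rough-turn the outer circle to Φ50, Ra3.2".
    for token, tag in annotate("粗车外圆至Φ50,Ra3.2"):
        print(token, tag)
```

With real data, the hand-written matching loop would presumably give way to jieba loaded with the domain dictionary, and the tagged tokens would feed the word2vec embedding and BiLSTM-CRF stages described in the abstract.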
Authors
董含笑
李豫虎
乔立红
黄志成
Dong Hanxiao; Li Yuhu; Qiao Lihong; Huang Zhicheng (School of Mechanical Engineering & Automation, Beihang University, Beijing 100191)
Source
《计算机辅助设计与图形学学报》
EI
CSCD
Peking University Core Journal
2024, Issue 2, pp. 313-320 (8 pages)
Journal of Computer-Aided Design & Computer Graphics
Funding
National Key Research and Development Program of China.
Keywords
bidirectional long short-term memory network
conditional random field
named entity recognition
knowledge extraction