Abstract
Bridge inspection reports are one of the most important data sources in bridge engineering in China. They contain a large amount of key business information, such as structural component parameters and descriptions of detected defects, yet information extraction from such text has received little attention. After clarifying the named entity recognition task in this domain, this paper analyzes the characteristics of the entities to be recognized: besides containing many technical terms, they exhibit nested location or route names, character ambiguity, dependence on contextual position, and direction sensitivity. On this basis, the paper proposes a named entity recognition method for bridge inspection text based on Transformer-BiLSTM-CRF. A Transformer encoder first models the long-distance, position-dependent contextual features of the character sequence, a BiLSTM network then captures direction-sensitive features, and a CRF model finally predicts the label sequence. Experimental results show that the proposed method achieves better overall recognition performance than current mainstream named entity recognition models.
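The final step described in the abstract, label sequence prediction in a CRF layer, can be illustrated with a minimal Viterbi decoding sketch. This is not the authors' implementation: the tag set (`O`, `B-COMP`, `I-COMP`), the emission scores, and the transition penalty are hypothetical values chosen for illustration, whereas in the paper's model the per-character scores would come from the Transformer and BiLSTM layers. The sketch only shows why decoding over the whole sequence can differ from picking the best tag for each character independently.

```python
from typing import Dict, List, Tuple

Tag = str


def viterbi_decode(
    emissions: List[Dict[Tag, float]],
    transitions: Dict[Tuple[Tag, Tag], float],
    tags: List[Tag],
) -> List[Tag]:
    """Return the highest-scoring tag path for one character sequence.

    emissions[t][tag] is the score of `tag` at position t;
    transitions[(prev, cur)] is the score of moving from prev to cur
    (missing pairs default to 0.0).
    """
    # best[t][tag]: best score of any path that ends at position t with `tag`
    best = [{tag: emissions[0][tag] for tag in tags}]
    back: List[Dict[Tag, Tag]] = []
    for t in range(1, len(emissions)):
        scores: Dict[Tag, float] = {}
        pointers: Dict[Tag, Tag] = {}
        for cur in tags:
            prev = max(
                tags,
                key=lambda p: best[-1][p] + transitions.get((p, cur), 0.0),
            )
            scores[cur] = (
                best[-1][prev]
                + transitions.get((prev, cur), 0.0)
                + emissions[t][cur]
            )
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda tag: best[-1][tag])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical tag set: O = outside, B-/I-COMP = begin/inside a component entity.
TAGS = ["O", "B-COMP", "I-COMP"]
# Assumed constraint: an entity cannot continue (I-) directly after O.
TRANSITIONS = {("O", "I-COMP"): -1e4}
# Made-up emission scores for a three-character sequence.
EMISSIONS = [
    {"O": 1.0, "B-COMP": 0.8, "I-COMP": 0.1},
    {"O": 0.2, "B-COMP": 0.1, "I-COMP": 1.0},
    {"O": 0.9, "B-COMP": 0.1, "I-COMP": 0.2},
]

greedy = [max(TAGS, key=lambda tag: e[tag]) for e in EMISSIONS]
decoded = viterbi_decode(EMISSIONS, TRANSITIONS, TAGS)
print("greedy :", greedy)
print("viterbi:", decoded)
```

With these made-up scores, the per-character choice yields an `I-COMP` tag directly after `O`, while the Viterbi path avoids that transition because the penalty is scored over the whole sequence.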
Authors
李韧
李童
杨建喜
莫天金
蒋仕新
李东
LI Ren; LI Tong; YANG Jianxi; MO Tianjin; JIANG Shixin; LI Dong (College of Information Science and Engineering, Chongqing Jiaotong University, Chongqing 400074, China)
Source
《中文信息学报》
CSCD
Peking University Core Journal
2021, No. 4, pp. 83-91 (9 pages)
Journal of Chinese Information Processing
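Funding
National Natural Science Foundation of China (51608070)
Science and Technology Research Program of Chongqing Municipal Education Commission (KJQN201800705, KJQN201900726)
Chongqing Jiaotong University National Natural Science Foundation program (2018PY34).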