Abstract
Open source intelligence (OSINT) is a key component of the modern intelligence system, and processing and analyzing it is an important part of intelligence work. Open source text is the most voluminous and information-rich type of OSINT data, but it is mainly semi-structured or unstructured and therefore complex to process, so a low-cost, automated method for processing text intelligence is critical. This paper proposes an automated processing framework for open source text based on distantly supervised relation extraction. The framework generates training data automatically by aligning entities to a remote database, which reduces the cost of manual labeling. The paper also proposes a relation extraction model that combines a BiGRU network with a dual attention mechanism, which effectively improves extraction performance and reduces the influence of noisy data. Finally, experiments are conducted on an automatically labeled Chinese person-relation dataset. The model's extraction performance improves significantly over the baseline models, which preliminarily verifies its effectiveness and shows that it can support the analysis of open source intelligence.
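The automatic labeling step described in the abstract (aligning entity pairs to a remote database) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy knowledge base `KB`, the helper `distant_label`, the plain substring matching, and the English example sentences are all assumptions made for illustration. The sketch groups sentences into bags per entity pair, which also shows where noisy labels come from.

```python
from collections import defaultdict

# Hypothetical toy knowledge base: (head, tail) -> relation.
# A real system would query an external database instead.
KB = {
    ("Alice", "Bob"): "spouse",
    ("Alice", "Carol"): "parent",
}


def distant_label(sentences, kb, entities):
    """Group sentences into bags keyed by (head, tail, relation).

    Distant supervision assumption: any sentence that mentions both
    entities of a known fact is labeled with that fact's relation.
    """
    bags = defaultdict(list)
    for sent in sentences:
        # Naive entity linking by substring match (an assumption;
        # a real pipeline would use named-entity recognition).
        present = [e for e in entities if e in sent]
        for head in present:
            for tail in present:
                if head == tail:
                    continue
                relation = kb.get((head, tail))
                if relation is not None:
                    bags[(head, tail, relation)].append(sent)
    return dict(bags)


SENTENCES = [
    "Alice married Bob in 2010.",
    "Alice and Bob attended the same conference.",  # noisy: not about marriage
    "Carol is the daughter of Alice.",
    "Bob met Carol yesterday.",  # no matching fact, so it stays unlabeled
]

bags = distant_label(SENTENCES, KB, ["Alice", "Bob", "Carol"])
for key, bag in bags.items():
    print(key, len(bag))
```

The second sentence ends up in the "spouse" bag even though it does not express that relation. This is the kind of noisy instance that an attention mechanism over the sentences in a bag is meant to down-weight.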
Authors
赵国清
何佳洲
乔慧
李永盛
王景石
ZHAO Guo-qing; HE Jia-zhou; QIAO Hui; LI Yong-sheng; WANG Jing-shi (Jiangsu Automation Research Institute, Lianyungang 222061, China)
Source
《指挥控制与仿真》
2021, No. 1, pp. 69-73 (5 pages)
Command Control & Simulation
Keywords
open source intelligence
relation extraction
distant supervision