Abstract
Large volumes of power-domain text data are now being collected. Naming rules for network-element link services are inconsistent, and business personnel phrase the same things differently. As a result, the data contain ambiguities and unclear references. To address these problems, a short-text entity linking algorithm based on a Deep Sequential Matching Network (DSMN) is proposed. It jointly considers the content similarity and the structural similarity between entity mentions and candidate entities. It achieves high-quality disambiguation against multi-source heterogeneous knowledge bases and supports the cleaning and verification of power communication management ledgers and network management data. DSMN first builds multi-granularity vector representations of words. It then sequentially matches the entity mention against each word in the sentence, and sequentially matches each candidate entity against the results of this upper-layer matching. A convolution and pooling layer extracts the important matching information, and a dynamic averaging algorithm computes the similarity between the entity mention and each candidate entity. Experimental results show that DSMN achieves excellent entity linking performance on multiple datasets.
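The matching pipeline summarized in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' DSMN. Word vectors are assumed to be precomputed, so the multi-granularity embedding step is omitted. The "upper-layer matching result" is assumed to be a softmax-attention summary of the sentence for each mention token. The convolution uses one fixed kernel followed by max-pooling. A plain mean stands in for the paper's dynamic averaging algorithm, whose details the abstract does not give. All function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def match_matrix(xs: List[Vector], ys: List[Vector]) -> List[List[float]]:
    """Word-by-word similarity matrix between two token sequences."""
    return [[cosine(x, y) for y in ys] for x in xs]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(r - m) for r in row]
    s = sum(exps)
    return [e / s for e in exps]


def attend(mention: List[Vector], sentence: List[Vector]) -> List[Vector]:
    """Layer 1 (assumed form): match each mention token against every sentence
    word, then summarize the sentence as a softmax-weighted sum of word vectors."""
    dim = len(sentence[0])
    upper = []
    for row in match_matrix(mention, sentence):
        w = softmax(row)
        upper.append([sum(w[j] * sentence[j][d] for j in range(len(sentence)))
                      for d in range(dim)])
    return upper


def conv1d_maxpool(row: List[float], kernel: List[float]) -> float:
    """1-D convolution (valid mode, zero-padded if too short) followed by max-pooling."""
    k = len(kernel)
    if len(row) < k:
        row = row + [0.0] * (k - len(row))
    conv = [sum(kernel[i] * row[s + i] for i in range(k))
            for s in range(len(row) - k + 1)]
    return max(conv)


def similarity(mention: List[Vector], sentence: List[Vector],
               candidate: List[Vector], kernel=(0.5, 0.5)) -> float:
    """Score one candidate entity for a mention in its sentence context."""
    upper = attend(mention, sentence)        # upper-layer matching result
    m2 = match_matrix(candidate, upper)      # candidate vs. upper-layer result
    pooled = [conv1d_maxpool(row, list(kernel)) for row in m2]
    return sum(pooled) / len(pooled)         # plain mean, a stand-in for dynamic averaging
```

In use, each candidate entity from the knowledge base would be scored with `similarity` and the highest-scoring candidate selected as the link. With toy 2-D vectors, a candidate aligned with the mention's context scores higher than an orthogonal one.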
Authors
WANG Yanan (王亚男)
REN Jiaxing (任佳星)
PANG Yuhang (庞宇航)
LIU Qiong (刘琼)
PAN Juan (潘娟)
LIU Wei (刘伟)
GAO Kaiqiang (高凯强)
China Electric Power Research Institute Co., Ltd., Beijing 100192, China
Source
Radio Engineering (《无线电工程》)
Peking University Core Journal
2023, No. 2, pp. 333-339 (7 pages)
Funding
Research on Processing Technology for Power CPS Simulation Input Data (5242002000RX).