Abstract
In recent years, high-quality development and digital transformation have become increasingly important in the power industry. This creates new demands for research on the digital transformation of power standards, and it brings new challenges and opportunities for how those standards are managed, implemented, and supervised. Because the power sector is an important support for social and economic development, its terminology and proper nouns are highly specialized and complex. When applied to standard documents in the power field, traditional named entity recognition (NER) methods based on rules and feature engineering have clear limitations: recognition accuracy is low, terms are hard to segment, and the methods depend on expert experience. To overcome these problems, this paper proposes an improved BERT-based NER model that incorporates a domain corpus of power terminology, word features, and lexical information. Evaluated on a power-standard corpus with 10 types of power entities, the model reaches an F1 score of 81% and effectively recognizes long term entities in the power field. This improves the efficiency and accuracy of power-standard document processing and supports the information processing and application of power standards. The work can strengthen the automated processing of power-standard documents, raise the level of digitalization in the power industry, and provide technical support for standard formulation, knowledge management, and decision support in the industry.
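The abstract does not specify how the power-term lexicon and word-level information are fused into BERT. One widely used option for character-based Chinese NER is SoftLexicon (Ma et al., 2020), which gives every character the sets of lexicon words in which it appears as the Begin, Middle, End, or Single character; those set embeddings are then pooled and concatenated with the character's BERT representation. The minimal sketch below covers only the lexicon-matching step, assuming the lexicon is a plain Python set of terms. The function name `soft_lexicon_sets`, the `max_len` cutoff, and the sample terms are hypothetical illustrations, not the authors' implementation.

```python
from typing import Dict, List, Set


def soft_lexicon_sets(
    sentence: str, lexicon: Set[str], max_len: int = 10
) -> List[Dict[str, Set[str]]]:
    """SoftLexicon-style matching (assumed technique, not the paper's confirmed method).

    For every character, collect the lexicon words that cover it, grouped by the
    character's position in the word: B(egin), M(iddle), E(nd), or S(ingle).
    """
    features = [{"B": set(), "M": set(), "E": set(), "S": set()} for _ in sentence]
    n = len(sentence)
    for start in range(n):
        # Only consider substrings up to max_len characters (hypothetical cutoff).
        for end in range(start + 1, min(n, start + max_len) + 1):
            word = sentence[start:end]
            if word not in lexicon:
                continue
            if end - start == 1:
                # A single-character lexicon word.
                features[start]["S"].add(word)
                continue
            features[start]["B"].add(word)
            features[end - 1]["E"].add(word)
            for mid in range(start + 1, end - 1):
                features[mid]["M"].add(word)
    return features


# "transformer oil chromatographic analysis"
sentence = "变压器油色谱分析"
# Hypothetical power-term lexicon: transformer, transformer oil,
# oil chromatography, chromatographic analysis.
lexicon = {"变压器", "变压器油", "油色谱", "色谱分析"}

features = soft_lexicon_sets(sentence, lexicon)
for ch, feats in zip(sentence, features):
    print(ch, {tag: sorted(words) for tag, words in feats.items()})
```

For the sample string, the fourth character (油) gets B = {油色谱} and E = {变压器油}, so the downstream model can see that it both ends one candidate term and starts another. The long term 变压器油 also stays visible even though it overlaps the shorter term 变压器, which is the kind of signal that can help with long domain-specific entities.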
Authors
HE Xinyi; DONG Ming; YAN Yong; YAO Ying; HUANG Jianping (State Key Laboratory of Electrical Insulation for Power Equipment, Xi'an Jiaotong University, Xi'an 710049, Shaanxi Province, China; Electric Power Research Institute, State Grid Zhejiang Electric Power Co., Ltd., Hangzhou 310007, Zhejiang Province, China)
Source
Electric Power Information and Communication Technology (《电力信息与通信技术》)
2024, No. 11, pp. 52-59 (8 pages)
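Funding
Supported by the Science and Technology Project of State Grid Corporation of China Headquarters, "Research on the Implementation Path and Key Technologies of Standard Digitization for State Grid Corporation of China" (5700-202241437A-2-0-ZN).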
Keywords
named entity recognition
standard digitization
natural language processing
power standards