Abstract
Standardizing locomotive overhaul data is one of the key steps in making that data usable for reliability-centered maintenance (RCM) analysis. However, small sample sizes, non-standardized data formats, analytical complexity, and high labour costs make data standardization by traditional manual methods difficult. Large language models (LLMs), which perform strongly in natural language understanding and in handling complex tasks, have advanced considerably in both academia and industry in recent years. This study first examined how well LLMs perform at information extraction from locomotive overhaul data and reached three findings: the universal information extraction (UIE) LLM is suitable for information extraction in the locomotive overhaul domain; enlarging the locomotive dataset improves UIE's information-extraction performance on overhaul data; and balancing the fault-label categories yields little improvement in that performance. Next, to address the difficulty of data annotation, annotation was automated with purpose-written scripts, and ChatGLM was used to standardize the locomotive overhaul data, achieving BLEU-4, ROUGE-1, ROUGE-2, and ROUGE-L scores of 86.87%, 89.60%, 87.54%, and 94.26%, respectively, which meet the requirements of engineering applications. Finally, an auxiliary data-standardization pre-processing tool that encapsulates the LLM was developed to simplify the standardization workflow.
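The abstract reports ROUGE scores for the standardized output but does not state how they were computed. As a rough illustration only, the following is a minimal sketch of ROUGE-1/ROUGE-2 (clipped n-gram overlap) and ROUGE-L (longest common subsequence), each reported as an F1 score. It assumes character-level tokenization, which is common for Chinese text; the helper names (`char_tokens`, `rouge_n`, `rouge_l`) and the sample records are hypothetical, and the authors' actual scoring toolkit and settings may differ.

```python
from collections import Counter


def char_tokens(text: str) -> list[str]:
    """Split text into characters, dropping whitespace (assumed tokenization)."""
    return [ch for ch in text if not ch.isspace()]


def ngrams(tokens: list[str], n: int) -> Counter:
    """Count the n-grams of a token list as tuples."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap: float, cand_total: int, ref_total: int) -> float:
    """Harmonic mean of precision (overlap/candidate) and recall (overlap/reference)."""
    if overlap == 0 or cand_total == 0 or ref_total == 0:
        return 0.0
    precision = overlap / cand_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(candidate: str, reference: str, n: int) -> float:
    """ROUGE-N F1: n-gram overlap, clipped to the smaller count on each side."""
    cand = ngrams(char_tokens(candidate), n)
    ref = ngrams(char_tokens(reference), n)
    overlap = sum((cand & ref).values())  # Counter '&' keeps the minimum counts
    return f1(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate: str, reference: str) -> float:
    """ROUGE-L F1 based on the LCS of the candidate and reference tokens."""
    cand, ref = char_tokens(candidate), char_tokens(reference)
    return f1(lcs_length(cand, ref), len(cand), len(ref))


if __name__ == "__main__":
    # Hypothetical record pair: model output vs. reference standardized text.
    candidate = "牵引电机轴承损坏"
    reference = "牵引电机轴承故障"
    print(round(rouge_n(candidate, reference, 1), 4))  # 0.75
    print(round(rouge_n(candidate, reference, 2), 4))  # 0.7143
    print(round(rouge_l(candidate, reference), 4))     # 0.75
```

In the sample pair, 6 of 8 characters match in order, so ROUGE-1 and ROUGE-L are both 0.75, while only 5 of 7 bigrams match, giving a lower ROUGE-2. Corpus-level scores such as those in the abstract would typically average these per-record values; that aggregation step is likewise an assumption here.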
Authors
CHEN Ao; LI Chen; YAN Jiayun; PENG Liantie; TIAN Ye; LIU Leixinyuan (Zhuzhou CRRC Time Electric Co., Ltd., Zhuzhou, Hunan 412001, China)
Source
Control and Information Technology, 2024, No. 3, pp. 72-79 (8 pages)
Funding
Hunan Provincial Key Research and Development Program for Science and Technology Innovation (2023GK2095).