Abstract
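To classify accident hidden-danger inspection records automatically in the safety management of smart construction site projects, an improved Bert model for classifying construction accident hidden dangers is proposed. First, multi-category term weighting is combined with word embedding. Second, the cross-entropy loss function is replaced by a focal loss function whose category weights α_t are optimized by a genetic algorithm. Third, three improved classification algorithms are built on the Bert model to classify the hidden-danger corpus effectively. Finally, three groups of algorithms are compared and verified on the corpus. The results show that the overall F_1 score of the ga_Bert+tfidf+focal model across the hidden-danger categories reaches 92.86%, which is 5.9%, 1.6%, and 0.66% higher than the other three models respectively, indicating good applicability to the classification of construction accident hidden-danger texts. The improved Bert model addresses the problem that a term carries different importance in documents with different category labels, mitigates the effect of imbalanced class distributions on classification performance in multi-class tasks, and provides theoretical support for intelligent project safety management in construction enterprises.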
To achieve automatic classification and identification of hidden danger information on intelligent construction sites, this paper proposes an improved Bert model that further improves the practicability and applicability of safety inspection notifications. Firstly, a word embedding scheme is applied that assigns multi-category hidden-danger weights to different terms. Secondly, the original cross-entropy loss function is replaced by a focal loss function whose category weight α_t is optimized by a genetic algorithm, so that each hidden danger category receives its optimal weight. Furthermore, three improved classification algorithms are constructed on the basis of the Bert model to classify the hidden danger corpus effectively. Finally, 612 safety inspection reports collected by a construction company over the past eight years are processed by data cleaning, denoising, and other manual preprocessing operations. This removes corpus noise such as special characters, useless information, and mixed full-width (SBC) and half-width (DBC) characters. The hidden dangers are then divided into accident categories according to the standard specifications, and two-way exchange data annotation is carried out. The result is a data set of 16,033 building hidden-danger texts with 12 hidden-danger category labels, which is used to compare and verify three groups of algorithms. The results show that the F_1 score of the ga_Bert+TFIDF+focal model across the hidden danger categories reaches 92.86%, which is 5.9% higher than Bert+enc, 1.6% higher than Bert+focal, and 0.66% higher than ga_Bert+focal, making it well suited to text classification of hidden dangers. The improved Bert model addresses the problem that a term carries different importance in documents with different category labels, and it reduces the impact of unbalanced data distribution on classification performance in the multi-class task, which provides theoretical support for intelligent project safety management in construction enterprises.
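The loss described above can be illustrated with a minimal sketch. It assumes the standard focal loss form FL(p_t) = -α_t (1 - p_t)^γ log(p_t), where p_t is the softmax probability of the true class. The focusing parameter γ, the three-class setup, and the α values below are illustrative assumptions and are not taken from the paper; in the paper the α_t values are searched by a genetic algorithm, which is not reproduced here.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target):
    """Plain cross-entropy for one sample: -log(p_t)."""
    return -math.log(softmax(logits)[target])

def focal_loss(logits, target, alpha, gamma=2.0):
    """Focal loss for one sample: -alpha[target] * (1 - p_t)**gamma * log(p_t).

    alpha is a list of per-class weights (alpha_t); gamma is the focusing
    parameter. Both values are assumptions chosen for illustration.
    """
    p_t = softmax(logits)[target]
    return -alpha[target] * (1.0 - p_t) ** gamma * math.log(p_t)

# Hypothetical 3-class example: class 2 plays the role of a rare category
# and therefore gets the largest weight.
alpha = [0.2, 0.3, 0.5]
easy_logits = [4.0, 0.0, 0.0]   # class 0 is predicted confidently
hard_logits = [0.0, 0.0, 0.5]   # class 2 is predicted only weakly

easy_ratio = focal_loss(easy_logits, 0, alpha) / cross_entropy(easy_logits, 0)
hard_ratio = focal_loss(hard_logits, 2, alpha) / cross_entropy(hard_logits, 2)
print(f"easy sample keeps {easy_ratio:.5f} of its cross-entropy loss")
print(f"hard sample keeps {hard_ratio:.5f} of its cross-entropy loss")
```

The confidently classified sample retains a much smaller fraction of its cross-entropy loss than the weakly classified one, which is how focal loss shifts training effort toward hard and under-represented categories. With gamma set to 0 and every alpha set to 1, the function reduces to plain cross-entropy.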
Authors
李华
陈俞源
高红
何思敏
乔峥元
LI Hua; CHEN Yu-yuan; GAO Hong; HE Si-min; QIAO Zheng-yuan (School of Resources Engineering, Xi'an University of Architecture and Technology, Xi'an 710055, China; The Northwest Company of China Construction Third Engineering Bureau, Xi'an 710000, China)
Source
《安全与环境学报》
CAS
CSCD
Peking University Core Journals (北大核心)
2022, No. 3, pp. 1421-1429 (9 pages)
Journal of Safety and Environment
Funding
Natural Science Special Project of the Xi'an University of Architecture and Technology University Fund (X20180011).