Abstract
To address the low recall of burr-data detection in pipe networks caused by class imbalance, this paper proposes a method for detecting pipe-network burr data based on LightGBM improved with Focal Loss. First, neighborhood-related features are constructed according to the characteristics of burr data in the pipe network. Second, the Focal Loss function is introduced into LightGBM to increase the weight the model places on hard-to-detect burr samples, and different Focal Loss parameter values are tested experimentally to balance precision and recall. Finally, models trained with different Focal Loss parameter settings are fused to further improve detection performance on imbalanced burr data. Experiments on real data from a city's water supply network show that, compared with a single model based on the cross-entropy loss function, the proposed fused model with Focal Loss improves recall and F1 score on burr data by 33.3 and 18 percentage points, respectively, although precision on burr data still needs further improvement. By starting from the loss function and dynamically adjusting the weights of hard and easy samples, the proposed method effectively improves burr-data detection on imbalanced data.
Authors
XUE Hao (薛浩)
MA Jing (马静)
GUO Xiaoyu (郭小宇)
College of Economics and Management, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
Source
Computer and Modernization (《计算机与现代化》)
2024, No. 9, pp. 74-81, 90 (9 pages in total)
Funding
General Program of the National Natural Science Foundation of China (72174086).