Abstract
Urban drainage pipeline blockage detection and identification faces three problems: labeled samples are scarce, manually annotating pipeline data is costly, and blockage datasets are clearly class-imbalanced. To address them, an active-learning-based method is proposed. Extremely randomized trees serve as the base classifier that classifies the unlabeled samples. The query strategy combines classification entropy with cosine similarity, so that the model pays more attention to minority-class samples during active learning. The method was validated on two datasets with different degrees of imbalance. The results show that the proposed active learning model achieves high F1 scores for minority-class identification on both datasets. Its classification stability is also unaffected by changes in the degree of imbalance.
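The abstract does not say how classification entropy and cosine similarity are combined, so the following is only a minimal sketch under stated assumptions. The linear weighting `beta`, the normalization of entropy by `log(n_classes)`, the "1 minus maximum cosine similarity to the labeled set" diversity term, and the function names `entropy`, `query_scores` and `select_batch` are all hypothetical, not the authors' method. In practice the class probabilities would come from the base classifier, for example scikit-learn's `ExtraTreesClassifier.predict_proba`. The sketch uses plain Python with hard-coded probabilities so it runs without that dependency.

```python
import math


def entropy(probs):
    """Shannon entropy (natural log) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def cosine(u, v):
    """Cosine similarity of two feature vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)


def query_scores(probas, unlabeled_X, labeled_X, beta=0.5):
    """Hypothetical score: beta * uncertainty + (1 - beta) * diversity.

    Uncertainty is entropy normalized to [0, 1] (needs >= 2 classes).
    Diversity is 1 - max cosine similarity to any already-labeled sample,
    so samples unlike the labeled set (often minority-class) score higher.
    """
    scores = []
    for probs, x in zip(probas, unlabeled_X):
        h = entropy(probs) / math.log(len(probs))
        max_sim = max((cosine(x, l) for l in labeled_X), default=0.0)
        scores.append(beta * h + (1 - beta) * (1 - max_sim))
    return scores


def select_batch(probas, unlabeled_X, labeled_X, k, beta=0.5):
    """Indices of the k highest-scoring unlabeled samples to send for labeling."""
    scores = query_scores(probas, unlabeled_X, labeled_X, beta)
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


# Toy usage: predicted probabilities for 3 unlabeled samples, 1 labeled sample.
probas = [[0.5, 0.5], [0.9, 0.1], [0.99, 0.01]]
unlabeled_X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
labeled_X = [[1.0, 0.0]]
print(select_batch(probas, unlabeled_X, labeled_X, k=2))  # → [1, 0]
```

Sample 0 is the most uncertain but duplicates the labeled point, so the fairly confident yet dissimilar sample 1 ranks first. This shows how a similarity term can push queries away from regions the labeled set already covers.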
Authors
WANG Xian-long (王显龙)
FENG Zao (冯早)
ZHAO Yan-feng (赵燕锋)
Faculty of Information Engineering and Automation; Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology
Source
Control and Instruments in Chemical Industry (《化工自动化及仪表》)
Indexed in: CAS
2021, No. 3, pp. 222-231 (10 pages)
Funding
National Natural Science Foundation of China (61563024, 51765022).
Keywords
pipeline blockage
imbalanced datasets
active learning
classification entropy
cosine similarity
extremely randomized trees