Abstract
To address the scarcity of labeled samples and the high cost of manual labeling in industrial production processes, an active learning algorithm based on a two-tier optimization strategy is proposed. First, different prediction models are built to evaluate the information content of unlabeled samples. Second, the distribution information of the samples is taken into account: new evaluation indicators are proposed from the three perspectives of sample uncertainty, diversity, and representativeness, and they are used to select unlabeled samples and remove redundant information. Finally, the samples selected by both tiers are labeled manually, and the labeled sample set is rebuilt and used for modeling. Simulation on industrial process data from a debutanizer column verifies the effectiveness and performance of the proposed algorithm.
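The three steps in the abstract can be illustrated with a minimal sketch. The abstract does not specify the prediction models or the form of the evaluation indicators, so everything below is an assumption made for illustration: a committee of ridge regressors stands in for the "different prediction models", the variance of their predictions stands in for uncertainty, and the second tier greedily scores candidates by a hypothetical product of representativeness (closeness to the unlabeled pool) and diversity (distance from samples already labeled or chosen). The function name `two_tier_select` and its parameters are likewise hypothetical.

```python
import numpy as np

def fit_ridge(X, y, lam):
    # Closed-form ridge regression with an appended bias column.
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    A = Xb.T @ Xb + lam * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ y)

def predict(w, X):
    return np.hstack([X, np.ones((X.shape[0], 1))]) @ w

def two_tier_select(X_lab, y_lab, X_unl, n_first=20, n_final=5,
                    lams=(0.01, 1.0, 100.0)):
    # Tier 1 (assumed form): uncertainty = disagreement of a model committee.
    preds = np.stack([predict(fit_ridge(X_lab, y_lab, lam), X_unl)
                      for lam in lams])
    uncertainty = preds.var(axis=0)
    cand = np.argsort(uncertainty)[::-1][:n_first]

    # Tier 2 (assumed form): representativeness times diversity.
    Xc = X_unl[cand]
    d_pool = np.linalg.norm(Xc[:, None, :] - X_unl[None, :, :], axis=2)
    represent = 1.0 / (1.0 + d_pool.mean(axis=1))

    taken = np.zeros(len(cand), dtype=bool)
    ref = X_lab.copy()  # samples that are labeled or already chosen
    for _ in range(min(n_final, len(cand))):
        # Diversity: distance to the nearest labeled or chosen sample.
        d_ref = np.linalg.norm(Xc[:, None, :] - ref[None, :, :],
                               axis=2).min(axis=1)
        score = represent * d_ref
        score[taken] = -np.inf  # never pick the same candidate twice
        best = int(np.argmax(score))
        taken[best] = True
        # Adding the pick to ref penalizes near-duplicates (redundancy removal).
        ref = np.vstack([ref, Xc[best]])
    return cand[taken]

# Synthetic data only; not the debutanizer data set used in the paper.
rng = np.random.default_rng(0)
X_lab = rng.normal(size=(10, 3))
y_lab = X_lab @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=10)
X_unl = rng.normal(size=(50, 3))
selected = two_tier_select(X_lab, y_lab, X_unl)
print(selected)  # indices into X_unl to send for manual labeling
```

In an active learning loop, the returned samples would be labeled manually, appended to the labeled set, and the model retrained, which corresponds to the final step described in the abstract.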
Authors
周博文
熊伟丽
ZHOU Bowen; XIONG Weili (School of Internet of Things Engineering, Jiangnan University, Wuxi 214122, China; Key Laboratory of Advanced Process Control for Light Industry (Ministry of Education), Jiangnan University, Wuxi 214122, China)
Source
CAAI Transactions on Intelligent Systems (《智能系统学报》)
CSCD
PKU Core Journal (北大核心)
2022, Issue 4, pp. 688-697 (10 pages)
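Funding
National Natural Science Foundation of China (61773182)
Sub-project of the National Key Research and Development Program of China (2018YFC1603705-03)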
Keywords
active learning
two-tier optimization
sample uncertainty
distribution information
evaluation indicator
redundant information
modeling application
debutanizer