Journal Articles
3 articles found
An ensemble machine learning model to uncover potential sites of hazardous waste illegal dumping based on limited supervision experience
Authors: Jinghua Geng, Yimeng Ding, Wenjun Xie, Wen Fang, Miaomiao Liu, Zongwei Ma, Jianxun Yang, Jun Bi. Fundamental Research (CAS, CSCD), 2024, No. 4, pp. 972-978 (7 pages)
With the soaring generation of hazardous waste (HW) during industrialization and urbanization, HW illegal dumping continues to be an intractable global issue. Particularly in developing regions with lax regulations, it has become a major source of soil and groundwater contamination. One dominant challenge for HW illegal dumping supervision is the invisibility of dumping sites, which makes HW illegal dumping difficult to detect, thereby causing a long-term adverse impact on the environment. How to utilize the limited historical supervision records to screen potential dumping sites across a whole region is a key challenge to be addressed. In this study, a novel machine learning model based on the positive-unlabeled (PU) learning algorithm was proposed to resolve this problem through an ensemble method that iteratively mines the features of the limited historical cases. Validation of the random forest-based PU model showed that the predicted top 30% of high-risk areas covered 68.1% of newly reported cases in the studied region, indicating the reliability of the model prediction. This novel framework is also promising for other environmental management scenarios that must deal with numerous unknown samples based on limited prior experience.
Keywords: hazardous waste; illegal dumping site; positive-unlabeled machine learning; probability prediction; model interpretation
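The abstract describes an ensemble PU model that repeatedly mines limited positive cases against a large unlabeled pool. A minimal sketch of one common bagging-style PU scheme with random forests is below; the function name, subsampling sizes, and synthetic data are illustrative assumptions, not details taken from the paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def pu_bagging_scores(X_pos, X_unlabeled, n_rounds=10, seed=0):
    """Bagging-style PU learning sketch: each round draws a random
    subsample of the unlabeled pool, treats it as negative, trains a
    random forest against all known positives, and scores only the
    held-out unlabeled points. The final risk score for each unlabeled
    point is its mean probability over the rounds it was held out."""
    rng = np.random.default_rng(seed)
    n_u = len(X_unlabeled)
    k = min(len(X_pos), n_u)            # pseudo-negatives drawn per round
    score_sum = np.zeros(n_u)
    score_cnt = np.zeros(n_u)
    for _ in range(n_rounds):
        idx = rng.choice(n_u, size=k, replace=False)
        X_train = np.vstack([X_pos, X_unlabeled[idx]])
        y_train = np.concatenate([np.ones(len(X_pos)), np.zeros(k)])
        clf = RandomForestClassifier(n_estimators=50, random_state=0)
        clf.fit(X_train, y_train)
        held_out = np.setdiff1d(np.arange(n_u), idx)
        score_sum[held_out] += clf.predict_proba(X_unlabeled[held_out])[:, 1]
        score_cnt[held_out] += 1
    return score_sum / np.maximum(score_cnt, 1)
```

Ranking the unlabeled pool by these scores and inspecting the top fraction mirrors the paper's "top 30% of high-risk areas" validation, though the paper's exact ensemble construction may differ.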
Methods for Learning from a Small Number of Labelled Samples
Authors: 熊智翔, 陆青, 王胤. Journal of Computer Applications (《计算机应用》) (CSCD, Peking University Core), 2018, No. A02, pp. 11-15, 41 (6 pages)
With the spread of the Internet, ever more data is generated online, yet in real production settings most of it is never labelled, while supervised learning algorithms for data mining tasks require sufficient labels for training. To address this lack of labels, positive-unlabeled (PU) learning methods were proposed and implemented. The first method first evaluates the unlabelled samples, assigns labels based on the evaluation values, and then trains a model on these labels. The second method controls sample weights in order to exploit the information contained in the large volume of data. Experimental results show that both methods perform similarly to, and can even surpass, previous algorithms, while being simpler to implement.
Keywords: weakly supervised learning; positive-unlabeled learning; anomaly detection; machine learning; data mining
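The second method described above controls sample weights to use information in the unlabeled data. One common weighting scheme of this kind is sketched below: unlabeled samples are treated as negatives but down-weighted, since some of them are actually positive. This is a generic "biased" weighting sketch under assumed synthetic data; the weight value and function name are illustrative, not the authors' exact formulation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def weighted_pu_fit(X_pos, X_unlabeled, c_unl=0.3):
    """Weight-based PU learning sketch: known positives get full weight,
    while every unlabeled sample is treated as a negative with a smaller
    weight c_unl < 1, reflecting that the unlabeled pool is a mixture of
    true positives and true negatives."""
    X = np.vstack([X_pos, X_unlabeled])
    y = np.concatenate([np.ones(len(X_pos)), np.zeros(len(X_unlabeled))])
    w = np.concatenate([np.ones(len(X_pos)),
                        np.full(len(X_unlabeled), c_unl)])
    clf = LogisticRegression()
    clf.fit(X, y, sample_weight=w)   # sklearn supports per-sample weights
    return clf
```

The classifier can then rank the unlabeled pool; hidden positives should receive higher probabilities than true negatives when the classes are separable in feature space.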
Self-Learning of Multivariate Time Series Using Perceptually Important Points (cited by 2)
Authors: Timo Lintonen, Tomi Raty. IEEE/CAA Journal of Automatica Sinica (SCIE, EI, CSCD), 2019, No. 6, pp. 1318-1331 (14 pages)
In machine learning, positive-unlabelled (PU) learning is a special case within semi-supervised learning. In positive-unlabelled learning, the training set contains some positive examples and a set of unlabelled examples from both the positive and negative classes. Positive-unlabelled learning has gained attention in many domains, especially for time-series data, in which obtaining labelled data is challenging. Examples originating from the negative class are especially difficult to acquire. Self-learning is a semi-supervised method capable of PU learning on time-series data. In the self-learning approach, observations are individually added from the unlabelled data into the positive class until a stopping criterion is reached. The model is retrained after each addition with the existing labels. The main problem in self-learning is knowing when to stop the learning. There are multiple different stopping criteria in the literature, but they tend to be inaccurate or challenging to apply. This publication proposes a novel stopping criterion, called peak evaluation using perceptually important points, to address this problem for time-series data. Peak evaluation using perceptually important points is exceptional in that it has no tunable hyperparameters, which makes it easily applicable in an unsupervised setting. At the same time, it is flexible, as it makes no assumptions about the balance of the dataset between the positive and the negative class.
Keywords: positive-unlabelled (PU) learning; self-learning; stopping criterion; time series
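The self-learning loop the abstract describes (add confident unlabeled observations to the positive class, retrain, repeat until a stopping criterion fires) can be sketched generically. The paper's hyperparameter-free stopping criterion based on perceptually important points is not reproduced here; this sketch substitutes a plain probability threshold, which is exactly the kind of tunable criterion the authors aim to avoid, and all names and data are illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def self_learn(X_pos, X_unl, batch=5, threshold=0.5, max_rounds=50):
    """Generic self-learning loop for PU data: train on current positives
    vs. the remaining unlabeled pool, move the most confident unlabeled
    points (at most `batch` per round, above `threshold`) into the
    positive set, retrain, and stop when no point qualifies."""
    pos, unl = X_pos.copy(), X_unl.copy()
    for _ in range(max_rounds):
        if len(unl) == 0:
            break
        X = np.vstack([pos, unl])
        y = np.concatenate([np.ones(len(pos)), np.zeros(len(unl))])
        clf = LogisticRegression().fit(X, y)
        p = clf.predict_proba(unl)[:, 1]
        take = np.argsort(p)[::-1][:batch]      # most confident candidates
        take = take[p[take] >= threshold]
        if len(take) == 0:
            break                               # stopping criterion reached
        pos = np.vstack([pos, unl[take]])
        unl = np.delete(unl, take, axis=0)
    return pos, unl
```

For multivariate time series, the feature rows here would be per-window feature vectors, and the threshold test would be replaced by the paper's peak-evaluation criterion.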