328 articles found
Class Imbalanced Problem: Taxonomy, Open Challenges, Applications and State-of-the-Art Solutions
1
Authors: Khursheed Ahmad Bhat, Shabir Ahmad Sofi. China Communications (SCIE, CSCD), 2024, No. 11, pp. 216-242 (27 pages)
The study of machine learning has revealed that it can unleash new applications in a variety of disciplines. Many limitations restrict the expressiveness of these techniques, and researchers are working to overcome them to fully exploit the power of data-driven machine learning (ML) and deep learning (DL). Data imbalance presents major hurdles for classification and prediction problems in machine learning, restricting data analytics and the acquisition of relevant insights in practically all real-world research domains. In visual learning, network information security, failure prediction, digital marketing, healthcare, and a variety of other domains, raw data suffers from a biased distribution of one class over the other. This article presents a taxonomy of approaches for handling imbalanced data problems, together with a comparative study of their classification metrics and application areas. We explore very recent trends in techniques for solving class imbalance problems in datasets and discuss their limitations. The article also identifies open challenges for further research on class imbalance.
Keywords: class imbalance; classification; deep learning; GANs; sampling
Combined Effect of Concept Drift and Class Imbalance on Model Performance During Stream Classification
2
Authors: Abdul Sattar Palli, Jafreezal Jaafar, Manzoor Ahmed Hashmani, Heitor Murilo Gomes, Aeshah Alsughayyir, Abdul Rehman Gilal. Computers, Materials & Continua (SCIE, EI), 2023, No. 4, pp. 1827-1845 (19 pages)
Every application in a smart city environment, such as the smart grid, health monitoring, security, and surveillance, generates non-stationary data streams. Because of this, the statistical properties of the data change over time, leading to class imbalance and concept drift issues. Both issues degrade model performance. Most current work has focused on developing an ensemble strategy that trains a new classifier on the latest data to resolve the issue. These techniques suffer when training the new classifier if the data is imbalanced. Also, the class imbalance ratio may change greatly from one input stream to another, making the problem more complex. The existing solutions proposed for the combined issue of class imbalance and concept drift lack an understanding of the correlation of one problem with the other. This work studies the association between concept drift and class imbalance ratio and then demonstrates how changes in class imbalance ratio along with concept drift affect the classifier's performance. We analyzed the effect of both issues on the minority and majority classes individually. To do this, we conducted experiments on benchmark datasets using state-of-the-art classifiers designed specifically for data stream classification. Precision, recall, F1 score, and geometric mean were used to measure performance. Our findings show that when class imbalance and concept drift occur together, performance can decrease by up to 15%. Our results also show that an increase in the imbalance ratio can cause a 10% to 15% decrease in the precision scores of both minority and majority classes. The findings may help in designing intelligent and adaptive solutions that can cope with the challenges of non-stationary data streams, such as concept drift and class imbalance.
Keywords: classification; data streams; class imbalance; concept drift; class imbalance ratio
Attenuate Class Imbalance Problem for Pneumonia Diagnosis Using Ensemble Parallel Stacked Pre-Trained Models
3
Authors: Aswathy Ravikumar, Harini Sriraman. Computers, Materials & Continua (SCIE, EI), 2023, No. 4, pp. 891-909 (19 pages)
Pneumonia is an acute lung infection that has caused many fatalities globally. Radiologists often employ chest X-rays to identify pneumonia since they are presently the most effective imaging method for this purpose. Computer-aided diagnosis of pneumonia using deep learning techniques is widely used due to its effectiveness and performance. In the proposed method, the Synthetic Minority Oversampling Technique (SMOTE) is used to eliminate the class imbalance in the X-ray dataset. To compensate for the paucity of accessible data, pre-trained transfer learning is used, and an ensemble Convolutional Neural Network (CNN) model is developed. The ensemble model consists of all possible combinations of the MobileNetV2, Visual Geometry Group (VGG16), and DenseNet169 models. MobileNetV2 and DenseNet169 performed well as single classifiers, with an accuracy of 94%, while the ensemble model (MobileNetV2 + DenseNet169) achieved an accuracy of 96.9%. Using the data synchronous parallel model in distributed TensorFlow, the training process accelerated performance by 98.6% and outperformed other conventional approaches.
Keywords: pneumonia prediction; distributed deep learning; data parallel model; ensemble deep learning; class imbalance; skewed data
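The abstract above relies on SMOTE to rebalance the X-ray dataset. As a rough illustration only, the following Python sketch shows the core interpolation idea behind SMOTE on small feature tuples. It is not the authors' pipeline; the function name `smote_sketch`, its parameters, and the toy data are assumptions made here.

```python
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic minority points by SMOTE-style interpolation.

    minority: list of equal-length numeric tuples (at least two points).
    k: number of nearest minority neighbours considered per base point.
    All names here are illustrative, not taken from any paper or library.
    """
    rng = random.Random(seed)

    def dist2(a, b):
        # Squared Euclidean distance between two feature tuples.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base point, excluding itself.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: dist2(base, p))[:k]
        neighbour = rng.choice(neighbours)
        # New point lies on the segment between base and the chosen neighbour.
        gap = rng.random()
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic
```

Each synthetic point lies on a line segment between two existing minority samples, so the minority class is densified rather than duplicated. In practice a maintained implementation would normally be preferred over a hand-written one.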
Ensemble-based active learning for class imbalance problem (Cited by: 1)
4
Authors: Yanping Yang, Guangzhi Ma. Journal of Biomedical Science and Engineering, 2010, No. 10, pp. 1022-1029 (8 pages)
In medical diagnosis, the problem of class imbalance is common. Although unlabeled data are abundant, labeled data are difficult and expensive to obtain. In this paper, an ensemble-based active learning algorithm is proposed to address the class imbalance problem. Artificial data are created according to the distribution of the training dataset to make the ensemble diverse, and the random subspace re-sampling method is used to reduce the data dimension. When member classifiers are selected based on misclassification cost estimation, the minority class is assigned higher misclassification cost weights, while each testing sample has a variable penalty factor that induces the ensemble to correct its current errors. In our experiments with UCI disease datasets, F-value and G-means are used as the evaluation criteria instead of classification accuracy. Compared with other ensemble methods, our method shows the best performance and needs fewer labeled samples.
Keywords: class imbalance; active learning; ensemble; random subspace; misclassification cost
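The abstract above evaluates with F-value and G-means rather than accuracy. A minimal sketch of how these two metrics are commonly computed from a binary confusion matrix follows. The function name and the argument layout are illustrative assumptions, not taken from the paper.

```python
import math


def f_measure_and_g_mean(tp, fp, fn, tn, beta=1.0):
    """Return (F-measure, G-mean), treating the minority class as positive."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0        # sensitivity (TPR)
    specificity = tn / (tn + fp) if (tn + fp) else 0.0   # TNR
    denom = beta ** 2 * precision + recall
    f_value = (1 + beta ** 2) * precision * recall / denom if denom else 0.0
    # G-mean is the geometric mean of the per-class recalls.
    g_mean = math.sqrt(recall * specificity)
    return f_value, g_mean
```

Under these definitions, a classifier that labels every sample as the majority class on a 10/90 split reaches 90% accuracy, yet both its F-measure and its G-mean are 0, which is why such metrics are preferred on imbalanced data.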
BLS-identification: A device fingerprint classification mechanism based on broad learning for Internet of Things
5
Authors: Yu Zhang, Bei Gong, Qian Wang. Digital Communications and Networks (SCIE, CSCD), 2024, No. 3, pp. 728-739 (12 pages)
The popularity of the Internet of Things (IoT) has enabled a large number of vulnerable devices to connect to the Internet, bringing huge security risks. As a network-level security authentication method, device fingerprinting based on machine learning has attracted considerable attention because it can detect vulnerable devices in complex and heterogeneous access phases. However, flexible and diversified IoT devices with limited resources increase the difficulty of executing device fingerprint authentication in IoT, because the model network must be retrained to deal with incremental features or types. To address this problem, a device fingerprinting mechanism based on a Broad Learning System (BLS) is proposed in this paper. The mechanism first characterizes IoT devices by traffic analysis, based on the identifiable differences in their traffic data, and extracts feature parameters of the traffic packets. A hierarchical hybrid sampling method is designed for the preprocessing phase to improve the imbalanced data distribution and reconstruct the fingerprint dataset. The complexity of the dataset is reduced using Principal Component Analysis (PCA), and the device type is identified by training weights using BLS. The experimental results show that the proposed method achieves state-of-the-art accuracy and spends less training time than other existing methods.
Keywords: device fingerprint; traffic analysis; class imbalance; broad learning system; access authentication
MCBC-SMOTE: A Majority Clustering Model for Classification of Imbalanced Data
6
Authors: Jyoti Arora, Meena Tushir, Keshav Sharma, Lalit Mohan, Aman Singh, Abdullah Alharbi, Wael Alosaimi. Computers, Materials & Continua (SCIE, EI), 2022, No. 12, pp. 4801-4817 (17 pages)
Datasets with an imbalanced class distribution are difficult to handle with standard classification algorithms. In supervised learning, dealing with class imbalance is still considered a challenging research problem. Various machine learning techniques are designed to operate on balanced datasets; therefore, different undersampling, oversampling and hybrid strategies have been proposed in the state of the art to deal with imbalanced datasets, but highly skewed datasets still pose problems of generalization and noise generation during resampling. To overcome these problems, this paper proposes a majority clustering model for classification of imbalanced datasets known as MCBC-SMOTE (Majority Clustering for balanced Classification-SMOTE). The model provides a method to convert the problem of binary classification into a multi-class problem. In the proposed algorithm, the number of clusters for the majority class is calculated using the elbow method, and the minority class is over-sampled as an average of the clustered majority classes to generate a symmetrical class distribution. The proposed technique is cost-effective, reduces the problem of noise generation, and successfully removes the imbalances present between and within classes. Evaluations on diverse real datasets show better classification results than state-of-the-art existing methodologies on several performance metrics.
Keywords: imbalanced class problem; classification; SMOTE; k-means; clustering; sampling
Handling Class Imbalance in Online Transaction Fraud Detection
7
Authors: Kanika, Jimmy Singla, Ali Kashif Bashir, Yunyoung Nam, Najam UI Hasan, Usman Tariq. Computers, Materials & Continua (SCIE, EI), 2022, No. 2, pp. 2861-2877 (17 pages)
With the rise of internet facilities, the number of people making online transactions has grown exponentially in recent years, as online transaction systems have eliminated the need to visit the bank in person for every transaction. However, fraud cases have also increased, causing monetary losses to consumers. Hence, an effective fraud detection system that can detect fraudulent transactions automatically in real time is the need of the hour. Genuine transactions generally far outnumber fraudulent ones, which leads to the class imbalance problem. In this research work, an online transaction fraud detection system using deep learning is proposed that handles class imbalance by applying algorithm-level methods, which modify the learning of the model to focus more on the minority class, i.e., fraudulent transactions. A novel loss function named Weighted Hard-Reduced Focal Loss (WH-RFL) is proposed, which achieves the maximum fraud detection rate, i.e., True Positive Rate (TPR), at the cost of misclassifying a few genuine transactions, since a high TPR is preferred over a high True Negative Rate (TNR) in fraud detection systems; this is demonstrated on three publicly available imbalanced transactional datasets. In addition, thresholding is applied to optimize the decision threshold using cross-validation so as to detect the maximum number of frauds, and the experimental results demonstrate that selecting the right thresholding method with deep learning yields better results.
Keywords: class imbalance; deep learning; fraud detection; loss function; thresholding
An Improved Algorithm for Imbalanced Data and Small Sample Size Classification
8
Authors: Yong Hu, Dongfa Guo, Zengwei Fan, Chen Dong, Qiuhong Huang, Shengkai Xie, Guifang Liu, Jing Tan, Boping Li, Qiwei Xie. Journal of Data Analysis and Information Processing, 2015, No. 3, pp. 27-33 (7 pages)
Traditional classification algorithms do not perform well on imbalanced datasets or small sample sizes. To deal with this problem, a novel method is proposed that changes the class distribution by adding virtual samples generated by the windowed regression over-sampling (WRO) method. WRO reflects not only the additive effects but also the multiplicative effects between samples. A comparative study between the proposed method and other over-sampling methods, such as the synthetic minority over-sampling technique (SMOTE) and borderline over-sampling (BOS), is provided on UCI datasets and a Fourier transform infrared spectroscopy (FTIR) dataset. Experimental results show that the WRO method achieves better performance than the other methods.
Keywords: class imbalance learning; over-sampling; high-dimensional; small-sample size; support vector machine
Dealing with the Class Imbalance Problem in the Detection of Fake Job Descriptions
9
Authors: Minh Thanh Vo, Anh H. Vo, Trang Nguyen, Rohit Sharma, Tuong Le. Computers, Materials & Continua (SCIE, EI), 2021, No. 7, pp. 521-535 (15 pages)
In recent years, the detection of fake job descriptions has become increasingly necessary because social networking has changed the way people access burgeoning information in the internet age. Identifying fraud in job descriptions can help jobseekers avoid many of the risks of job hunting. However, detecting fake job descriptions runs up against class imbalance, because genuine jobs outnumber fake jobs. This reduces the predictability and performance of traditional machine learning models. We therefore present an efficient framework that uses an oversampling technique, called FJD-OT (Fake Job Description Detection Using Oversampling Techniques), to improve the predictability of detecting fake job descriptions. In the first module of the proposed framework, we apply several techniques, including stop-word removal and a tokenizer, to preprocess the text data. In the second module, we use a bag of words in combination with the term frequency-inverse document frequency (TF-IDF) approach to extract features from the text and create the feature dataset. Next, the framework applies k-fold cross-validation, a commonly used technique for testing the effectiveness of machine learning models, which splits the experimental dataset (the Employment Scam Aegean (ESA) dataset in our study) into training and test sets for evaluation. The training set is passed through the third module, an oversampling module in which the SVMSMOTE method balances the data before the classifiers are trained in the last module. The experimental results indicate that the proposed approach significantly improves the predictability of fake job description detection on the ESA dataset based on several popular performance metrics.
Keywords: fake job description detection; class imbalance problem; oversampling techniques
CMAGAN: classifier-aided minority augmentation generative adversarial networks for industrial imbalanced data and its application to fault prediction
10
Authors: Wen-Jie Wang, Zhao Liu, Ping Zhu. Advances in Manufacturing (SCIE, EI, CAS, CSCD), 2024, No. 3, pp. 603-618 (16 pages)
Class imbalance is a common characteristic of industrial data that adversely affects industrial data mining because it leads to the biased training of machine learning models. To address this issue, the augmentation of samples in minority classes based on generative adversarial networks (GANs) has been demonstrated as an effective approach. This study proposes a novel GAN-based minority class augmentation approach named classifier-aided minority augmentation generative adversarial network (CMAGAN). In the CMAGAN framework, an outlier elimination strategy is first applied to each class to minimize the negative impacts of outliers. Subsequently, a newly designed boundary-strengthening learning GAN (BSLGAN) is employed to generate additional samples for minority classes. By incorporating a supplementary classifier and innovative training mechanisms, the BSLGAN focuses on learning the distribution of samples near classification boundaries. Consequently, it can fully capture the characteristics of the target class and generate highly realistic samples with clear boundaries. Finally, the new samples are filtered based on the Mahalanobis distance to ensure that they are within the desired distribution. To evaluate the effectiveness of the proposed approach, CMAGAN was used to solve the class imbalance problem in eight real-world fault-prediction applications. The performance of CMAGAN was compared with that of seven other algorithms, including state-of-the-art GAN-based methods, and the results indicated that CMAGAN could provide higher-quality augmented results.
Keywords: class imbalance; minority class augmentation; generative adversarial network (GAN); boundary strengthening learning (BSL); fault prediction
Active Learning Method for Imbalanced Concept Drift Data Stream (非平衡概念漂移数据流主动学习方法)
11
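Authors: Li Yanhong (李艳红), Wang Tiantian (王甜甜), Wang Suge (王素格), Li Deyu (李德玉). Acta Automatica Sinica (自动化学报) (EI, CAS, CSCD, PKU Core), 2024, No. 3, pp. 589-606 (18 pages)
Data stream classification studies how to provide more reliable data-driven predictive models in open, dynamic environments; the key is to detect and adapt to concept drift in data streams that arrive in real time and change continuously. At present, to detect concept drift and update the classification model, data stream classification methods usually assume that the labels of all samples are known, an assumption that is unrealistic in real scenarios. In addition, real data streams may exhibit high and changing class imbalance ratios, which further increases the complexity of the classification task. To this end, an active learning method for imbalanced concept drift data stream (ALM-ICDDS) is proposed. A sample prediction certainty measure based on multiple predicted probabilities is defined, and an adaptive adjustment method for the margin threshold matrix is proposed, so that the label query strategy suits imbalanced data streams with many classes. A sample replacement strategy based on memory strength is proposed, which keeps hard-to-distinguish samples, minority class samples, and samples representing the current data distribution in a memory window, improving the classification performance of new base classifiers. An importance evaluation and update method for base classifiers based on classification accuracy is defined, enabling the ensemble classifier to be updated after drift. Comparative experiments on 7 synthetic data streams and 3 real data streams show that the proposed method outperforms 6 concept drift data stream learning methods in classification performance.
Keywords: data stream classification; active learning; concept drift; multi-class imbalance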
Online Active Learning Method for Imbalanced Data Stream (非平衡数据流在线主动学习方法)
12
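Authors: Li Yanhong (李艳红), Ren Lin (任霖), Wang Suge (王素格), Li Deyu (李德玉). Acta Automatica Sinica (自动化学报) (EI, CAS, CSCD, PKU Core), 2024, No. 7, pp. 1389-1401 (13 pages)
Data stream classification is an important research task in data stream mining; its goal is to capture changing class structure from massive, continuously changing data. At present, almost no framework can simultaneously handle the multi-class imbalance, concept drift, outliers, and high labeling cost that are common in data streams. An online active learning method for imbalanced data stream (OALM-IDS) is therefore proposed. AdaBoost is an ensemble classification method that iteratively combines multiple weak classifiers into a strong classifier, and AdaBoost.M2 introduces the confidence of weak classifiers; such methods are commonly used on static data. A training sample importance measure based on the imbalance ratio and an adaptive forgetting factor is defined, which makes AdaBoost.M2 applicable to imbalanced data streams and improves the performance of ensemble classifiers on them. An adaptive adjustment method for the margin threshold matrix is proposed, optimizing the label request strategy. The degree of concept drift is incorporated into model construction, and an adaptive forgetting factor based on a concept drift index is defined, enabling model reconstruction after drift. Comparative experiments on 6 artificial data streams and 4 real data streams show that the proposed method outperforms 5 other imbalanced data stream learning methods in classification performance.
Keywords: active learning; data stream classification; multi-class imbalance; concept drift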
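A Software Defect Prediction Model Combining an Improved Sampling Algorithm with Unsupervised Clustering (改进的采样算法与无监督聚类相结合的软件缺陷预测模型)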
13
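Authors: Shi Haihe (石海鹤), Zhou Shiwen (周世文), Zhong Linhui (钟林辉), Xiao Zhengxing (肖正兴). Journal of Jiangxi Normal University (Natural Science Edition) (江西师范大学学报(自然科学版)) (CAS, PKU Core), 2024, No. 3, pp. 301-310 (10 pages)
Building on the adaptive synthetic sampling algorithm ADASYN, this paper considers the connectivity between clusters of different densities inside the minority class and includes points at a medium distance from the sampling point in the range used to generate new samples. The resulting optimized oversampling algorithm, T-ADASYN, effectively increases the connectivity of different-density clusters within the minority class and produces a more evenly distributed dataset. A connectivity-based spectral clustering algorithm is then used for clustering-based prediction; combining the oversampling algorithm with unsupervised clustering yields a new, practical software defect prediction model, TA-SC (T-ADASYN + spectral clustering). Validation uses F-score as the evaluation metric and spectral clustering as the clustering model. Experimental results show that the improved T-ADASYN oversampling algorithm achieves a 6% performance improvement over commonly used oversampling algorithms on both the public PROMISE and NASA datasets, and that the TA-SC model achieves 3% and 2% improvements over commonly used clustering algorithms on the PROMISE and NASA datasets, respectively.
Keywords: software defect prediction; class imbalance; oversampling algorithm; clustering algorithm; unsupervised learning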
An Intelligent Client Selection Algorithm for Federated Learning with Imbalanced Classes (面向不平衡类的联邦学习客户端智能选择算法)
14
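Authors: Zhu Suxia (朱素霞), Wang Yunmeng (王云梦), Yan Peisen (颜培森), Sun Guanglu (孙广路). Journal of Harbin University of Science and Technology (哈尔滨理工大学学报) (CAS, PKU Core), 2024, No. 2, pp. 33-42 (10 pages)
In federated learning scenarios, when the data across client devices is not independent and identically distributed (non-IID), or is even class-imbalanced, the optimization objective of each client's local model deviates from the global objective, which poses a major challenge to the performance of the global model. To address this data heterogeneity, actively selecting a suitable subset of clients to balance the data distribution helps improve model performance. An intelligent client selection algorithm for federated learning with imbalanced classes, FedSIMT, is therefore designed. Without relying on any auxiliary dataset, and under the privacy premise that clients' local data remains invisible to the server, the algorithm uses the Tanimoto coefficient to measure the difference between the local data distribution and the target distribution, and uses the combinatorial multi-armed bandit model from reinforcement learning to balance exploitation and exploration in client selection. It improves the accuracy and convergence speed of the global model under different types of data heterogeneity. Experimental results show that the algorithm is effective.
Keywords: federated learning; class imbalance; client selection algorithm; multi-armed bandit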
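The abstract above measures the gap between a client's local data distribution and a target distribution with the Tanimoto coefficient. The sketch below uses the common real-valued form T(a, b) = a·b / (|a|² + |b|² − a·b). How FedSIMT actually builds its vectors is not stated in the abstract, so the class-proportion vectors used here are an assumption, as is the function name.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two equal-length real-valued vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a)
    norm_b = sum(y * y for y in b)
    denom = norm_a + norm_b - dot
    if denom == 0:
        raise ValueError("coefficient is undefined for two zero vectors")
    return dot / denom


# Hypothetical example: a skewed client compared with a uniform target.
client_dist = [0.9, 0.1]   # assumed per-class proportions on one client
target_dist = [0.5, 0.5]   # assumed balanced target distribution
similarity = tanimoto(client_dist, target_dist)
```

A value of 1 means the two distributions coincide, and lower values indicate a client whose class mix is further from the target, which is the kind of signal a client selection policy could use.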
Bearing Fault Diagnosis under Variable Working Conditions Based on Domain Adaptation (基于领域自适应的变工况轴承故障诊断)
15
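Authors: Cao Jie (曹洁), Yin Haonan (尹浩楠), Lei Xiaogang (雷晓刚), Wang Jinhua (王进花). Journal of Beijing University of Aeronautics and Astronautics (北京航空航天大学学报) (EI, CAS, CSCD, PKU Core), 2024, No. 8, pp. 2382-2390 (9 pages)
To address the low fault recognition rate in bearing fault diagnosis caused by differing distributions of training and test samples and by imbalance among fault classes, a domain-adaptive fault diagnosis method based on an improved residual network (ResNet) is designed. The first layer of the diagnosis network uses a multi-dimensional convolution structure for feature extraction, obtaining fault feature information at different dimensions. The domain adaptation layer uses local maximum mean discrepancy (LMMD) to align the source and target domain distributions and capture more fine-grained information. A class-balanced loss function (CBLoss) handles training on imbalanced data, and the network is optimized with Adam to perform fault diagnosis. Experimental results show that the proposed method achieves good diagnostic results when fault sample classes are imbalanced. Experiments on 2 bearing datasets and on collected wind turbine data show that the method has certain advantages: with imbalanced data samples, its diagnostic performance exceeds that of deep transfer learning methods such as deep neural networks and domain adaptation networks, and it can serve as an effective cross-condition fault analysis method.
Keywords: fault diagnosis; residual network; data imbalance; local maximum mean discrepancy; class-balanced loss function; bearing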
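The abstract above names a class-balanced loss (CBLoss) for training on imbalanced fault data. One widely used formulation weights each class by the inverse of its "effective number" of samples, (1 − β^n) / (1 − β). Whether the paper uses exactly this form is an assumption, and `beta` and the class counts below are illustrative.

```python
def class_balanced_weights(counts, beta=0.999):
    """Per-class loss weights from the effective number of samples.

    counts: number of training samples per class (each at least 1).
    beta: hyperparameter in [0, 1); 0 gives uniform weights, and values
          close to 1 approach inverse-frequency weighting.
    """
    # Inverse effective number: (1 - beta) / (1 - beta ** n).
    raw = [(1.0 - beta) / (1.0 - beta ** n) for n in counts]
    # Normalise so the weights sum to the number of classes.
    scale = len(counts) / sum(raw)
    return [w * scale for w in raw]


# Hypothetical fault dataset: 1000 healthy samples, 10 faulty samples.
weights = class_balanced_weights([1000, 10])
```

The resulting weights would typically multiply the per-class term of a cross-entropy loss, so that rare fault classes contribute more to the gradient than their raw frequency suggests.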
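Network Intrusion Detection Method Based on VAE-CWGAN and Fusion of Statistical Importance of Features (基于VAE-CWGAN和特征统计重要性融合的网络入侵检测方法) (Cited by: 2)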
16
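Authors: Liu Taotao (刘涛涛), Fu Yu (付钰), Wang Kun (王坤), Duan Xueyuan (段雪源). Journal on Communications (通信学报) (EI, CSCD, PKU Core), 2024, No. 2, pp. 54-67 (14 pages)
Traditional intrusion detection methods are limited by class imbalance in datasets and by weakly representative selected features. A detection method based on VAE-CWGAN and the fusion of feature statistical importance is proposed. First, the dataset is preprocessed to improve data quality. Second, a VAE-CWGAN model is built to generate new samples, resolving the class imbalance so that the classification model is no longer biased toward the majority class. Third, features are ranked by standard deviation and by the difference between median and mean, and their statistical importance is fused for feature selection, yielding more representative features so that the model learns the data better. Finally, a one-dimensional convolutional neural network classifies the mixed dataset after feature selection. Experimental results show that the proposed method performs well on the NSL-KDD, UNSW-NB15, and CIC-IDS-2017 datasets, with accuracies of 98.95%, 96.24%, and 99.92%, respectively, effectively improving intrusion detection performance.
Keywords: intrusion detection; network traffic; class imbalance; feature selection; statistical importance fusion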
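A Learning-Difficulty- and Generalization-Aware Oversampling Method for Software Defect Prediction (学习困难与泛化能力感知的软件缺陷预测过采样方法)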
17
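Authors: Fan Hongqi (范洪旗), Yan Yuanting (严远亭), Zhang Yiwen (张以文), Zhang Yanping (张燕平). Computer Integrated Manufacturing Systems (计算机集成制造系统) (EI, CSCD, PKU Core), 2024, No. 8, pp. 2663-2671 (9 pages)
The imbalanced class distribution of software defect data poses a great challenge to software defect prediction. Synthetic oversampling is the mainstream technique for this problem, but designing a sampling strategy that avoids the over-generalization risk caused by introducing abnormal samples remains a difficulty for oversampling methods in software defect prediction. To address this, the paper proposes an oversampling method that combines sample learning difficulty with the generalization impact of synthesis (GDOS). Specifically, GDOS measures each sample's safety coefficient and generalization coefficient from its local prior probability and from the sample distribution along potential synthesis directions, and uses these to determine the sample's selection weight. By suppressing the synthesis probability in regions prone to over-generalization and giving relatively safe nearest-neighbor synthesis directions a higher selection probability, it supports the synthesis of high-quality samples. Experiments on 26 PROMISE datasets show that GDOS outperforms classical sampling methods and sampling methods designed specifically for software defect prediction on metrics such as MCC, pd, pf, and F-measure.
Keywords: software defect prediction; class imbalance; oversampling; over-generalization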
A Quality Control Method for Class-Imbalanced Marine Data Based on Multi-Model Combination (基于多模型组合的类别不平衡海洋数据质量控制方法)
18
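Authors: Song Wei (宋巍), Zhang Guiqing (张贵庆), Xie Jingrong (谢京容), Dong Mingmei (董明媚), Yue Xinyang (岳心阳), Yang Yang (杨扬). Marine Forecasts (海洋预报) (CSCD, PKU Core), 2024, No. 3, pp. 61-70 (10 pages)
A two-layer marine data quality control framework based on multi-model combination is proposed. Several common classification algorithms serve as base learners that make primary predictions of data quality labels, and the quality flag of the marine data is then determined by voting or stacking. To address class imbalance, an adaptive undersampling strategy lowers the imbalance ratio of the data, and the Focal Loss function improves the model's ability to recognize hard-to-classify samples. Quality control was validated on sea surface temperature and air temperature data from the International Comprehensive Ocean-Atmosphere Data Set. For the very rare class of erroneous samples, voting and stacking achieve F1 scores (the weighted harmonic mean of precision and recall) of 0.9806 and 0.9812 on sea surface temperature data, and 0.9985 and 0.9983 on air temperature data.
Keywords: quality control; marine meteorological data; ensemble learning; class imbalance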
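The abstract above pairs adaptive undersampling with Focal Loss. For reference, the standard binary focal loss is FL(p_t) = −α_t (1 − p_t)^γ log(p_t). The sketch below evaluates it for a single prediction, with `gamma` and `alpha` set to commonly used defaults rather than values reported in the paper.

```python
import math


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for one sample.

    p: predicted probability of the positive class, in [0, 1].
    y: true label, 1 for positive and 0 for negative.
    gamma: focusing parameter; 0 reduces to weighted cross-entropy.
    alpha: weight of the positive class; negatives get 1 - alpha.
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    p_t = min(max(p_t, 1e-12), 1.0)  # avoid log(0)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

The factor (1 − p_t)^γ shrinks the loss of confidently correct predictions, so training concentrates on hard samples, which on imbalanced data are often the rare class.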
An Oversampling Method for Software Defect Prediction Based on Prospect Theory (基于前景理论的软件缺陷预测过采样方法)
19
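Authors: Xu Biao (徐彪), Yan Yuanting (严远亭), Zhang Yiwen (张以文). Computer Integrated Manufacturing Systems (计算机集成制造系统) (EI, CSCD, PKU Core), 2024, No. 8, pp. 2822-2831 (10 pages)
In software defect prediction, data difficulty factors affect prediction performance more markedly than class imbalance does. However, most existing oversampling methods for software defect prediction ignore the data difficulty factors inherent in software project datasets while addressing class imbalance, which leads to poor prediction performance. To address this, an oversampling algorithm based on prospect theory (POS) is proposed. POS evaluates the learning difficulty of minority class samples by considering the influence of both same-class and different-class samples in the local neighborhood. It uses a gravitation-based strategy to construct same-class gains and different-class losses that characterize each sample's prospect value, and emphasizes different-class losses when computing the sampling weights of minority class samples. This lowers the risk of introducing data difficulty factors, improves the quality of synthetic samples, and further improves prediction performance. Experimental results on NASA datasets show that POS improves on metrics such as AUC, balance, and G-mean and delivers better defect prediction performance.
Keywords: software defect prediction; class imbalance; data difficulty factors; oversampling; prospect theory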
A Pipeline Target Detection Method for Side-Scan Sonar Based on Semantic Segmentation (基于语义分割的侧扫声纳管线目标检测方法)
20
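Authors: Zheng Gen (郑根), Xu Huixi (徐会希), Zhao Jianhu (赵建虎), Yang Wenlin (杨文林). Hydrographic Surveying and Charting (海洋测绘) (CSCD, PKU Core), 2024, No. 2, pp. 9-13 (5 pages)
To improve the automation and efficiency of pipeline target detection in side-scan sonar images, an underwater pipeline target detection method based on semantic segmentation is proposed. First, an efficient semantic segmentation backbone is built, which raises computation speed and lowers the network's demands on computer hardware. Second, a weighted cross-entropy loss function tailored to the characteristics of pipeline targets is given, resolving the training difficulty caused by the imbalance in sample counts between classes. Underwater pipeline detection experiments on measured side-scan sonar data under various complex conditions show that, with accuracy close to that of classical networks, the method improves speed by a factor of 2.7, reaching 52.6 FPS, and achieves fast and accurate detection of underwater pipelines.
Keywords: underwater target detection; side-scan sonar image; deep learning; semantic segmentation; network optimization; inter-class imbalance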