Abstract
Real-world datasets are commonly imbalanced. To address the imbalanced classification problem, this paper builds a network structure over the dataset to mine the topological features that lie beyond the positional information of the sample points, analyzes the connection characteristics of the network nodes, and assigns different efficiencies to the nodes. The similarity measure between a node to be classified and each sub-network is computed, and a new probability model is then used to derive the integrity measure between that node and each sub-network. On this basis, a classification method for imbalanced data based on network topology features is constructed. An imbalance factor c is introduced into the algorithm to reduce the influence of the difference between the numbers of positive and negative samples. Experimental results show that the algorithm effectively improves classification accuracy. On datasets with pronounced topological features in particular, both classification performance and adaptability improve further compared with traditional classification methods.
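The abstract outlines a pipeline of per-class sub-networks, weighted nodes, a node-to-sub-network similarity turned into class probabilities, and an imbalance factor c. The sketch below is a generic illustration of that kind of pipeline under stated assumptions; it is not the authors' algorithm, whose exact formulas are not given here. Assumptions (all hypothetical): each sub-network is an undirected k-nearest-neighbour graph over one class; a node's "efficiency" is `(1 + degree) / (1 + max_degree)`; similarity sums `efficiency * exp(-distance)` over the k nearest nodes of a class; and c rescales each class score by `(n_max / n_j) ** c` before normalising. The names `NetworkImbalanceClassifier`, `class_scores`, and `node_weights` are invented for this example.

```python
import math
from collections import defaultdict


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_subnetwork(points, k):
    """Undirected kNN graph over one class (assumption); returns adjacency sets."""
    adj = [set() for _ in points]
    for i, p in enumerate(points):
        dists = sorted((euclidean(p, q), j) for j, q in enumerate(points) if j != i)
        for _, j in dists[:k]:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def node_weights(adj):
    """Hypothetical node 'efficiency': (1 + degree) / (1 + max degree)."""
    max_deg = max(len(n) for n in adj)
    return [(1 + len(n)) / (1 + max_deg) for n in adj]


class NetworkImbalanceClassifier:
    """Illustrative sketch only; not the method from the paper."""

    def __init__(self, k=3, c=0.5):
        self.k = k  # neighbours used for the graph and for similarity
        self.c = c  # hypothetical imbalance factor; c = 0 disables correction

    def fit(self, X, y):
        groups = defaultdict(list)
        for x, label in zip(X, y):
            groups[label].append(x)
        self.classes = {}
        for label, pts in groups.items():
            adj = build_subnetwork(pts, min(self.k, len(pts) - 1))
            self.classes[label] = (pts, node_weights(adj))
        self.n_max = max(len(pts) for pts, _ in self.classes.values())
        return self

    def class_scores(self, x):
        """Normalised per-class scores (a stand-in for the 'integrity measure')."""
        scores = {}
        for label, (pts, w) in self.classes.items():
            nearest = sorted((euclidean(x, p), i) for i, p in enumerate(pts))[: self.k]
            sim = sum(w[i] * math.exp(-d) for d, i in nearest)
            # Up-weight smaller classes by (n_max / n_j) ** c.
            scores[label] = sim * (self.n_max / len(pts)) ** self.c
        total = sum(scores.values()) or 1.0
        return {label: s / total for label, s in scores.items()}

    def predict(self, x):
        scores = self.class_scores(x)
        return max(scores, key=scores.get)


X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (5, 5), (5, 6)]
y = [0, 0, 0, 0, 0, 1, 1]
clf = NetworkImbalanceClassifier(k=3, c=0.5).fit(X, y)
print(clf.predict((5.2, 5.1)), clf.class_scores((3, 3)))
```

With `c = 0` the scores depend only on graph weights and distances. Raising `c` shifts probability mass toward the smaller class, which is one simple way to realise "reducing the influence of the difference in sample counts"; the paper's own definition of c may differ.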
Authors
PU Shiye
LIU Sanyang
BAI Yiguang
School of Mathematics and Statistics, Xidian University, Xi’an 710126, China
Source
CAAI Transactions on Intelligent Systems (智能系统学报)
CSCD
Peking University Core Journal
2019, Issue 5, pp. 889-896 (8 pages)
Funding
National Natural Science Foundation of China (61877046)
Natural Science Foundation of Shaanxi Province (2017JM1001)
Keywords
imbalanced data
similarity
network structure
accuracy rate
topology
physical feature