Imbalanced data classification of network topology characteristics

Cited by: 4
Abstract: This paper addresses the imbalanced data classification problem, which is common in real applications. A network structure is built over the dataset to fully mine the topological features hidden beyond the positional information of the sample points; the connection characteristics of the network nodes are analyzed, and the nodes are assigned different efficiencies. The similarity measure between the node to be tested and each sub-network is calculated, and the integrity measure between the node and each sub-network is then derived from a new probability model. On this basis, a classification method for imbalanced data based on network topology features is constructed. An imbalance factor c is introduced into the algorithm to reduce the influence of the difference in the number of positive and negative samples. The experimental results show that the algorithm effectively improves classification accuracy, especially for datasets with significant topological features, and that its classification performance and adaptability improve on those of traditional classification methods.
Authors: PU Shiye (普事业); LIU Sanyang (刘三阳); BAI Yiguang (白艺光) (School of Mathematics and Statistics, Xidian University, Xi'an 710126, China)
Source: CAAI Transactions on Intelligent Systems (《智能系统学报》), indexed in CSCD and the Peking University Core Journals list, 2019, Issue 5, pp. 889-896 (8 pages)
Funding: National Natural Science Foundation of China (61877046); Natural Science Foundation of Shaanxi Province (2017JM1001)
Keywords: imbalanced data; similarity; network structure; accuracy rate; topology; physical feature
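The abstract outlines a pipeline: build a sub-network per class, give each node an efficiency, score a test node against each sub-network, and weight the result by an imbalance factor c. The record does not give the paper's formulas, so the following is only a minimal sketch under stated assumptions, not the authors' method. The k-nearest-neighbour graph, the in-degree-based efficiency, the `exp(-distance)` similarity, and every function name are hypothetical choices made for illustration; only the factor `c` comes from the abstract, and applying it as a multiplier on the minority-class score is also an assumption.

```python
import numpy as np

def knn(X, Q, k, exclude_self=False):
    """Return indices and distances of the k nearest rows of X for each row of Q."""
    d = np.linalg.norm(Q[:, None, :] - X[None, :, :], axis=2)
    if exclude_self:                      # only valid when Q is X itself
        np.fill_diagonal(d, np.inf)
    idx = np.argsort(d, axis=1)[:, :k]
    return idx, np.take_along_axis(d, idx, axis=1)

def node_efficiency(X, k):
    """Assumed efficiency: (in-degree + 1) in the k-NN graph, scaled to (0, 1]."""
    idx, _ = knn(X, X, k, exclude_self=True)
    in_degree = np.bincount(idx.ravel(), minlength=len(X))
    return (in_degree + 1) / (in_degree + 1).max()

def subnetwork_score(X_cls, x, k):
    """Assumed similarity of x to one class sub-network:
    efficiency-weighted mean of exp(-distance) over its k nearest nodes."""
    k = min(k, len(X_cls) - 1)            # a node cannot have more neighbours than peers
    eff = node_efficiency(X_cls, k)
    idx, dist = knn(X_cls, x[None, :], k)
    return float(np.sum(eff[idx[0]] * np.exp(-dist[0])) / k)

def classify(X, y, x, k=3, c=1.0, minority_label=1):
    """Score x against every class sub-network; scale the minority score by c."""
    scores = {}
    for label in np.unique(y):
        s = subnetwork_score(X[y == label], x, k)
        scores[int(label)] = s * c if label == minority_label else s
    return max(scores, key=scores.get), scores

# Toy data: 25 majority points on a grid, 4 minority points in a far corner.
grid = np.array([[0.5 * i, 0.5 * j] for i in range(5) for j in range(5)])
rare = np.array([[5.0, 5.0], [5.5, 5.0], [5.0, 5.5], [5.5, 5.5]])
X = np.vstack([grid, rare])
y = np.array([0] * len(grid) + [1] * len(rare))

pred_rare, _ = classify(X, y, np.array([5.2, 5.2]))
pred_grid, _ = classify(X, y, np.array([1.0, 1.0]))
print(pred_rare, pred_grid)
```

In this sketch a value of `c` above 1 raises the minority-class score relative to the majority class, which is one simple way to offset a difference in class sizes; how the paper actually defines and applies c is not stated in this record.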
