
Incremental Feature Selection Algorithm for Dynamic Partially Labeled Hybrid Data (动态部分标记混合数据的增量式特征选择算法)

Cited by: 2
Abstract Many real-world data sets are hybrid data composed of symbolic features, numerical features, and features with missing values. Obtaining decision labels for every sample of such data takes considerable labor and time, so only part of the data can be labeled, which yields partially labeled data. Meanwhile, data in real-world applications are generated dynamically: features are added to or deleted from the feature set as requirements change. To address the high dimensionality, partial labeling, and dynamic nature of hybrid data, this paper proposes two incremental feature selection algorithms for partially labeled hybrid data. First, information granularity is used to analyze feature significance for partially labeled hybrid data. Second, drawing on incremental learning, updating mechanisms for information granularity are given for the case where the feature set changes dynamically. On this basis, the two incremental feature selection algorithms are developed. Finally, comparisons with other algorithms on UCI data sets further verify the feasibility and effectiveness of the proposed algorithms.
Authors YAN Zhen-chao (闫振超); SHU Wen-hao (舒文豪); XIE Xin (谢昕) (School of Information Engineering, East China Jiaotong University, Nanchang 330013, China)
Source Computer Science (《计算机科学》), CSCD, Peking University Core Journal, 2022, No. 11, pp. 98-108 (11 pages)
Funding National Natural Science Foundation of China (61662023, 61762037); Natural Science Foundation of Jiangxi Province (20202BABL202037).
Keywords Hybrid data; Partially labeled; Incremental learning; Information granularity; Feature selection
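
The abstract names information granularity and its incremental update but does not give formulas, so the following is only an illustrative sketch under stated assumptions, not the authors' formulation. It assumes a common rough-set style definition: two samples are indistinguishable on a symbolic feature if their values are equal, on a numerical feature if their values differ by at most a threshold `delta`, and a missing value (here `None`) is compatible with anything. Granularity is then the average size of these tolerance classes divided by the number of samples. The function names, the `delta` parameter, and the toy data are all hypothetical, and decision labels are ignored entirely.

```python
import numpy as np

def tolerance_matrix(column, is_numeric, delta=0.1):
    """Boolean matrix R with R[i, j] True when samples i and j are
    indistinguishable on one feature (assumed definition, see lead-in)."""
    n = len(column)
    rel = np.ones((n, n), dtype=bool)
    for i in range(n):
        for j in range(n):
            a, b = column[i], column[j]
            if a is None or b is None:
                continue  # a missing value is compatible with any value
            if is_numeric:
                rel[i, j] = abs(a - b) <= delta
            else:
                rel[i, j] = a == b
    return rel

def granularity(rel):
    """Mean tolerance-class size divided by the number of samples."""
    n = rel.shape[0]
    return rel.sum() / (n * n)

def add_feature(rel, column, is_numeric, delta=0.1):
    """Incremental update for an added feature: intersect the cached
    relation with the new feature's relation instead of recomputing
    the relation over the whole feature set."""
    return rel & tolerance_matrix(column, is_numeric, delta)

# Toy hybrid data: one symbolic feature with a missing value, one numerical.
color = ["red", "red", "blue", None]
size = [0.10, 0.15, 0.90, 0.50]

rel_color = tolerance_matrix(color, is_numeric=False)
rel_both = add_feature(rel_color, size, is_numeric=True, delta=0.1)

# The drop in granularity is one possible measure of how much the
# added feature refines the partition of the samples.
significance = granularity(rel_color) - granularity(rel_both)
print(granularity(rel_color), granularity(rel_both), significance)
```

In this sketch, adding a feature only requires one element-wise AND against the cached relation matrix. Deleting a feature has no such shortcut here, since an intersection cannot be undone from the cached matrix alone; the paper's own update mechanisms may handle that case differently.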
