Incremental Feature Selection Algorithm for Dynamic Partially Labeled Hybrid Data

Abstract: Many real-world data sets are hybrid data consisting of symbolic features, numerical features, and features with missing values. Because obtaining decision labels for all hybrid data costs considerable labor and time, only part of the data can be labeled, which produces partially labeled data. Meanwhile, data in real-world applications are generated dynamically; that is, features are added to or deleted from the feature set as requirements change. To address the high dimensionality, partial labeling, and dynamic nature of hybrid data, this paper proposes two incremental feature selection algorithms for partially labeled hybrid data. First, information granularity is used to analyze the significance of features in partially labeled hybrid data. Second, drawing on the idea of incremental learning, incremental updating mechanisms for information granularity are given for the case where the feature set changes dynamically. Third, on this basis, two incremental feature selection algorithms for partially labeled hybrid data are proposed. Finally, experimental comparisons with other algorithms on UCI data sets further verify the feasibility and effectiveness of the proposed algorithms.
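The abstract does not give the paper's definitions, so the following is only a minimal Python sketch of the general idea under stated assumptions. It assumes a hybrid tolerance relation (symbolic features must match exactly, numerical features must differ by at most a threshold `delta`, and a missing value, written `None`, is tolerant with any value) and a common rough-set form of information granularity, GK(B) = (1/|U|²) Σ_x |T_B(x)|, where T_B(x) is the tolerance class of object x under feature subset B. The names `tolerant`, `IncrementalGranularity`, `disagree`, and `significance` are hypothetical. The sketch shows one way to update granularity incrementally when a feature is added to or deleted from B; it does not use decision labels, because the abstract does not say how labeled and unlabeled objects are combined.

```python
from typing import List, Optional, Sequence, Union

# None marks a missing value; symbolic values are str, numerical values are float.
Value = Optional[Union[str, float]]


def tolerant(u: Value, v: Value, numeric: bool, delta: float) -> bool:
    """Assumed hybrid tolerance test for a single feature."""
    if u is None or v is None:  # a missing value is tolerant with any value
        return True
    if numeric:
        return abs(u - v) <= delta  # numerical: within a delta-neighborhood
    return u == v  # symbolic: exact match


class IncrementalGranularity:
    """Keeps disagree[x][y] = number of selected features on which objects
    x and y are NOT tolerant. x and y are tolerant under the selected
    feature set B exactly when disagree[x][y] == 0 (hypothetical design)."""

    def __init__(self, data: Sequence[Sequence[Value]], numeric: Sequence[bool],
                 delta: float = 0.1) -> None:
        self.data = data
        self.numeric = numeric
        self.delta = delta
        n = len(data)
        self.disagree: List[List[int]] = [[0] * n for _ in range(n)]
        self.selected: set = set()

    def _update(self, feature: int, step: int) -> None:
        # One O(|U|^2) pass over a single feature; step is +1 (add) or -1 (delete).
        n = len(self.data)
        for x in range(n):
            for y in range(n):
                if not tolerant(self.data[x][feature], self.data[y][feature],
                                self.numeric[feature], self.delta):
                    self.disagree[x][y] += step

    def add_feature(self, feature: int) -> None:
        if feature not in self.selected:
            self._update(feature, +1)
            self.selected.add(feature)

    def delete_feature(self, feature: int) -> None:
        if feature in self.selected:
            self._update(feature, -1)
            self.selected.discard(feature)

    def granularity(self) -> float:
        # GK(B) = (1/|U|^2) * sum_x |T_B(x)| = share of tolerant ordered pairs.
        n = len(self.data)
        tolerant_pairs = sum(row.count(0) for row in self.disagree)
        return tolerant_pairs / (n * n)


def significance(model: IncrementalGranularity, feature: int) -> float:
    """Assumed significance of an unselected feature: the drop in granularity
    it causes, measured by a temporary incremental add followed by a delete."""
    if feature in model.selected:
        raise ValueError("feature is already selected")
    before = model.granularity()
    model.add_feature(feature)
    after = model.granularity()
    model.delete_feature(feature)
    return before - after


if __name__ == "__main__":
    # Toy hybrid data: symbolic, numerical, symbolic columns; None = missing.
    data = [["a", 0.10, None], ["a", 0.15, "x"], ["b", 0.90, "x"], [None, 0.50, "y"]]
    model = IncrementalGranularity(data, numeric=[False, True, False], delta=0.1)
    print(significance(model, 0))  # 0.25
    model.add_feature(0)
    model.add_feature(1)
    print(model.granularity())  # 0.375
    model.delete_feature(0)
    print(model.granularity())  # 0.375
```

Because `disagree[x][y]` counts how many selected features separate x and y, adding or deleting one feature costs a single O(|U|²) pass over that feature instead of rebuilding the tolerance relation from all |B| features. In the toy data, deleting feature 0 leaves the granularity unchanged, which is the kind of redundancy a granularity-based significance measure is meant to expose.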

Authors: YAN Zhen-chao (闫振超), SHU Wen-hao (舒文豪), XIE Xin (谢昕)

Affiliation: School of Information Engineering, East China Jiaotong University, Nanchang 330013, China

Source: Computer Science (《计算机科学》), 2022, No. 11, pp. 98-108 (11 pages); indexed in CSCD and the Peking University Core Journals list

Funding: National Natural Science Foundation of China (61662023, 61762037); Natural Science Foundation of Jiangxi Province (20202BABL202037)

Keywords: hybrid data; partially labeled; incremental learning; information granularity; feature selection

CLC number: TP391 [Automation and Computer Technology / Computer Application Technology]

