Abstract
Many real-world data sets are hybrid data whose features may be symbolic, numerical, or missing. Labeling every object in such data demands substantial labor and time, so only part of the data can be given decision labels, which yields partially labeled data. Meanwhile, data in real-world applications are generated dynamically: the dimensionality grows or shrinks as features are added or deleted to meet changing requirements. To address the high dimensionality, partial labeling, and dynamic nature of hybrid data, this paper proposes two incremental feature selection algorithms for partially labeled hybrid data. First, information granularity is used to analyze the significance of features in partially labeled hybrid data. Second, drawing on the idea of incremental learning, incremental updating mechanisms for information granularity are given for the case where the feature set changes dynamically. On this basis, two incremental feature selection algorithms for partially labeled hybrid data are proposed. Finally, comparisons with other algorithms on UCI data sets further verify the feasibility and effectiveness of the proposed algorithms.
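The abstract does not give the paper's formulas, so the following is only a minimal sketch of the general idea, not the authors' algorithms. It assumes a common rough-set style definition: two objects are tolerant on a feature if either value is missing, if their symbolic values are equal, or if their numerical values differ by at most a threshold `delta`; the information granularity of a feature subset is the fraction of object pairs that are tolerant on every feature in it. The names `similar`, `GranularityTracker`, and `delta` are hypothetical, and decision labels are left out entirely. The per-pair mismatch counter illustrates one possible way to update the granularity when a feature is added or deleted without recomputing from scratch.

```python
def similar(u, v, delta=0.1):
    """Assumed tolerance rule for one feature: None means a missing value."""
    if u is None or v is None:
        return True                      # a missing value matches anything
    if isinstance(u, str) or isinstance(v, str):
        return u == v                    # symbolic values must be equal
    return abs(u - v) <= delta           # numerical values must be close


class GranularityTracker:
    """Keeps, for every object pair, how many selected features separate it."""

    def __init__(self, rows, delta=0.1):
        self.rows = rows
        self.delta = delta
        n = len(rows)
        self.mismatch = [[0] * n for _ in range(n)]
        self.features = set()

    def _apply(self, feature, step):
        # Visit each unordered pair once and update both symmetric entries.
        n = len(self.rows)
        for i in range(n):
            for j in range(i + 1, n):
                a, b = self.rows[i][feature], self.rows[j][feature]
                if not similar(a, b, self.delta):
                    self.mismatch[i][j] += step
                    self.mismatch[j][i] += step

    def add_feature(self, feature):
        if feature not in self.features:
            self.features.add(feature)
            self._apply(feature, +1)

    def remove_feature(self, feature):
        if feature in self.features:
            self.features.remove(feature)
            self._apply(feature, -1)

    def granularity(self):
        # Fraction of ordered pairs (including i == j) that no feature separates.
        n = len(self.rows)
        tolerant = sum(1 for row in self.mismatch for c in row if c == 0)
        return tolerant / (n * n)

    def significance(self, feature):
        """Drop in granularity caused by adding a not-yet-selected feature."""
        if feature in self.features:
            raise ValueError("feature is already selected")
        before = self.granularity()
        self.add_feature(feature)
        after = self.granularity()
        self.remove_feature(feature)     # restore the previous state
        return before - after


# Toy hybrid data: column 0 symbolic, column 1 numerical, column 2 symbolic,
# with None marking missing values.
rows = [
    ("red", 0.10, None),
    ("red", 0.15, "a"),
    ("blue", 0.90, "a"),
    ("blue", None, "b"),
]
tracker = GranularityTracker(rows, delta=0.1)
tracker.add_feature(0)
scores = {f: tracker.significance(f) for f in (1, 2)}
```

In this sketch a larger score means the candidate feature refines the current partition more, so a greedy selector would pick the candidate with the highest score at each step. Each feature addition or deletion costs one pass over the object pairs, regardless of how many features are already selected.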
Authors
闫振超
舒文豪
谢昕
YAN Zhen-chao; SHU Wen-hao; XIE Xin (School of Information Engineering, East China Jiaotong University, Nanchang 330013, China)
Source
《计算机科学》
CSCD
Peking University Core Journals
2022, Issue 11, pp. 98-108 (11 pages)
Computer Science
Funding
National Natural Science Foundation of China (61662023, 61762037)
Natural Science Foundation of Jiangxi Province (20202BABL202037)
Keywords
Hybrid data
Partially labeled
Incremental learning
Information granularity
Feature selection