Abstract
Feature selection is an effective way to deal with high-dimensional data. Traditional feature selection algorithms usually measure feature importance with classical information theory, but they ignore the mutual influence between labeled and unlabeled data. Moreover, these methods mainly address multi-label feature selection on static data and are difficult to apply directly to dynamic streaming data. In real-world dynamic environments, the number and the arrival order of features are unknown, and researchers are often interested only in the most recently arrived features; a sliding window mechanism handles this situation well. Building on this, we first introduce a fuzzy information entropy with the complementary property and, taking the mutual influence of labeled and unlabeled data into account, propose a weighted fuzzy mutual information measure. Combining this measure with the sliding window mechanism, we then propose two algorithms: Feature Selection with Weighted Fuzzy Mutual Information based on Sliding Window (FS-FMI), which uses a fixed sliding window, and Streaming Feature Selection with Weighted Fuzzy Mutual Information based on Dynamic Sliding Window (SFS-FMI-DSW). Experimental results show that SFS-FMI-DSW is the more effective of the two, and statistical hypothesis tests further confirm the effectiveness of the algorithms.
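The abstract does not give the formulas, so the following is only a minimal sketch of the general idea (a complementary fuzzy entropy, a fuzzy mutual information built from it, and a fixed-size sliding window over arriving features), not the authors' FS-FMI or SFS-FMI-DSW. Several pieces are assumptions: the similarity relation r_ij = 1 - |v_i - v_j| / (max - min), the complementary entropy E(R) = (1/n) Σ_i (1 - |[x_i]_R| / n) as commonly defined in the fuzzy rough set literature, the min t-norm for joint relations, and FMI(R; S) = E(R) + E(S) - E(R ∩ S). The labeled/unlabeled weighting is not specified in the abstract and is omitted, the dynamic window of SFS-FMI-DSW is not modeled, and the selection rule plus all function names (`relevance`, `sliding_window_select`, etc.) are hypothetical.

```python
from collections import deque
from typing import Iterable, List, Sequence, Tuple

Matrix = List[List[float]]


def feature_relation(values: Sequence[float]) -> Matrix:
    """Assumed fuzzy similarity: r_ij = 1 - |v_i - v_j| / (max - min)."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # constant feature: every pair is fully similar
    return [[1.0 - abs(a - b) / span for b in values] for a in values]


def label_relation(labels: Sequence[int]) -> Matrix:
    """Crisp relation for one label: 1 if two samples share the label value."""
    return [[1.0 if a == b else 0.0 for b in labels] for a in labels]


def intersect(r: Matrix, s: Matrix) -> Matrix:
    """Joint relation of two fuzzy relations via the min t-norm."""
    return [[min(x, y) for x, y in zip(ri, si)] for ri, si in zip(r, s)]


def comp_entropy(r: Matrix) -> float:
    """Complementary fuzzy entropy E(R) = (1/n) * sum_i (1 - |[x_i]_R| / n)."""
    n = len(r)
    return sum(1.0 - sum(row) / n for row in r) / n


def fuzzy_mi(r: Matrix, s: Matrix) -> float:
    """FMI(R; S) = E(R) + E(S) - E(R ∩ S); never negative under the min t-norm."""
    return comp_entropy(r) + comp_entropy(s) - comp_entropy(intersect(r, s))


def relevance(feature: Sequence[float], label_matrix: Sequence[Sequence[int]]) -> float:
    """Unweighted mean FMI between one feature and every label column."""
    r = feature_relation(feature)
    n_labels = len(label_matrix[0])
    columns = [[row[k] for row in label_matrix] for k in range(n_labels)]
    return sum(fuzzy_mi(r, label_relation(c)) for c in columns) / n_labels


def sliding_window_select(
    stream: Iterable[Tuple[str, Sequence[float]]],
    label_matrix: Sequence[Sequence[int]],
    window_size: int = 3,
) -> List[str]:
    """Fixed-size sliding window over features arriving one at a time.

    Hypothetical rule: keep an arriving feature if its relevance is at least
    the mean relevance of the features currently in the window.
    """
    window: deque = deque(maxlen=window_size)  # scores of the most recent features
    selected: List[str] = []
    for name, column in stream:
        score = relevance(column, label_matrix)
        window.append(score)  # when full, the oldest score is dropped automatically
        if score >= sum(window) / len(window):
            selected.append(name)
    return selected


labels = [[1, 0], [1, 1], [0, 0], [0, 1]]  # 4 samples, 2 labels
stream = [
    ("f1", [0.9, 0.8, 0.1, 0.2]),  # tracks label 1
    ("f2", [0.5, 0.5, 0.5, 0.5]),  # constant, carries no information
    ("f3", [0.1, 0.9, 0.2, 0.8]),  # tracks label 2
]
print(sliding_window_select(stream, labels))  # ['f1', 'f3']
```

Using `deque(maxlen=window_size)` makes eviction of the oldest score automatic, so each decision depends only on the most recent `window_size` features, which mirrors the abstract's motivation that researchers are often interested only in recently arrived features. The constant feature `f2` gets zero relevance because its relation is all ones, so intersecting it with a label relation leaves that relation unchanged.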
Authors
Cheng Yusheng (程玉胜), Li Yu (李雨), Wang Yibin (王一宾), Chen Fei (陈飞)
School of Computer and Information, Anqing Normal University, Anqing 246011, China; The University Key Laboratory of Intelligent Perception and Computing of Anhui Province, Anqing 246011, China
Published in
Journal of Nanjing University (Natural Science) (《南京大学学报(自然科学版)》)
Indexed in: CAS, CSCD, Peking University Core Journals
2018, No. 5, pp. 974-985 (12 pages)
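Funding
Key Scientific Research Project of Anhui Provincial Universities (KJ2017A352)
Open Project of the Fujian Provincial University Key Laboratory of Data Science and Intelligent Application (D1801)
Fund of the Anhui Provincial University Key Laboratory (ACAIM160102)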
Keywords
feature selection
sliding window
streaming data
multi-label
fuzzy mutual information