
Streaming feature selection with weighted fuzzy mutual information based on dynamic sliding window
Cited by: 7
Abstract  Feature selection is an effective way to address the high dimensionality of data. Traditional feature selection algorithms usually measure feature importance with classical information theory but ignore the mutual influence between labeled and unlabeled data. Moreover, these methods are mainly designed for multi-label feature selection on static data and are difficult to apply directly to dynamic streaming data. In the real world, neither the number nor the arrival order of features is known in advance in a dynamic environment, and researchers are often interested only in the most recently arrived features, so a sliding window mechanism is well suited to this setting. On this basis, a fuzzy information entropy with a complement property is first introduced and, taking the mutual influence between labeled and unlabeled data into account, a weighted fuzzy mutual information measure is proposed. Combined with the sliding window mechanism, two algorithms are then proposed: Feature Selection with Weighted Fuzzy Mutual Information based on Sliding Window (FS-FMI), which uses a fixed window, and Streaming Feature Selection with Weighted Fuzzy Mutual Information based on Dynamic Sliding Window (SFS-FMI-DSW), which uses a dynamic window. Experimental results show that SFS-FMI-DSW is more effective, and statistical hypothesis tests further support the effectiveness of the algorithms.
Authors  Cheng Yusheng (程玉胜); Li Yu (李雨); Wang Yibin (王一宾); Chen Fei (陈飞) (School of Computer and Information, Anqing Normal University, Anqing, 246011, China; The University Key Laboratory of Intelligent Perception and Computing of Anhui Province, Anqing, 246011, China)
Source  Journal of Nanjing University (Natural Science) (《南京大学学报(自然科学版)》); indexed in CAS, CSCD, and the Peking University Core Journals list; 2018, Issue 5, pp. 974-985 (12 pages)
Funding  Key Scientific Research Project of Anhui Provincial Universities (KJ2017A352); Open Project of the Fujian Provincial University Key Laboratory of Data Science and Intelligent Application (D1801); Anhui Provincial University Key Laboratory Fund (ACAIM160102)
Keywords  feature selection; sliding window; streaming data; multi-label; fuzzy mutual information
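
The abstract describes scoring features with a weighted fuzzy mutual information measure inside a sliding window. The Python sketch below illustrates that general idea only; it is not the authors' FS-FMI or SFS-FMI-DSW algorithm, whose formulas the abstract does not give. Everything concrete here is an assumption made for illustration: the similarity relation 1 - |x_i - x_j|, the complement-type entropy E(R) = (1/n) Σ_i (1 - |[x_i]_R|/n), the uniform label weights, the fixed (not dynamic) window, and all function names.

```python
import numpy as np


def fuzzy_relation(column):
    """Fuzzy similarity matrix R[i, j] = 1 - |x_i - x_j| on a min-max scaled column."""
    x = np.asarray(column, dtype=float)
    span = x.max() - x.min()
    x = (x - x.min()) / span if span > 0 else np.zeros_like(x)
    return 1.0 - np.abs(x[:, None] - x[None, :])


def complement_entropy(relation):
    """Complement-type fuzzy entropy: mean over samples of 1 - |[x_i]| / n."""
    n = relation.shape[0]
    return float(np.mean(1.0 - relation.sum(axis=1) / n))


def fuzzy_mutual_information(rel_a, rel_b):
    """I(A; B) = E(A) + E(B) - E(A, B), with the joint relation taken as min."""
    joint = np.minimum(rel_a, rel_b)
    return (complement_entropy(rel_a) + complement_entropy(rel_b)
            - complement_entropy(joint))


def weighted_score(feature, labels, label_weights):
    """Weighted sum of fuzzy mutual information between one feature and each label."""
    rel_f = fuzzy_relation(feature)
    return sum(w * fuzzy_mutual_information(rel_f, fuzzy_relation(labels[:, k]))
               for k, w in enumerate(label_weights))


def select_streaming(feature_stream, labels, window_size=3, keep_per_window=1):
    """Buffer arriving features in a fixed window; keep the best of each window."""
    labels = np.asarray(labels, dtype=float)
    # Placeholder: uniform label weights (the paper's weighting is not reproduced).
    weights = np.full(labels.shape[1], 1.0 / labels.shape[1])
    selected, window = [], []
    for index, feature in enumerate(feature_stream):
        window.append((weighted_score(feature, labels, weights), index))
        if len(window) == window_size:
            window.sort(reverse=True)
            selected.extend(i for _, i in window[:keep_per_window])
            window = []
    # Flush a partially filled final window, if any.
    window.sort(reverse=True)
    selected.extend(i for _, i in window[:keep_per_window])
    return sorted(selected)
```

Features are processed strictly in arrival order and only the current window is held in memory, which is the property that makes a sliding window attractive when the total number of features is unknown. A dynamic variant would adjust `window_size` as the stream evolves; how SFS-FMI-DSW does so is not stated in the abstract and is not modeled here.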