Detecting Multivariable Correlation with Maximal Information Entropy (cited by 4)

Abstract: Existing measures for detecting relationships between variables, such as the Maximal Information Coefficient (MIC), are mostly applied to pairs of variables and rarely to triplets or higher-dimensional variable sets. To address this, the paper proposes Maximal Information Entropy (MIE), a new measure for detecting correlation among multiple variables. For k variables, a maximal information matrix is first constructed from the MIC scores of every pair of variables; the maximal information entropy, which measures the degree of correlation among the k variables, is then computed from that matrix. Simulation results show that MIE can detect one-dimensional manifold dependence among triplets of variables, and experiments on real datasets verify its practical applicability.
Source: Journal of Electronics & Information Technology (电子与信息学报; indexed in EI, CSCD, and the Peking University Core Journals list), 2015, No. 1, pp. 123-129 (7 pages)
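Funding: Supported by the National Natural Science Foundation of China (61175004), the Beijing Natural Science Foundation (4112009), the Key Science and Technology Development Project of the Beijing Municipal Education Commission (KZ01210005007), the Specialized Research Fund for the Doctoral Program of Higher Education (20121103110029), and the 12th Graduate Science and Technology Fund of Beijing University of Technology (ykj-2013-9492).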
Keywords: Data mining; Multivariable correlation; Maximal Information Coefficient (MIC); Maximal Information Entropy (MIE)
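
The two-step procedure in the abstract (a pairwise score matrix, then an entropy computed from it) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the paper's definition: the abstract does not give the MIE formula, so `spectral_entropy_score` uses a hypothetical entropy of the matrix's normalized eigenvalues, and `abs_pearson` is a stand-in for MIC that only captures linear dependence. A real MIC estimator could be passed in through the `score` parameter instead.

```python
import numpy as np


def abs_pearson(x, y):
    """Stand-in pairwise score in [0, 1]. NOT MIC: it only sees linear dependence."""
    return abs(float(np.corrcoef(x, y)[0, 1]))


def pairwise_score_matrix(data, score=abs_pearson):
    """Symmetric k x k matrix of pairwise scores with ones on the diagonal.

    data has shape (n_samples, k); score is any function of two 1-D arrays.
    """
    k = data.shape[1]
    m = np.eye(k)
    for i in range(k):
        for j in range(i + 1, k):
            m[i, j] = m[j, i] = score(data[:, i], data[:, j])
    return m


def spectral_entropy_score(m):
    """Hypothetical entropy-style summary (an assumption, not the paper's formula).

    Returns 1 minus the base-k Shannon entropy of the normalized eigenvalues:
    about 0 for an identity matrix, about 1 for an all-ones matrix.
    """
    k = m.shape[0]
    # A matrix of pairwise scores need not be positive semidefinite,
    # so negative eigenvalues are clipped to zero before normalizing.
    p = np.clip(np.linalg.eigvalsh(m), 0.0, None)
    p = p / p.sum()
    nz = p[p > 0]
    h = -np.sum(nz * np.log(nz)) / np.log(k)
    return 1.0 - float(h)


if __name__ == "__main__":
    x = np.linspace(0.0, 1.0, 50)
    triplet = np.column_stack([x, 2 * x + 1, -3 * x])  # three linearly tied variables
    print(spectral_entropy_score(pairwise_score_matrix(triplet)))
```

Keeping the pairwise score as a parameter separates the matrix construction from the choice of dependence measure, which is the part the abstract attributes to MIC.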