
Multiple Hypothesis Testing and Its Application in Feature Dimension Reduction for Big Data (cited by: 3)
Abstract: Existing feature dimension reduction methods fall roughly into two classes: feature extraction and feature selection. In feature extraction, the original features in the measurement space are mapped into a lower-dimensional space by some specified data transformation. Although the extracted features are related to the original ones, they no longer carry the physical meaning of the original variables; that is, feature extraction changes the representation of the original data. Feature selection, by contrast, chooses a subset of the original feature set, aiming to preserve the main information carried by the complete data so as to facilitate later analysis of high-dimensional problems. Features that are weakly relevant to the analysis task, or redundant, are discarded, which may cause information loss. Almost none of the existing dimension reduction methods are therefore fidelity-preserving: the reduced data suit only specific subsequent analysis tasks, so the reduction amounts to preprocessing for one particular task. This paper uses multiple hypothesis testing to study fidelity-preserving dimension reduction for high-dimensional data and identifies the key research issues. The processing results retain all the useful information in the original data while eliminating the irrelevant features.
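Authors: Pan Shu (潘舒), Qi Yunsong (祁云嵩)
Source: Computer Science (《计算机科学》), CSCD, PKU Core Journal; 2015, Issue S1, pp. 89-93 (5 pages)
Funding: National Natural Science Foundation of China (61471182); Natural Science Foundation of the Jiangsu Higher Education Institutions (13KlB520003)
Keywords: feature selection; dimension reduction; multiple hypothesis testing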
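
The abstract does not spell out the authors' procedure. As a generic illustration of how multiple hypothesis testing can drive feature screening, the sketch below applies the Benjamini-Hochberg step-up procedure (Benjamini and Hochberg, 1995) to one p-value per feature. It assumes each feature has already been given a p-value by some per-feature test of the null hypothesis "this feature is irrelevant"; the function name, the `alpha` default, and the sample p-values are hypothetical and not taken from the paper.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float], alpha: float = 0.05) -> List[int]:
    """Return the indices of features kept (null hypotheses rejected) by the
    Benjamini-Hochberg step-up procedure at false discovery rate `alpha`.

    Illustrative helper only: it assumes one p-value per feature.
    """
    m = len(p_values)
    if m == 0:
        return []

    # Feature indices ordered by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k with p_(k) <= k * alpha / m.
    k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            k = rank

    # Step-up rule: reject the k smallest p-values, even if some of the
    # smaller ones missed their own threshold.
    return sorted(order[:k])


if __name__ == "__main__":
    # Hypothetical p-values for five features.
    p = [0.001, 0.8, 0.012, 0.04, 0.6]
    print(benjamini_hochberg(p, alpha=0.05))  # prints [0, 2]
```

Features whose indices are returned are treated as relevant and retained; the rest are screened out. Controlling the false discovery rate rather than the family-wise error rate is a design choice that tends to keep more features when the number of tests is large.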

References (10)

  • 1. Efron B. Microarrays, Empirical Bayes and the Two-Groups Model. Statistical Science, 2008.
  • 2. Guedj M, Robin S, Celisse A, Nuel G. Kerfdr: a semi-parametric kernel-based approach to local false discovery rate estimation. BMC Bioinformatics, 2009.
  • 3. Huber PJ. Projection pursuit. The Annals of Statistics, 1985.
  • 4. Meng XL. Posterior Predictive p-Values. The Annals of Statistics, 1994.
  • 5. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B (Statistical Methodology), 1995.
  • 6. Qin W, Liu Y, Jiang T, Yu C. The development of visual areas depends differently on visual experience. PLoS ONE, 2013.
  • 7. Benjamini Y, Hochberg Y. On the Adaptive Control of the False Discovery Rate in Multiple Testing With Independent Statistics. Journal of Educational and Behavioral Statistics, 2000.
  • 8. Benjamini Y, Yekutieli D. The control of the false discovery rate in multiple testing under dependency. The Annals of Statistics, 2001.
  • 9. Storey JD. The positive false discovery rate: a Bayesian interpretation and the q-value. The Annals of Statistics, 2003.
  • 10. Tusher VG, Tibshirani R, Chu G. Significance analysis of microarrays applied to the ionizing radiation response. Proceedings of the National Academy of Sciences of the United States of America, 2001.
