Abstract
Existing feature dimensionality reduction methods can roughly be categorized into two classes: feature extraction and feature selection. In feature extraction, the original features in the measurement space are mapped into a lower-dimensional space via some specified transformation. Although the variables obtained in the new space are related to the original variables, their physical interpretation in terms of the original variables may be lost; in other words, feature extraction changes the representation of the original data. Unlike feature extraction, feature selection seeks an optimal or suboptimal subset of the original features, discarding those that are insignificant for, or redundant with respect to, the data analysis task, which may in turn cause a loss of information. Consequently, almost none of the existing dimensionality reduction methods is a high-fidelity method: the reduced data are suitable only for specific subsequent analysis tasks, so the reduction amounts to preprocessing for one particular task. This paper analyzes high-fidelity dimensionality reduction of high-dimensional data from the perspective of multiple hypothesis testing and identifies the key issues in studying it, the goal being a reduction that preserves all the useful information while eliminating the irrelevant features from the original data.
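As one concrete, purely illustrative instance of multiple-testing-based feature selection (the paper itself does not specify this procedure), the minimal sketch below tests each original feature with a two-sample t-test and keeps only the features that survive Benjamini-Hochberg false discovery rate control; the function name select_features_bh, the choice of test, and the toy data are assumptions made for illustration. Because every retained feature is one of the original variables, its physical meaning is preserved, which is the property the abstract refers to as high-fidelity reduction.

```python
import numpy as np
from scipy import stats

def select_features_bh(X, y, alpha=0.05):
    """Keep the original features whose per-feature two-sample t-test
    survives Benjamini-Hochberg FDR control at level alpha.
    X: (n_samples, n_features) array; y: binary labels in {0, 1}.
    Returns the sorted indices of the retained features.
    (Illustrative sketch only, not the procedure proposed in the paper.)"""
    X0, X1 = X[y == 0], X[y == 1]
    # One hypothesis per original feature; no transformation is applied,
    # so every retained feature keeps its original physical meaning.
    pvals = np.array([
        stats.ttest_ind(X0[:, j], X1[:, j], equal_var=False).pvalue
        for j in range(X.shape[1])
    ])
    m = pvals.size
    order = np.argsort(pvals)
    ranked = pvals[order]
    # BH step-up rule: find the largest k with p_(k) <= (k / m) * alpha
    # and reject the k hypotheses with the smallest p-values.
    below = ranked <= (np.arange(1, m + 1) / m) * alpha
    k = int(np.nonzero(below)[0].max()) + 1 if below.any() else 0
    return np.sort(order[:k])

# Toy usage: 200 samples, 50 features, only the first 5 carry signal.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)
X = rng.normal(size=(200, 50))
X[:, :5] += y[:, None] * 1.5
print(select_features_bh(X, y, alpha=0.05))
```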
Source
《计算机科学》
CSCD
Peking University Core Journal (北大核心)
2015, Issue S1, pp. 89-93 (5 pages)
Computer Science
Funding
National Natural Science Foundation of China (61471182)
Supported by the Natural Science Foundation of the Higher Education Institutions of Jiangsu Province (13KlB520003)
Keywords
Feature selection, Dimension reduction, Multiple hypothesis testing