Application of Multiple Test Techniques in Big Data Analysis


Abstract: When hypothesis tests are run on big data, multiple testing procedures are needed to keep false positives under control. Many such procedures exist. Through a practical analysis of big data, this paper compares the strengths and weaknesses of the various methods and identifies the settings each one suits, giving data analysts theoretical guidance. The paper first explains why multiple testing is necessary and introduces its basic concepts. It then presents two classes of methods: those that control the family-wise error rate (FWER) and those that control the false discovery rate (FDR). Finally, it applies these methods to a large gene dataset to determine whether each gene is expressed. The results show that the FDR-controlling methods outperform the FWER-controlling methods, and that among the FDR-controlling methods the q-value method performs best. The reason is that the q-value method takes into account prior information about the null hypotheses, which lets it control the FDR closely and gives it higher accuracy and statistical power.
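The abstract contrasts FWER-controlling and FDR-controlling procedures without defining them. Writing V for the number of false rejections among R total rejections, FWER = P(V ≥ 1) and FDR = E[V / max(R, 1)]. The following is a minimal Python sketch, not the authors' code. It assumes the FWER side is represented by the Bonferroni and Holm procedures, the FDR side by the Benjamini-Hochberg step-up procedure, and the q-value by Storey's estimator of the proportion of true nulls π0 with a single fixed tuning value λ = 0.5. The function names and the 15 demo p-values are illustrative assumptions, not the paper's gene data.

```python
from typing import List, Sequence


def bonferroni(pvals: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """FWER control: reject H_i when p_i <= alpha / m."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]


def holm(pvals: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Holm's step-down procedure, which also controls the FWER."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p
    reject = [False] * m
    for k, i in enumerate(order):
        # the (k+1)-th smallest p-value is compared with alpha / (m - k)
        if pvals[i] <= alpha / (m - k):
            reject[i] = True
        else:
            break  # stop at the first non-rejection
    return reject


def benjamini_hochberg(pvals: Sequence[float], q: float = 0.05) -> List[bool]:
    """Benjamini-Hochberg step-up procedure controlling the FDR at level q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    # find the largest rank k (1-based) with p_(k) <= k * q / m; 0 if none
    k_max = 0
    for k, i in enumerate(order, start=1):
        if pvals[i] <= k * q / m:
            k_max = k
    reject = [False] * m
    for i in order[:k_max]:  # reject the k_max smallest p-values
        reject[i] = True
    return reject


def storey_pi0(pvals: Sequence[float], lam: float = 0.5) -> float:
    """Estimate the proportion of true nulls from the p-values above lam.

    Uses one fixed lam (an assumption); it returns 0 if no p-value exceeds
    lam, which a real analysis would guard against or smooth over many lams.
    """
    m = len(pvals)
    n_above = sum(1 for p in pvals if p > lam)
    return min(1.0, n_above / (m * (1.0 - lam)))


def q_values(pvals: Sequence[float], lam: float = 0.5) -> List[float]:
    """q_(i) = min over j >= i of pi0 * m * p_(j) / j, in the original order."""
    m = len(pvals)
    pi0 = storey_pi0(pvals, lam)
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running_min = 1.0  # starting at 1.0 also caps every q-value at 1
    # walk from the largest p-value down so each q is a running minimum
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pi0 * m * pvals[i] / rank)
        q[i] = running_min
    return q


if __name__ == "__main__":
    # illustrative p-values, not the paper's gene data
    p = [0.0001, 0.0004, 0.0019, 0.0095, 0.0201, 0.0278, 0.0298, 0.0344,
         0.0459, 0.3240, 0.4262, 0.5719, 0.6528, 0.7590, 1.0000]
    print(sum(bonferroni(p)), sum(holm(p)), sum(benjamini_hochberg(p)),
          sum(qv <= 0.05 for qv in q_values(p)))
    # prints: 3 3 4 9
```

On this toy input the FDR-based procedures reject more hypotheses than the FWER-based ones, and the q-value rule (q ≤ 0.05) rejects the most, because the estimate π0 ≈ 0.53 shrinks every q-value relative to the Benjamini-Hochberg threshold. This matches the direction of the comparison in the abstract but says nothing about the paper's own gene data.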

Authors: Du Huan, Liu Ruiyin, Zhou Zhihui

Affiliation: School of Mathematics and Systems Science, Shenyang Normal University

Source: Advances in Applied Mathematics, 2021, No. 10, pp. 3532-3538 (7 pages)

Keywords: big data; multiple hypothesis testing; family-wise error rate; false discovery rate; q-value

CLC number: G63 [Culture and Science / Education]



