Abstract
Statistical analysis of gene-gene and gene-environment interactions in candidate gene association studies helps to reveal the mechanisms underlying disease. This article reviews statistical methods for testing interactions, and recent progress in them, for candidate gene association studies with a case-control design. Both parametric and non-parametric methods can be used to detect interactions. Logistic regression is the most frequently used parametric method, while the non-parametric methods are mainly data mining techniques. Four classes of data mining techniques can be applied in candidate gene association studies: dimension reduction, tree-based approaches, pattern recognition, and Bayesian methods. We concentrate on four of the most commonly used and reliable data mining methods, namely multifactor dimensionality reduction (MDR), classification and regression tree (CART), random forest, and Bayesian epistasis association mapping (BEAM), and compare their principles, analysis procedures, advantages, and disadvantages. Parametric and non-parametric methods each have strengths and weaknesses. Low-dimensional data can be analyzed with either parametric or non-parametric methods, whereas high-dimensional data are mainly analyzed with non-parametric methods. As genotyping technologies develop and the number of SNPs that can be assayed grows, non-parametric methods are being applied more and more widely.
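As a rough illustration of the dimension-reduction idea behind MDR, the sketch below pools the genotype cells of each SNP pair into high-risk and low-risk groups and scores the resulting rule by balanced accuracy. It is a simplified sketch under stated assumptions, not the procedure as described in the article: the function names `mdr_score` and `best_pair`, the toy data, and the binary genotype coding are all hypothetical, and the cross-validation step that MDR normally uses to select a model is omitted.

```python
from collections import Counter
from itertools import combinations


def mdr_score(genotypes, status, snps):
    """Balanced accuracy of the high/low-risk rule built from the given SNPs."""
    n_case = sum(status)
    n_ctrl = len(status) - n_case
    threshold = n_case / n_ctrl  # overall case:control ratio

    # Count cases and controls in every multilocus genotype cell.
    cases, ctrls = Counter(), Counter()
    for row, y in zip(genotypes, status):
        cell = tuple(row[j] for j in snps)
        (cases if y == 1 else ctrls)[cell] += 1

    # A cell is high-risk when its case:control ratio exceeds the threshold.
    high = {c for c in set(cases) | set(ctrls)
            if cases[c] > threshold * ctrls[c]}

    sensitivity = sum(cases[c] for c in high) / n_case
    specificity = sum(n for c, n in ctrls.items() if c not in high) / n_ctrl
    return (sensitivity + specificity) / 2


def best_pair(genotypes, status):
    """Exhaustively score all SNP pairs and return the best-scoring one."""
    n_snps = len(genotypes[0])
    return max(combinations(range(n_snps), 2),
               key=lambda pair: mdr_score(genotypes, status, pair))


# Toy data (hypothetical): three binary-coded loci; disease status depends
# only on the combination of loci 0 and 1, with no marginal effect of either.
GENOTYPES = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
STATUS = [a ^ b for a, b, _ in GENOTYPES]

print(best_pair(GENOTYPES, STATUS))
```

In this toy data neither locus 0 nor locus 1 is associated with status on its own, so a main-effects-only analysis would miss them, while the pairwise pooling picks out their joint pattern. Real SNP data would have three genotypes per locus and would need cross-validation or permutation testing to guard against overfitting.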
Source
Fudan University Journal of Medical Sciences (《复旦学报(医学版)》)
Indexed in: CAS, CSCD, Peking University Core Journals (北大核心)
2011, Issue 3, pp. 265-270 (6 pages)
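Funding
National Natural Science Foundation of China (30271113)
National Basic Research Program of China (973 Program), Ministry of Science and Technology (2002CB512902)
Shanghai Key Discipline Construction Program in Occupational Health (08GWZX0402)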