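Abstract
Using computer-simulated data and real microarray data, eight methods for screening differentially expressed genes were compared in order to evaluate how well each method screens microarray data. Analysis of the simulated data showed that all eight methods identified and detected uniformly distributed differentially expressed genes well. In terms of algorithm, SAM and the Wilcoxon rank sum test performed better; in terms of data distribution, identification was better for the normal distribution and worse for the chi-square and exponential distributions. Analysis of the poplar (Populus) cDNA microarray showed that SAM, Samroc, and the regression model method gave similar results, whereas the Wilcoxon rank sum test differed considerably from them.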
DNA microarrays are a new biotechnology tool that allows the expression of thousands of genes in cells to be monitored simultaneously. The goal of differential gene expression analysis is to detect genes whose expression levels change significantly in response to experimental conditions. Although various statistical methods have been proposed for identifying differential gene expression, only a few studies have compared their performance. This paper compares statistical methods for finding differentially expressed genes (DEGs) in microarray data. Using simulated and real datasets (Populus cDNA microarray data), we compared eight methods of identifying differential gene expression. The simulated datasets covered four different distributions (normal, uniform, χ², and exponential). Analysis of the simulated datasets showed that the eight methods performed better on uniformly distributed microarray data than on normally distributed data, and performed poorly on χ²-distributed and exponentially distributed data. Of the eight methods, SAM (Significance Analysis of Microarrays) and the Wilcoxon rank sum test performed well in most cases. On the real Populus cDNA microarray data, SAM, Samroc, and the regression modeling approach gave very similar results, whereas the Wilcoxon rank sum test differed from them. Among the eight methods, Samroc and the regression modeling approach were similar to each other. For both simulated and real datasets, SAM, Samroc, and the regression modeling approach performed better than the other methods.
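As a rough illustration of the kind of simulation the abstract describes, the sketch below applies one of the named methods, the Wilcoxon rank sum test, to synthetic two-group data drawn from the four distribution families. It is not the authors' procedure: the sample size per group, the additive shift used to model differential expression, the χ² degrees of freedom, the significance threshold, and the names `rank_sum_pvalue`, `SAMPLERS`, and `detection_rate` are all assumptions made for this example. The p-value uses the normal approximation without a tie correction.

```python
import math
import random


def rank_sum_pvalue(x, y):
    """Two-sided Wilcoxon rank sum p-value (normal approximation, no tie correction)."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n1, n2 = len(x), len(y)
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        # Find the run of tied values starting at position i.
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2.0  # average of the 1-based ranks i+1 .. j
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    mean = n1 * (n1 + n2 + 1) / 2.0
    var = n1 * n2 * (n1 + n2 + 1) / 12.0
    z = (rank_sum_x - mean) / math.sqrt(var)
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical samplers for the four distribution families named in the abstract.
SAMPLERS = {
    "normal": lambda rng: rng.gauss(0.0, 1.0),
    "uniform": lambda rng: rng.uniform(0.0, 1.0),
    "chi-square": lambda rng: rng.gammavariate(1.5, 2.0),  # chi-square, 3 d.f. (assumed)
    "exponential": lambda rng: rng.expovariate(1.0),
}


def detection_rate(name, shift=1.0, n_per_group=10, n_genes=200, alpha=0.05, seed=0):
    """Fraction of simulated shifted genes flagged at level alpha (all parameters assumed)."""
    rng = random.Random(seed)
    draw = SAMPLERS[name]
    hits = 0
    for _ in range(n_genes):
        control = [draw(rng) for _ in range(n_per_group)]
        treated = [draw(rng) + shift for _ in range(n_per_group)]
        if rank_sum_pvalue(control, treated) < alpha:
            hits += 1
    return hits / n_genes


if __name__ == "__main__":
    for name in SAMPLERS:
        print(name, detection_rate(name))
```

The detection rates this prints depend entirely on the assumed shift and sample size, so they should not be read as reproducing the paper's ranking of distributions or methods.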
Source
《遗传》
CAS
CSCD
PKU Core Journals (北大核心)
2008, Issue 12, pp. 1640-1646 (7 pages)
Hereditas(Beijing)
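Funding
Supported by the Natural Science Foundation of Jiangsu Province, project "Important model tree species (functional genomics of poplar and Chinese fir)" (No. BK2003213).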
Keywords
microarray (gene chip)
Populus (poplar)
differential expression