Studies on high-throughput global gene expression using microarray technology have generated ever larger amounts of systematic transcriptome data. A major challenge in exploiting these heterogeneous datasets is how to normalize the expression profiles by inter-assay methods. Different non-linear and linear normalization methods have been developed, which essentially rely on the hypothesis that the true or perceived logarithmic fold-change distributions between two different assays are symmetric in nature. However, asymmetric gene expression changes are frequently observed, leading to suboptimal normalization results and, as a consequence, potentially to thousands of false calls. Therefore, we have specifically investigated asymmetric comparative transcriptome profiles and developed the normalization using weighted negative second order exponential error functions (NeONORM) for robust and global inter-assay normalization. NeONORM efficiently damps true gene regulatory events in order to minimize their misleading impact on the normalization process. We evaluated NeONORM's applicability using artificial and true experimental datasets, both of which demonstrated that NeONORM could be systematically applied to inter-assay and inter-condition comparisons.
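
The damping idea can be illustrated with a small sketch. It assumes, as a simplification that is not stated in the abstract, that normalization reduces to a single additive shift k on log2 ratios d_i, chosen to minimize E(k) = -Σ exp(-(d_i - k)² / (2σ²)). The names `neg_exp_error` and `neg_exp_shift`, the width `sigma`, the grid search, and the toy data are all hypothetical; this is not the published NeONORM procedure, only an illustration of why a negative exponential error function is insensitive to strongly regulated genes.

```python
import math
import random

def neg_exp_error(log_ratios, k, sigma=0.5):
    """Negative second-order exponential error for a candidate shift k.

    Each gene contributes -exp(-(d - k)^2 / (2 sigma^2)), so genes far
    from k (e.g. truly regulated ones) contribute almost nothing.
    """
    return -sum(math.exp(-((d - k) ** 2) / (2 * sigma ** 2)) for d in log_ratios)

def neg_exp_shift(log_ratios, sigma=0.5, steps=2001):
    """Grid-search the shift k that minimizes the error (assumed approach)."""
    lo, hi = min(log_ratios), max(log_ratios)
    if lo == hi:
        return lo
    grid = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return min(grid, key=lambda k: neg_exp_error(log_ratios, k, sigma))

# Hypothetical asymmetric comparison: 900 unchanged genes that share a
# technical offset of 0.3, plus 100 strongly up-regulated genes.
rng = random.Random(0)
unchanged = [rng.gauss(0.3, 0.2) for _ in range(900)]
induced = [rng.gauss(3.3, 0.5) for _ in range(100)]
log_ratios = unchanged + induced

shift = neg_exp_shift(log_ratios)                 # stays near the technical offset
mean_shift = sum(log_ratios) / len(log_ratios)    # pulled upward by induced genes
normalized = [d - shift for d in log_ratios]
```

In this toy setting a mean-based shift absorbs part of the genuine up-regulation, whereas the exponentially weighted estimate remains close to the offset shared by the unchanged genes.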
Novel microarray technologies such as the AB1700 platform from Applied Biosystems promise significant increases in the signal dynamic range and a higher sensitivity for weakly expressed transcripts. We have compared a representative set of AB1700 data with a similarly representative Affymetrix HG-U133A dataset. The AB1700 design extends the signal dynamic detection range at the lower bound by one order of magnitude. The lognormal signal distribution profiles of these high-sensitivity data need to be represented by two independent distributions. The additional second distribution covers those transcripts that would have gone undetected using the Affymetrix technology. The signal-dependent variance distribution in the AB1700 data is a non-trivial function of signal intensity, describable using a composite function. The drastically different structure of these high-sensitivity transcriptome profiles requires adaptation or even redevelopment of the standard microarray analysis methods. Based on the statistical properties, we have derived a signal variance distribution model for AB1700 data that is necessary for such development. Interestingly, the dual lognormal distribution observed in the AB1700 data reflects two fundamentally different biologic mechanisms of transcription initiation.
In view of potential application to biomedical diagnosis, tight transcriptome data quality control is compulsory. Usually, quality control is achieved using labeling and hybridization controls added at different stages throughout the processing of the biologic RNA samples. These control measures, however, only reflect the performance of the individual technical manipulations during the entire process and have no bearing on the continued integrity of the RNA sample itself. Here we demonstrate that intrinsic statistical properties of the resulting transcriptome data signal and signal-variance distributions and their invariance can be identified independently of the animal species studied and the labeling protocol used. From these invariant properties we have developed a data model, the parameters of which can be estimated from individual experiments and used to compute relative quality measures based on similarity with large reference datasets. These quality measures add supplementary, non-redundant information to standard quality control estimates based on spike-in and hybridization controls, and are exploitable in data analysis. A software application for analyzing datasets as well as a reference dataset for AB1700 arrays are provided. They should allow AB1700 users to easily integrate this method into their analysis pipeline, and might instigate similar developments for other transcriptome platforms.
We have previously developed a combined signal/variance distribution model that accounts for the particular statistical properties of datasets generated on the Applied Biosystems AB1700 transcriptome system. Here we show that this model can be efficiently used to generate synthetic datasets with statistical properties virtually identical to those of the actual data with the aid of the Java application ace.map creator 1.0 that we have developed. The fundamentally different structure of AB1700 transcriptome profiles requires re-evaluation, adaptation, or even redevelopment of many of the standard microarray analysis methods in order to avoid misinterpretation of the data on the one hand, and to draw full benefit from their increased specificity and sensitivity on the other hand. Our composite data model and the ace.map creator 1.0 application thereby not only present proof of the correctness of our parameter estimation, but also provide a tool for the generation of synthetic test data that will be useful for further development and testing of analysis methods.
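
As a rough illustration of model-based synthetic data generation, the following sketch draws signals from a two-component lognormal mixture and adds signal-dependent noise. It is not the ace.map creator 1.0 application or its model: the function names `synthetic_profile` and `noisy_replicate`, the mixture weight, the (mu, sigma) pairs, and the noise function are all hypothetical placeholders, since the abstract gives no parameter values.

```python
import math
import random

def synthetic_profile(n_probes, weight_low=0.4, low=(0.0, 1.0),
                      high=(3.0, 1.2), seed=0):
    """Draw signals from a two-component lognormal mixture.

    `low` and `high` are (mu, sigma) of the natural log of the signal.
    All default values are hypothetical, not estimates from AB1700 data.
    """
    rng = random.Random(seed)
    signals = []
    for _ in range(n_probes):
        mu, sigma = low if rng.random() < weight_low else high
        signals.append(rng.lognormvariate(mu, sigma))
    return signals

def noisy_replicate(signals, seed=1):
    """Add multiplicative noise whose spread shrinks with signal intensity.

    The noise function below is an assumed stand-in for a composite
    signal-dependent variance model.
    """
    rng = random.Random(seed)
    replicate = []
    for s in signals:
        spread = 0.05 + 0.5 / (1.0 + s)   # larger relative noise at low signal
        replicate.append(s * math.exp(rng.gauss(0.0, spread)))
    return replicate

profile = synthetic_profile(5000)
replicate = noisy_replicate(profile)
mean_log = sum(math.log(s) for s in profile) / len(profile)
```

Synthetic profiles of this kind, once fitted parameters replace the placeholders, could serve as test input for analysis methods whose behavior on dual-distribution data is unknown.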