Over the past two decades, gene expression data have been very useful for detecting differentially expressed genes when two or more biological conditions are compared. Studies seeking differentially expressed genes test each gene individually for a difference in mean expression between two conditions. The global shift in gene expression across all genes present in a microarray experiment, however, has not yet been investigated, and it could provide different information about the genes affected by the condition under study. Such a global approach would help identify a gene expression threshold characteristic of a given condition, which could be used for diagnosis together with the list of differentially expressed genes detected by classical methods. Moreover, characterizing the genes below or above such a threshold could give new insights into the molecular mechanisms functionally implicated in each condition. Here, we present a simple methodology based on heuristics, gene filtering, variable transformation and descriptive statistics to identify such global gene expression shifts and the characteristic threshold, so that it can be applied by any professional who works with gene expression data and not only by statisticians. The procedure is illustrated on a real gene expression data set comparing pathogen-inoculated tomatoes with non-inoculated tomatoes. The methodology can be used to identify threshold values for continuous-variable data sets from two populations whose distributions (histograms) overlap across most of their percentiles.