Abstract
Genome-wide transcriptome profiling identifies genes that are prone to differential expression (DE) across contexts, as well as genes with changes specific to the experimental manipulation. Distinguishing genes that are specifically changed in a context of interest from common differentially expressed genes (DEGs) allows more efficient prediction of which genes are specific to a given biological process under scrutiny. Currently, common DEGs or pathways can only be identified through the laborious manual curation of experiments, an inordinately time-consuming endeavor. Here we pioneer an approach, Specific cOntext Pattern Highlighting In Expression data (SOPHIE), for distinguishing between common and specific transcriptional patterns using a generative neural network to create a background set of experiments from which a null distribution of gene and pathway changes can be generated. We apply SOPHIE to diverse datasets including those from human, human cancer, and the bacterial pathogen Pseudomonas aeruginosa. SOPHIE identifies common DEGs in concordance with previously described, manually and systematically determined common DEGs. Further molecular validation indicates that SOPHIE detects highly specific but low-magnitude biologically relevant transcriptional changes. SOPHIE’s measure of specificity can complement log2 fold change values generated from traditional DE analyses. For example, by filtering the set of DEGs, one can identify genes that are specifically relevant to the experimental condition of interest. Consequently, these results can inform future research directions. All scripts used in these analyses are available at https://github.com/greenelab/generic-expression-patterns. Users can access https://github.com/greenelab/sophie to run SOPHIE on their own data.
Funding
This work was supported by grants from the Gordon and Betty Moore Foundation, USA (Grant No. GBMF4552 to CSG);
the National Institutes of Health, USA (Grant Nos. R01 HG010067 to CSG, R01 CA237170 to CSG, U01 CA231978 to JCC);
and the Cystic Fibrosis Foundation, USA (Grant No. HOGAN19G0 to DAH).
Support for the project was also provided by DartCF at the Geisel School of Medicine at Dartmouth to DAH, which is supported by NIH NIDDK (Grant No. P30 DK117469)
and the Cystic Fibrosis Foundation’s Research Development Program, USA (Grant No. STANTO19R0),
and by bioMT through NIH NIGMS (Grant No. P20 GM113132).