Funding: Supported by grants from the National Key R&D Program of China (2023ZD04073), the Major Project of Hubei Hongshan Laboratory (2022hszd016), the Key Research and Development Program of Hubei Province (2022BFE003), and the National Natural Science Foundation of China (32070284) to G. Xu.
Abstract: A 5'-leader, known initially as the 5'-untranslated region, contains multiple isoforms due to alternative splicing (aS) and alternative transcription start sites (aTSS). Therefore, a representative 5'-leader is needed to examine the embedded RNA regulatory elements that control translation efficiency. Here, we develop a ranking algorithm and a deep-learning model to annotate representative 5'-leaders for five plant species. We rank the intra-sample and inter-sample frequency of aS-mediated transcript isoforms using a Kruskal-Wallis test-based algorithm and identify the representative aS-5'-leader. To further assign a representative 5'-end, we train the deep-learning model 5'leaderP to learn aTSS-mediated 5'-end distribution patterns from cap-analysis gene expression data. The model accurately predicts the 5'-end, as confirmed experimentally in Arabidopsis and rice. The gene models containing representative 5'-leaders and 5'leaderP can be accessed at RNAirport (http://www.rnairport.com/leader5P/). The Stage 1 annotation of 5'-leaders records 5'-leader diversity and will pave the way to Ribo-Seq open-reading frame annotation, identical to the project recently initiated by human GENCODE.
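The abstract does not spell out the ranking algorithm, so the following is only a minimal sketch of how a Kruskal-Wallis statistic could be used to rank isoforms by their usage across samples. The function names, the input layout (one list of per-sample usage fractions per isoform), and the choice to order isoforms by mean rank are all assumptions made for illustration, not the authors' implementation.

```python
from itertools import chain


def average_ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_rank(isoform_usage):
    """Hypothetical helper: isoform_usage maps isoform name -> per-sample usage.

    Returns (H, names ordered from highest to lowest mean rank).
    """
    names = list(isoform_usage)
    pooled = list(chain.from_iterable(isoform_usage[n] for n in names))
    ranks = average_ranks(pooled)
    n_total = len(pooled)

    h = 0.0
    mean_rank = {}
    start = 0
    for name in names:
        size = len(isoform_usage[name])
        rank_sum = sum(ranks[start:start + size])
        mean_rank[name] = rank_sum / size
        h += rank_sum ** 2 / size
        start += size
    h = 12.0 / (n_total * (n_total + 1)) * h - 3 * (n_total + 1)

    # Standard tie correction: divide H by 1 - sum(t^3 - t) / (N^3 - N).
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie = 1 - sum(c ** 3 - c for c in counts.values()) / (n_total ** 3 - n_total)
    if tie > 0:
        h /= tie

    ordered = sorted(names, key=lambda n: mean_rank[n], reverse=True)
    return h, ordered


# Illustrative, made-up usage fractions for three isoforms in three samples.
usage = {
    "iso_a": [0.7, 0.8, 0.9],
    "iso_b": [0.4, 0.5, 0.6],
    "iso_c": [0.1, 0.2, 0.3],
}
h_stat, ranking = kruskal_wallis_rank(usage)
print(h_stat, ranking)
```

Under this sketch, the first entry of `ranking` would be the candidate representative isoform, and a large H suggests that usage really differs between isoforms rather than by chance.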
Funding: Supported by the National Institutes of Health, USA (Grant No. R01CA213466), awarded to YL, and the Precision Health Initiative at Indiana University.
Abstract: Alternative splicing of pre-mRNA transcripts is an important regulatory mechanism that increases the diversity of gene products in eukaryotes. Various studies have linked specific transcript isoforms to altered drug response in cancer; however, few algorithms have incorporated splicing information into drug response prediction. In this study, we evaluated whether basal-level splicing information could be used to predict drug sensitivity by constructing doxorubicin-sensitivity classification models with splicing and expression data. We detailed splicing differences between sensitive and resistant cell lines by implementing quasi-binomial generalized linear modeling (QBGLM) and found altered inclusion of 277 skipped exons. We additionally conducted RNA-binding protein (RBP) binding motif enrichment and differential expression analysis to characterize cis- and trans-acting elements that potentially influence doxorubicin response-mediating splicing alterations. Our results showed that a classification model built with skipped exon data exhibited strong predictive power. We discovered an association between differentially spliced events and epithelial-mesenchymal transition (EMT) and observed motif enrichment, as well as differential expression of RBFOX and ELAVL RBP family members. Our work demonstrates the potential of incorporating splicing data into drug response algorithms and the utility of a QBGLM approach for fast, scalable identification of relevant splicing differences between large groups of samples.
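To make the QBGLM idea concrete, here is a minimal sketch of a quasi-binomial logit model for a single skipped exon compared between two groups. It assumes per-sample inclusion and total junction read counts as input. With one binary covariate the maximum-likelihood fit has a closed form (the pooled inclusion proportion per group), so no iterative solver is needed. The function name, argument names, and example counts are hypothetical, and the authors' actual model may include further covariates.

```python
import math


def logit(p):
    return math.log(p / (1 - p))


def quasibinomial_two_group(inc, tot, group):
    """Hypothetical helper for one exon.

    inc[i]   : reads supporting exon inclusion in sample i
    tot[i]   : inclusion + skipping reads in sample i
    group[i] : 0 (e.g. resistant) or 1 (e.g. sensitive)

    Returns (delta_logit, dispersion, t_stat). Assumes more than two samples
    and pooled inclusion proportions strictly between 0 and 1 in each group.
    """
    p = {}
    weight = {}
    for g in (0, 1):
        inc_g = sum(y for y, k in zip(inc, group) if k == g)
        tot_g = sum(n for n, k in zip(tot, group) if k == g)
        p[g] = inc_g / tot_g                  # binomial MLE for group g
        weight[g] = tot_g * p[g] * (1 - p[g])  # Fisher information of logit(p_g)

    # Quasi-likelihood step: estimate dispersion from Pearson residuals,
    # with n - 2 degrees of freedom (intercept plus group effect).
    pearson = sum(
        (y - n * p[k]) ** 2 / (n * p[k] * (1 - p[k]))
        for y, n, k in zip(inc, tot, group)
    )
    dispersion = pearson / (len(inc) - 2)

    delta = logit(p[1]) - logit(p[0])  # group effect on the logit scale
    se = math.sqrt(dispersion * (1 / weight[0] + 1 / weight[1]))
    return delta, dispersion, delta / se


# Made-up counts: three resistant (0) and three sensitive (1) cell lines.
inc = [20, 30, 40, 60, 70, 80]
tot = [100, 100, 100, 100, 100, 100]
grp = [0, 0, 0, 1, 1, 1]
print(quasibinomial_two_group(inc, tot, grp))
```

A dispersion well above 1 indicates more between-sample variability than a plain binomial model allows, which is the situation the quasi-binomial standard error is meant to account for.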