A 5'-leader, originally known as the 5'-untranslated region, exists in multiple isoforms owing to alternative splicing (aS) and alternative transcription start sites (aTSS). A representative 5'-leader is therefore needed to examine how the embedded RNA regulatory elements control translation efficiency. Here, we develop a ranking algorithm and a deep-learning model to annotate representative 5'-leaders for five plant species. We rank the intra-sample and inter-sample frequency of aS-mediated transcript isoforms using a Kruskal-Wallis test-based algorithm and identify the representative aS-5'-leader. To further assign a representative 5'-end, we train the deep-learning model 5'leaderP to learn aTSS-mediated 5'-end distribution patterns from cap-analysis gene expression data. The model accurately predicts the 5'-end, as confirmed experimentally in Arabidopsis and rice. The gene models containing representative 5'-leaders and 5'leaderP can be accessed at RNAirport (http://www.rnairport.com/leader5P/). The Stage 1 annotation of 5'-leaders records 5'-leader diversity and will pave the way to Ribo-Seq open-reading frame annotation, similar to the project recently initiated by human GENCODE.
Funding: Supported by grants from the National Key R&D Program of China (2023ZD04073), the Major Project of Hubei Hongshan Laboratory (2022hszd016), the Key Research and Development Program of Hubei Province (2022BFE003), and the National Natural Science Foundation of China (32070284) to G. Xu.
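The abstract does not spell out the ranking algorithm, so the following is only a minimal sketch of how a Kruskal-Wallis-style ranking of isoforms could look, not the authors' implementation. It assumes that each isoform of a gene has a usage fraction per sample (the intra-sample frequency) and that isoforms are compared across samples by pooling all fractions, ranking them, and ordering isoforms by mean rank. The function names, the isoform IDs, and the toy numbers are all hypothetical, and the H statistic is computed without tie correction.

```python
from itertools import chain


def average_ranks(values):
    """Return 1-based ranks for values, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # sorted positions start..end share one rank
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def kruskal_wallis_rank(isoform_fractions):
    """Order the isoforms of one gene by pooled mean rank and return the H statistic.

    isoform_fractions: dict mapping a (hypothetical) isoform ID to its usage
    fraction in each sample.
    """
    names = list(isoform_fractions)
    pooled = list(chain.from_iterable(isoform_fractions[n] for n in names))
    ranks = average_ranks(pooled)
    total = len(pooled)

    mean_ranks = {}
    h_sum = 0.0
    offset = 0
    for name in names:
        size = len(isoform_fractions[name])
        rank_sum = sum(ranks[offset:offset + size])
        mean_ranks[name] = rank_sum / size
        h_sum += rank_sum ** 2 / size
        offset += size

    # Kruskal-Wallis H statistic (no tie correction in this sketch)
    h_stat = 12.0 / (total * (total + 1)) * h_sum - 3 * (total + 1)
    ordered = sorted(names, key=lambda n: mean_ranks[n], reverse=True)
    return ordered, h_stat


# Toy data invented for illustration: usage fractions across four samples
toy = {
    "isoform_A": [0.70, 0.65, 0.72, 0.68],
    "isoform_B": [0.20, 0.25, 0.18, 0.22],
    "isoform_C": [0.10, 0.10, 0.10, 0.10],
}
ordered, h_stat = kruskal_wallis_rank(toy)
print("representative candidate:", ordered[0])
print("H statistic:", round(h_stat, 3))
```

In this sketch the isoform with the highest mean rank is taken as the representative candidate, and a large H suggests that isoform usage differs consistently across samples. A real analysis would also need a p-value (for example from a chi-squared approximation) and a rule for genes whose isoforms are not clearly separated.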