Funding: This work was partly supported by the National Key Research and Development Program of China (No. 2018YFB0204403), the Key Research and Development Project of Guangdong Province (No. 2021B0101310002), the Strategic Priority CAS Project (No. XDB38050100), the National Natural Science Foundation of China (No. U1813203), the Shenzhen Basic Research Fund (Nos. RCYX2020071411473419, KQTD20200820113106007, and JSGG20201102163800001), the CAS Key Lab (No. 2011DP173015), and the Youth Innovation Promotion Association (No. Y2021101).
Abstract: Gastric cancer (GC) is one of the most common cancers and ranks third in cancer mortality worldwide. The goal of this study was to identify potential hub genes, their functions and signaling pathways, and candidate drugs for the treatment of GC patients. We used publicly available next-generation sequencing (NGS) data to identify differentially expressed (DE) genes. The top DE genes were mapped to the STRING database to construct a protein-protein interaction (PPI) network, and the top hub genes were selected for further analysis. We found a total of 1555 DE genes in GC, 870 upregulated and 685 downregulated. We selected the top 400 genes (200 upregulated and 200 downregulated) to construct a PPI network and extracted the top 15 hub genes. Gene Ontology (GO) term and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses of the 15 hub genes revealed several important functions and signaling pathways that were significantly associated with GC. Survival analysis of the hub genes showed that lower expression of three hub genes, CDH2, COL4A1, and COL5A2, was associated with better survival of GC patients. These three genes might therefore be candidate biomarkers for the diagnosis and treatment of GC. We then treated the three key proteins (genomic biomarkers) COL4A1, CDH2, and COL5A2 as drug target proteins (receptors), docked them against 102 meta-drug agents, and identified Everolimus, Docetaxel, Lanreotide, Venetoclax, Temsirolimus, and Nilotinib as the six top-ranked candidate drugs for these target proteins. These drugs might therefore play a vital role in the treatment of GC patients.
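The abstract does not state how the DE genes were ranked or which centrality measure defined the hub genes. As a minimal sketch, assuming ranking by log2 fold change among genes passing an adjusted p-value cutoff of 0.05, and hub selection by node degree, the two selection steps might look like this in Python. The gene names, thresholds, and edges are hypothetical toy values, not data from the study; in practice the edge list would come from a STRING network export.

```python
from collections import defaultdict

def select_top_de(results, n_each, padj_cutoff=0.05):
    """Split significant genes into top-n up- and downregulated lists.

    results: iterable of (gene, log2_fold_change, adjusted_p_value) tuples.
    Ranking by log2 fold change and the 0.05 cutoff are assumptions.
    """
    sig = [r for r in results if r[2] < padj_cutoff]
    up = sorted((r for r in sig if r[1] > 0), key=lambda r: -r[1])[:n_each]
    down = sorted((r for r in sig if r[1] < 0), key=lambda r: r[1])[:n_each]
    return [r[0] for r in up], [r[0] for r in down]

def rank_hub_genes(edges, top_k):
    """Rank genes by degree (number of distinct interaction partners)."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        neighbors[a].add(b)
        neighbors[b].add(a)
    # Highest degree first; ties broken alphabetically for reproducibility
    ranked = sorted(neighbors, key=lambda g: (-len(neighbors[g]), g))
    return [(g, len(neighbors[g])) for g in ranked[:top_k]]

# Hypothetical toy inputs
de_results = [("A", 2.5, 0.01), ("B", -3.0, 0.001), ("C", 1.0, 0.2),
              ("D", 4.0, 0.03), ("E", -0.5, 0.04)]
up, down = select_top_de(de_results, n_each=1)
print(up, down)  # ['D'] ['B']

edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")]
print(rank_hub_genes(edges, top_k=2))  # [('A', 3), ('B', 2)]
```

With the study's settings, `n_each` would be 200 and `top_k` 15. Degree is only one possible measure; other centrality measures (e.g. betweenness) can yield a different hub list.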
Funding: This work was partially supported by the National Natural Science Foundation of China (NSFC) (Nos. 62102231 and 61972231), the Shenzhen Basic Research Fund (No. JCYJ20180507182818013), the Key Project of Joint Fund of Shandong Province (No. ZR2019LZH007), the Shandong Provincial Natural Science Foundation (No. ZR2021QF089), the PPP project from CSC and DAAD, and the Engineering Research Center of Digital Media Technology, Ministry of Education, China.
Abstract: With the development of sequencing technologies, somatic mutation analysis has become an important component of cancer research and treatment. VarDict is a commonly used somatic variant caller for this task. Although the heuristic-based VarDict algorithm exhibits high sensitivity and versatility, it may report more false positive variants than other callers, which limits its clinical practicality. To address this problem, we propose DeepFilter, a deep-learning-based filter for VarDict that effectively removes the false positive variants detected by VarDict. Our approach trains two separate models, one for insertion-deletion mutations (InDels) and one for single nucleotide variants (SNVs). Experiments show that, in the commonly used tumor-normal paired mode, DeepFilter can filter out at least 98.5% of false positive variants and retain 93.5% of true positive variants for InDels and SNVs. Source code and pre-trained models are available at https://github.com/LeiHaoa/DeepFilter.
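DeepFilter's actual networks and input features are in the linked repository and are not reproduced here. The Python sketch below only illustrates two ideas stated in the abstract: routing each VarDict call to a type-specific model (SNV or InDel), and measuring the fraction of false positives removed and true positives retained. The `Variant` fields, the feature tuples, the 0.5 threshold, and the lambda scorers standing in for trained models are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int
    ref: str
    alt: str
    features: Tuple[float, ...]  # hypothetical per-call feature vector
    is_true: bool                # benchmark truth label (unknown in practice)

def variant_type(v: Variant) -> str:
    # Length-changing calls go to the InDel model; substitutions
    # (including any multi-base ones, by assumption) go to the SNV model.
    return "SNV" if len(v.ref) == len(v.alt) else "INDEL"

def keep_mask(calls: Sequence[Variant],
              models: Dict[str, Callable[[Tuple[float, ...]], float]],
              threshold: float = 0.5) -> List[bool]:
    """Score each call with the model for its type; keep scores >= threshold."""
    return [models[variant_type(v)](v.features) >= threshold for v in calls]

def filter_metrics(calls: Sequence[Variant],
                   mask: Sequence[bool]) -> Tuple[float, float]:
    """Return (fraction of false positives removed, fraction of true positives kept)."""
    fp_total = sum(not v.is_true for v in calls)
    tp_total = sum(v.is_true for v in calls)
    fp_removed = sum(not v.is_true and not k for v, k in zip(calls, mask))
    tp_kept = sum(v.is_true and k for v, k in zip(calls, mask))
    return fp_removed / fp_total, tp_kept / tp_total

# Toy stand-ins for the two trained models
models = {"SNV": lambda f: f[0], "INDEL": lambda f: f[1]}
calls = [
    Variant("chr1", 100, "A", "G", (0.9, 0.0), True),
    Variant("chr1", 200, "C", "T", (0.1, 0.0), False),
    Variant("chr2", 300, "AT", "A", (0.0, 0.8), True),
    Variant("chr2", 400, "G", "GCC", (0.0, 0.6), False),
]
mask = keep_mask(calls, models)
print(mask)                         # [True, False, True, True]
print(filter_metrics(calls, mask))  # (0.5, 1.0)
```

On a benchmark with known truth, the two returned fractions are the same kind of quantities as the 98.5% and 93.5% figures quoted above; raising the threshold generally removes more false positives at the cost of retaining fewer true positives.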
Funding: Supported by the National Key Research and Development Program of China (2020YFB1006104).
Abstract: A fundamental problem in complex time series analysis is data prediction and repair. However, existing methods are not accurate enough for complex, multidimensional time series data. In this paper, we propose a novel complex time series prediction model based on the conditional random field (CRF) and the recurrent neural network (RNN). The model can be used as an upper-level predictor in a stacking process or be trained with deep learning methods. Experimental results show that our approach is more accurate than existing methods in some suitable scenarios.
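The abstract does not specify how the CRF and the RNN are combined. One common pattern, assumed here, is a linear-chain CRF layer on top of per-step RNN scores, with the series discretized into a small set of hypothetical states. The Python sketch below shows only CRF decoding (the Viterbi algorithm) given such per-step emission scores and a label-transition score matrix; the RNN itself and CRF training are omitted, and all numbers are toy values.

```python
from typing import List, Sequence, Tuple

def viterbi(emissions: Sequence[Sequence[float]],
            transitions: Sequence[Sequence[float]]) -> Tuple[List[int], float]:
    """Return the highest-scoring label path of a linear-chain CRF and its score.

    emissions[t][j]: score of label j at step t (e.g. an RNN output).
    transitions[i][j]: score of moving from label i to label j.
    """
    n_labels = len(transitions)
    score = list(emissions[0])
    backptrs = []
    for t in range(1, len(emissions)):
        new_score, ptrs = [], []
        for j in range(n_labels):
            # Best previous label for ending in label j at step t
            best_i = max(range(n_labels), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            ptrs.append(best_i)
        score = new_score
        backptrs.append(ptrs)
    # Trace the best path backwards from the best final label
    best = max(range(n_labels), key=lambda j: score[j])
    path = [best]
    for ptrs in reversed(backptrs):
        path.append(ptrs[path[-1]])
    path.reverse()
    return path, score[best]

# Toy example with two hypothetical states (0 = "low", 1 = "high")
emissions = [[2.0, 0.0], [0.0, 1.0], [2.0, 0.0]]
transitions = [[0.5, -1.0], [-1.0, 0.5]]
print(viterbi(emissions, transitions))  # ([0, 0, 0], 5.0)
```

Taking the per-step argmax of the emissions alone would give `[0, 1, 0]`; the transition scores penalize switching states, so the CRF decodes the smoother path `[0, 0, 0]`. This sequence-level consistency is what a CRF layer typically adds on top of an RNN.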
Funding: This work was supported by the Strategic Priority CAS Project (No. XDB38050100), the National Key Research and Development Program of China (No. 2018YFB0204403), the National Natural Science Foundation of China (No. U1813203), the Shenzhen Basic Research Fund (Nos. RCYX2020071411473419, JCYJ20200109114818703, and JSGG20201102163800001), the CAS Key Lab (No. 2011DP173015), the Hong Kong Research Grant Council (No. GRF-17208019), and the Outstanding Youth Innovation Fund (Doctoral Students) of CAS-SIAT (No. Y9G054).
Abstract: Sequence-based protein tertiary structure prediction is of fundamental importance because the function of a protein ultimately depends on its 3D structure. An accurate residue-residue contact map is one of the essential elements of current ab initio protocols for 3D structure prediction. Recently, the combination of deep learning and direct coupling techniques has brought significant progress in residue contact prediction. However, many current deep-learning (DL)-based prediction methods are time-consuming, mainly because they rely on several categories of input data and third-party programs. In this research, we transformed the complex biological problem into a purely computational problem through statistics and artificial intelligence. Accordingly, we proposed a feature extraction method that obtains various categories of statistical information from the multiple sequence alignment (MSA) alone, and then trained a DL model for residue-residue contact prediction on this large body of statistical information. The proposed method is robust across different test sets, shows high reliability of its model confidence score, achieves high computational efficiency, and attains prediction precision comparable to that of DL methods relying on multi-source inputs.
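The abstract does not list which statistics are extracted from the MSA. As one illustration of an MSA-only feature, assumed rather than taken from the paper, the Python sketch below computes mutual information (MI) between alignment columns with the average product correction (APC), a classic covariation signal for contact prediction. Gaps would be treated as an ordinary symbol, sequence weighting and pseudocounts are omitted for brevity, and the four-sequence alignment is a toy example.

```python
import math
from collections import Counter

def column_mi(msa, i, j):
    """Mutual information (in nats) between alignment columns i and j."""
    n = len(msa)
    pi = Counter(seq[i] for seq in msa)
    pj = Counter(seq[j] for seq in msa)
    pij = Counter((seq[i], seq[j]) for seq in msa)
    mi = 0.0
    for (a, b), count in pij.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((pi[a] / n) * (pj[b] / n)))
    return mi

def mi_apc_matrix(msa):
    """MI for every column pair, corrected by APC: MI(i,j) - mean_i * mean_j / mean_all."""
    L = len(msa[0])
    mi = [[column_mi(msa, i, j) if i != j else 0.0 for j in range(L)] for i in range(L)]
    col_mean = [sum(row) / (L - 1) for row in mi]  # mean MI of each column with the others
    overall = sum(col_mean) / L                    # mean MI over all column pairs
    return [[mi[i][j] - col_mean[i] * col_mean[j] / overall
             if i != j and overall > 0 else 0.0
             for j in range(L)] for i in range(L)]

def top_pairs(scores, k):
    """Return the k column pairs (i < j) with the highest scores."""
    L = len(scores)
    pairs = [(scores[i][j], i, j) for i in range(L) for j in range(i + 1, L)]
    return [(i, j) for _, i, j in sorted(pairs, reverse=True)[:k]]

# Toy alignment: columns 0 and 1 co-vary perfectly; column 2 is conserved
msa = ["ACDA", "ACDC", "GTDA", "GTDC"]
print(top_pairs(mi_apc_matrix(msa), 1))  # [(0, 1)]
```

The top-ranked pair `(0, 1)` is the one whose residues co-vary perfectly. In a DL pipeline of the kind described, such a score matrix would serve as one input channel for the model rather than as the final contact prediction.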