Funding: Beijing Natural Science Foundation, Grant/Award Number: 5232025; Beijing Nova Program, Grant/Award Number: 20230484290; National Natural Science Foundation of China, Grant/Award Numbers: 62173338, 61873276.
Abstract: Copy number variation (CNV) refers to variation in the number of copies of a specific sequence in a genome and is a type of chromatin structural variation. The development of the Hi-C technique has empowered research on the spatial structure of chromatin by capturing interactions between DNA fragments. We utilized machine-learning methods, including a linear transformation model and a graph convolutional network (GCN), to detect CNV events from Hi-C data and to reveal how CNV is related to three-dimensional interactions between genomic fragments in terms of the one-dimensional read count signal and features of the chromatin structure. The experimental results demonstrated a specific linear relation between the Hi-C read count and CNV for each chromosome that can be well quantified by the linear transformation model. In addition, the GCN-based model could accurately extract features of the spatial structure from Hi-C data and infer the corresponding CNV across different chromosomes in a cancer cell line. We performed a series of experiments, including dimension reduction, transfer learning, and Hi-C data perturbation, to comprehensively evaluate the utility and robustness of the GCN-based model. This work provides a benchmark for using machine learning to infer CNV from Hi-C data and serves as a foundation for a deeper understanding of the relationship between Hi-C data and CNV.
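The per-chromosome linear relation between Hi-C read count and CNV described in this abstract can be illustrated with an ordinary least-squares fit. The sketch below is only an illustration under stated assumptions: the abstract does not specify the form of the authors' linear transformation model, so a simple slope-plus-intercept fit is assumed, and the function names (`fit_linear`, `fit_per_chromosome`) and the toy bins are hypothetical.

```python
from statistics import mean


def fit_linear(read_counts, copy_numbers):
    """Least-squares fit of copy_number ~ slope * read_count + intercept."""
    if len(read_counts) != len(copy_numbers) or len(read_counts) < 2:
        raise ValueError("need at least two paired observations")
    x_bar = mean(read_counts)
    y_bar = mean(copy_numbers)
    # Sum of squared deviations of x, and cross-deviations of x and y.
    sxx = sum((x - x_bar) ** 2 for x in read_counts)
    if sxx == 0:
        raise ValueError("read counts are constant; slope is undefined")
    sxy = sum((x - x_bar) * (y - y_bar)
              for x, y in zip(read_counts, copy_numbers))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def fit_per_chromosome(bins):
    """Fit one linear model per chromosome.

    `bins` maps a chromosome name to a list of (read_count, copy_number)
    pairs, one pair per genomic bin (hypothetical input layout).
    """
    models = {}
    for chrom, pairs in bins.items():
        xs = [p[0] for p in pairs]
        ys = [p[1] for p in pairs]
        models[chrom] = fit_linear(xs, ys)
    return models


# Toy data (made up for illustration): each chromosome has its own slope.
toy_bins = {
    "chr1": [(100, 2.0), (150, 3.0), (200, 4.0)],
    "chr2": [(80, 1.0), (160, 2.0), (240, 3.0)],
}
models = fit_per_chromosome(toy_bins)
```

Fitting each chromosome separately reflects the abstract's statement that the relation is specific to each chromosome; a single pooled fit would hide the difference in slope between the two toy chromosomes.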
Funding: Supported by the Beijing Nova Program of Science and Technology (No. 20220484198 to HC) and the National Natural Science Foundation of China (Nos. 62173338, 61873276, and 31900488 to HC, XB, and HL, respectively).
Abstract: Background: The hierarchical three-dimensional (3D) architectures of chromatin play an important role in fundamental biological processes, such as cell differentiation, cellular senescence, and transcriptional regulation. Aberrant alterations of 3D chromatin structure are often present in human diseases, including cancers, but their underlying mechanisms remain unclear. Results: 3D chromatin structures (chromatin compartments A/B, topologically associating domains, and enhancer-promoter interactions) play key roles in cancer development, metastasis, and drug resistance. Bioinformatics techniques based on machine learning and deep learning have shown great potential in the study of the 3D cancer genome. Conclusion: Current advances in the study of the 3D cancer genome have expanded our understanding of the mechanisms underlying tumorigenesis and tumor development. These advances will provide new insights into precise diagnosis and personalized treatment for cancers.
Funding: Supported by the Center for Precision Medicine, Sun Yat-sen University, and the National High-tech R&D Program (863 Program Grant No. 2015AA020110) of China, awarded to YZ.
Abstract: Advances in biological and medical technologies have been providing us with explosive volumes of biological and physiological data, such as medical images, electroencephalography, and genomic and protein sequences. Learning from these data facilitates the understanding of human health and disease. Developed from artificial neural networks, deep learning-based algorithms show great promise in extracting features and learning patterns from complex data. The aim of this paper is to provide an overview of deep learning techniques and some of the state-of-the-art applications in the biomedical field. We first introduce the development of artificial neural networks and deep learning. We then describe two main components of deep learning, i.e., deep learning architectures and model optimization. Subsequently, some examples are demonstrated for deep learning
Funding: Supported by the National Natural Science Foundation of China (81602620, 81700540, and 81770602), the State Key Project for Infectious Diseases (2015ZX09J15107), the Shanghai Committee of Science and Technology (15431901600), and Translational Application of Precision Medicine of Second Military Medical University (2017JZ52).
Abstract: MicroRNAs (miRNAs), particularly exosomal miRNAs, have been widely used as biomarkers and promising therapeutic targets in cancer. However, comprehensive analyses of the miRNA-gene regulatory network with clinical significance remain scarce. The emergence of high-throughput multi-omics data over large, well-characterized patient cohorts provides an unprecedented opportunity to address this problem. Herein, we performed a clinic-centered analysis to identify cancer-associated miRNAs and miRNA-target axes. We first calculated the correlation among miRNA, mRNA, and 75 unique clinico-pathological characteristics (CPCs) in 26 cancer types, and established an online resource (4CR). Interestingly, we found that high expression of several DNA methylation-related enzymes was associated with adverse outcomes in cancer patients, and that these genes were regulated by a cluster of miRNAs. Furthermore, by integrating exosomal miRNA and mRNA databases, we identified exosomal miRNA biomarkers for non-invasive cancer surveillance and therapy monitoring. Finally, we explored the role of CPC-related miRNAs in predicting the therapeutic effect of drugs based on their shared targets. Our analysis pipeline illustrated the significance of clinic-centered analysis in miRNA-gene pair identification and provided helpful clues for future cancer studies.
Funding: Supported by the National Natural Science Foundation of China (Grant No. U1435222) and the Program of International Sci-Tech Cooperation, China (Grant No. 2014DFB30020).
Abstract: The accumulation of various types of drug informatics data and computational approaches for drug repositioning can accelerate pharmaceutical research and development. However, the integration of multi-dimensional drug data for precision repositioning remains a pressing challenge. Here, we propose a systematic framework named PIMD to predict drug therapeutic properties by integrating multi-dimensional data for drug repositioning. In PIMD, drug similarity networks (DSNs) based on chemical, pharmacological, and clinical data are fused into an integrated DSN (iDSN) composed of many clusters. Rather than simple fusion, PIMD offers a systematic way to annotate clusters. Unexpected drugs within clusters and drug pairs with a high iDSN similarity score are therefore identified to predict novel therapeutic uses. PIMD provides new insights into the universality, individuality, and complementarity of different drug properties by evaluating the contribution of each type of property data. To test the performance of PIMD, we use chemical, pharmacological, and clinical properties to generate an iDSN. Analyses of the contributions of each drug property indicate that this iDSN is driven by all data types and performs better than other DSNs. Within the top 20 recommended drug pairs, 7 drugs have been reported to be repurposed. The source code for PIMD is available at https://github.com/Sepstar/PIMD/.
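The idea of combining several drug similarity networks and then ranking drug pairs by the fused score can be sketched as follows. This is a deliberately simplified illustration, not PIMD's fusion procedure: it assumes plain element-wise averaging of three similarity matrices, and the drug names, similarity values, and function names (`fuse_similarity`, `top_pairs`) are all hypothetical. The actual method is in the repository linked in the abstract.

```python
from itertools import combinations


def fuse_similarity(networks):
    """Average several drug-by-drug similarity matrices element-wise.

    Assumption: simple averaging stands in for network fusion here.
    """
    n = len(networks[0])
    fused = [[0.0] * n for _ in range(n)]
    for net in networks:
        for i in range(n):
            for j in range(n):
                fused[i][j] += net[i][j] / len(networks)
    return fused


def top_pairs(fused, names, k):
    """Return the k drug pairs with the highest fused similarity."""
    pairs = [(fused[i][j], names[i], names[j])
             for i, j in combinations(range(len(names)), 2)]
    pairs.sort(key=lambda t: t[0], reverse=True)
    return [(a, b, round(score, 3)) for score, a, b in pairs[:k]]


# Toy similarity matrices (made-up values) for three hypothetical drugs.
names = ["drug_A", "drug_B", "drug_C"]
chemical = [[1.0, 0.2, 0.9], [0.2, 1.0, 0.4], [0.9, 0.4, 1.0]]
pharmacological = [[1.0, 0.4, 0.7], [0.4, 1.0, 0.2], [0.7, 0.2, 1.0]]
clinical = [[1.0, 0.3, 0.8], [0.3, 1.0, 0.3], [0.8, 0.3, 1.0]]

fused = fuse_similarity([chemical, pharmacological, clinical])
best = top_pairs(fused, names, k=1)
```

In this toy example the pair drug_A/drug_C ranks first because it scores highly in all three data types, which mirrors the abstract's point that high-scoring pairs in the integrated network are candidates for shared therapeutic uses.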
Funding: Supported by grants from the National Key R&D Program of China (Grant No. 2017YFA0505700) and the National Key Lab of Proteomics of China (Grant Nos. SKLP-K201805, SKLP-K201804, and SKLP-Y201703).
Abstract: Genome-wide physical protein-protein interaction (PPI) mapping remains a major challenge for current technologies. Here, we report a high-efficiency BiFC-seq method, yeast-enhanced green fluorescent protein-based bimolecular fluorescence complementation (yEGFP-BiFC) coupled with next-generation DNA sequencing, for interactome mapping. We first applied the yEGFP-BiFC method to systematically investigate an intraviral network of the Ebola virus (EBOV). Two-thirds (9/14) of the known EBOV interactions were recaptured, and five novel interactions were discovered. Next, we used the BiFC-seq method to map the interactome of the tumor protein p53. We identified 97 interactors of p53, more than three-quarters of which were novel. Furthermore, in a more complex background, we screened potential interactors by pooling two BiFC libraries together and revealed a network of 229 interactions among 205 proteins. These results show that BiFC-seq is a highly sensitive, rapid, and economical method for genome-wide interactome mapping.
Funding: This work was supported by a startup fund from Southern Medical University.
Abstract: Functional enrichment analysis is pivotal for interpreting high-throughput omics data in life science. It is crucial for this type of tool to use the latest annotation databases for as many organisms as possible. To meet these requirements, we present here an updated version of our popular Bioconductor package, clusterProfiler 4.0. This package has been enhanced considerably compared with its original version published 9 years ago. The new version provides a universal interface for functional enrichment analysis in thousands of organisms based on internally supported ontologies and pathways as well as annotation data provided by users or derived from online databases. It also extends the dplyr and ggplot2 packages to offer tidy interfaces for data operation and visualization. Other new features include gene set enrichment analysis and comparison of enrichment results from multiple gene lists. We anticipate that clusterProfiler 4.0 will be applied to a wide range of scenarios across diverse organisms.
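For readers unfamiliar with functional enrichment analysis, one common form is over-representation analysis, which asks whether a gene list overlaps an annotated gene set more than expected by chance, using a hypergeometric test. The sketch below illustrates that general statistical idea in Python. It is not clusterProfiler's interface (clusterProfiler is an R package), and the function names and toy gene identifiers are hypothetical.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """Upper-tail hypergeometric probability P(X >= k).

    N: genes in the background universe
    K: universe genes annotated to the gene set
    n: genes in the query list
    k: query genes that are annotated to the gene set
    """
    upper = min(n, K)
    total = comb(N, n)
    # Sum the probability of every overlap size from k up to the maximum.
    favourable = sum(comb(K, i) * comb(N - K, n - i)
                     for i in range(k, upper + 1))
    return favourable / total


def enrichment_pvalue(gene_list, gene_set, universe):
    """Over-representation p-value of `gene_set` within `gene_list`."""
    universe = set(universe)
    query = set(gene_list) & universe
    annotated = set(gene_set) & universe
    overlap = query & annotated
    return hypergeom_pvalue(len(overlap), len(query),
                            len(annotated), len(universe))


# Toy example with made-up gene identifiers g0..g19.
universe = [f"g{i}" for i in range(20)]
pathway = ["g0", "g1", "g2", "g3", "g4"]   # 5 annotated genes
query = ["g0", "g1", "g2", "g10"]          # 3 of 4 fall in the pathway
p = enrichment_pvalue(query, pathway, universe)
```

Restricting both the query list and the gene set to the background universe before counting keeps the four hypergeometric parameters consistent with one another; in practice, p-values from many gene sets would also need multiple-testing correction, which this sketch omits.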