Gene selection is an indispensable step in analyzing noisy, high-dimensional single-cell RNA-seq (scRNA-seq) data. Most existing methods are variance-based. In contrast, HRG (Highly Regional Genes) is a new feature selection method that mimics how humans select marker genes in 2D visualizations of cells. It finds informative genes that show regional expression patterns in the cell-cell similarity network. We mathematically derive the optimal expression patterns that maximize the proposed scoring function. Compared with several unsupervised methods, HRG shows high accuracy and robustness, and it can improve the performance of downstream cell clustering and gene correlation analysis. It is also applicable to selecting informative genes from sequencing-based spatial transcriptomic data.
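The abstract does not spell out HRG's scoring function, so the sketch below is not HRG. It illustrates the general idea described above: a gene is "regional" if cells that are neighbors in a cell-cell graph have similar expression of that gene. To show this, it substitutes a standard spatial-autocorrelation statistic, Moran's I, computed on an unweighted k-nearest-neighbor (kNN) graph built from 2D cell coordinates. The function names `knn_graph` and `regional_score`, the choice of k, and the toy data are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Illustrative stand-in only: Moran's I on a kNN graph, NOT the HRG scoring function.


def knn_graph(coords: Sequence[Tuple[float, float]], k: int) -> List[List[int]]:
    """For each cell, return the indices of its k nearest other cells (Euclidean, 2D)."""
    neighbors = []
    for i, (xi, yi) in enumerate(coords):
        # Sort all other cells by distance; ties are broken by index for determinism.
        ranked = sorted(
            (math.hypot(xi - xj, yi - yj), j)
            for j, (xj, yj) in enumerate(coords)
            if j != i
        )
        neighbors.append([j for _, j in ranked[:k]])
    return neighbors


def regional_score(expr: Sequence[float], neighbors: List[List[int]]) -> float:
    """Moran's I of one gene on an unweighted kNN graph (w_ij = 1 if j is a neighbor of i).

    Values near +1 mean neighboring cells share similar expression (a regional pattern);
    values near or below 0 mean expression is scattered across the graph.
    """
    n = len(expr)
    mean = sum(expr) / n
    dev = [x - mean for x in expr]
    denom = sum(d * d for d in dev)
    if denom == 0:
        return 0.0  # constant gene: no structure to detect
    num = sum(dev[i] * dev[j] for i in range(n) for j in neighbors[i])
    total_weight = sum(len(nb) for nb in neighbors)
    return n * num / (total_weight * denom)


if __name__ == "__main__":
    # Hypothetical toy data: two well-separated groups of four cells in a 2D embedding.
    cells = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
    graph = knn_graph(cells, k=3)
    regional = [5, 5, 5, 5, 0, 0, 0, 0]   # high in one group only
    scattered = [5, 0, 0, 5, 5, 0, 0, 5]  # same variance, but mixed within each group
    for name, gene in [("regional", regional), ("scattered", scattered)]:
        print(name, round(regional_score(gene, graph), 3))
```

Both toy genes have the same variance, so a variance-based filter cannot tell them apart. The neighborhood score separates them clearly: run as a script, it prints 1.0 for the regional gene and -0.333 for the scattered one.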
In this commentary, I explain my perspective on the relationship between artificial intelligence (AI)/data science and biomedicine from a long-range retrospective view. The development of modern biomedicine has repeatedly been accelerated by the emergence of new technologies. All living systems are fundamentally governed by the information encoded in their own DNA, so information science is of special importance to the study of biomedicine. Unlike physics, biology has yielded no (or very few) governing laws, which makes a “data-to-knowledge” approach particularly important. AI has long been applied to biomedicine. The recent news that an AI-based approach achieved the best performance in an international protein structure prediction competition may be regarded as another landmark in the field. Similar approaches could help solve problems in genome sequence interpretation, such as identifying cancer-driving mutations in patients' genomes. The explosive development of next-generation sequencing (NGS) has been producing massive amounts of data, and this trend will accelerate. NGS is used not only for “reading” DNA sequences but also for obtaining various types of information at the single-cell level. These data can be regarded as analogous to the grid data points in a climate simulation. Both data science and AI will become essential for the integrative interpretation and simulation of these data, and they will take a leading role in future precision medicine.
Gene co-expression network (GCN) mining identifies gene modules with highly correlated expression profiles across samples/conditions. It enables researchers to discover latent gene/molecule interactions, identify novel gene functions, and extract molecular features from specific disease/condition groups, thus helping to identify disease biomarkers. However, there is no easy-to-use tool package for mining GCN modules that are relatively small and tightly connected, and therefore convenient for downstream gene set enrichment analysis, or for mining modules that may share members with one another. To address this need, we developed an online GCN mining tool package: TSUNAMI (Tools SUite for Network Analysis and MIning). TSUNAMI incorporates our state-of-the-art lmQCM algorithm to mine GCN modules from both public and user-input data (microarray, RNA-seq, or any other numerical omics data). It then performs downstream gene set enrichment analysis on the identified modules. Its features and advantages include: 1) a user-friendly interface and real-time co-expression network mining through a web server; 2) direct access to and search of the NCBI Gene Expression Omnibus (GEO) and The Cancer Genome Atlas (TCGA) databases, as well as support for user-input gene expression matrices, for GCN module mining; 3) multiple co-expression analysis tools to choose from, all highly flexible with regard to parameter selection; 4) summarization of identified GCN modules into eigengenes, making it convenient for users to check their correlation with clinical traits; 5) integrated downstream Enrichr enrichment analysis and links to other gene set enrichment tools; and 6) visualization of gene loci by Circos plot at any step of the process. The web service is freely accessible at https://biolearns.medicine.iu.edu/. Source code is available at https://github.com/huangzhii/TSUNAMI/.
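TSUNAMI summarizes each identified module into an eigengene that can then be correlated with clinical traits. The sketch below assumes the widely used WGCNA-style definition of an eigengene: the first principal component of the module's standardized genes-by-samples expression matrix. The abstract does not say exactly how TSUNAMI computes it, and the sketch does not reproduce lmQCM module detection. The helper names (`zscore`, `module_eigengene`, `pearson`) and the toy module and trait values are hypothetical. Power iteration in plain Python is used so that the example needs no third-party packages.

```python
import math
from typing import List, Sequence


def zscore(row: Sequence[float]) -> List[float]:
    """Standardize one gene's expression across samples (population SD)."""
    n = len(row)
    mean = sum(row) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in row) / n)
    return [(x - mean) / sd if sd > 0 else 0.0 for x in row]


def module_eigengene(expr: Sequence[Sequence[float]], iters: int = 200) -> List[float]:
    """First principal component over samples of a genes-by-samples module matrix.

    Assumed WGCNA-style definition; computed by power iteration on Z^T Z,
    where Z holds the z-scored genes as rows. Returns a unit-length vector.
    """
    z = [zscore(row) for row in expr]
    n = len(z[0])
    # Sample-by-sample Gram matrix C = Z^T Z (n x n).
    c = [[sum(row[i] * row[j] for row in z) for j in range(n)] for i in range(n)]
    # Start from the first non-constant gene's profile; the all-ones vector
    # lies in C's null space (every z-scored row sums to zero), so avoid it.
    v = next((list(r) for r in z if any(r)), None)
    if v is None:
        raise ValueError("module contains only constant genes")
    for _ in range(iters):
        w = [sum(c[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Orient the eigengene so it correlates positively with the module's average profile.
    avg = [sum(row[s] for row in z) / len(z) for s in range(n)]
    if sum(a * b for a, b in zip(v, avg)) < 0:
        v = [-x for x in v]
    return v


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical module: three co-expressed genes measured in five samples.
    module = [
        [1.0, 2.0, 3.0, 4.0, 5.0],
        [2.0, 4.0, 6.0, 8.0, 10.0],
        [1.5, 2.5, 3.5, 4.5, 6.0],
    ]
    trait = [0.0, 1.0, 2.0, 3.0, 4.0]  # hypothetical clinical trait per sample
    eigengene = module_eigengene(module)
    print(round(pearson(eigengene, trait), 3))
```

Fixing the sign against the module's average profile is a convention choice. A principal component is defined only up to sign, so without this step the reported trait correlation could flip sign between implementations.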
Funding (HRG study): Supported by the National Key Research and Development Program (2020YFA0712403, 2020YFA0906900), the National Natural Science Foundation of China (61922047, 81890993, 61721003, 62133006), and the BNRIST Young Innovation Fund (BNR2020RC01009).
Funding (TSUNAMI study): Supported by the American Cancer Society Internal Research Grant (to JZ), the National Cancer Institute Informatics Technology for Cancer Research U01 grant (Grant No. CA188547 to JZ and KH), the Indiana University Precision Health Initiative (to JZ and KH), and support from Indiana University Information Technologies and the Advanced Biomedical IT Core.