Funding: the National Natural Science Foundation of China (Grant No. 60471054) and the President Foundation of Peking University.
Abstract: Tumor diagnosis based on microarray data is an active topic in bioinformatics. One of the key problems is the discovery and analysis of the informative genes of a tumor. Although there are many elaborate approaches to this problem, it is still difficult to select a reasonable set of informative genes for tumor diagnosis from microarray data alone. In this paper, we group the genes measured in the microarray data into a number of clusters via the distance sensitive rival penalized competitive learning (DSRPCL) algorithm and then detect the informative gene cluster or set with the help of a support vector machine (SVM). Moreover, the critical or most powerful informative genes can be found through further rounds of clustering and detection on the obtained informative gene clusters. Experiments on the colon, leukemia, and breast cancer datasets demonstrate that the proposed DSRPCL-SVM approach leads to a reasonable selection of informative genes for tumor diagnosis.
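The clustering stage described in this abstract can be illustrated with a minimal sketch. It is an assumption-laden stand-in, not the authors' method: it implements a simplified form of classical rival penalized competitive learning (RPCL), in which the nearest center moves toward each sample and the second-nearest center (the rival) is pushed slightly away. The distance-sensitive penalty that distinguishes DSRPCL and the SVM detection stage are not reproduced here, and the function names and learning rates (`rpcl_step`, `rpcl_fit`, `lr_win`, `lr_rival`) are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def rpcl_step(centers, x, lr_win=0.05, lr_rival=0.005):
    """Apply one simplified RPCL update in place.

    The winner (nearest center) moves toward x; the rival (second-nearest
    center) moves away from x at a much smaller rate. Returns the indices
    (winner, rival). The rates are illustrative assumptions.
    """
    order = sorted(range(len(centers)), key=lambda j: sq_dist(centers[j], x))
    win, rival = order[0], order[1]
    centers[win] = [c + lr_win * (xi - c) for c, xi in zip(centers[win], x)]
    centers[rival] = [c - lr_rival * (xi - c) for c, xi in zip(centers[rival], x)]
    return win, rival


def rpcl_fit(profiles, init_centers, epochs=20, lr_win=0.05, lr_rival=0.005):
    """Run RPCL over gene expression profiles (one vector per gene)."""
    centers = [list(c) for c in init_centers]  # copy so the input is untouched
    for _ in range(epochs):
        for x in profiles:
            rpcl_step(centers, x, lr_win, lr_rival)
    return centers


def assign(profiles, centers):
    """Label each gene profile with the index of its nearest center."""
    return [
        min(range(len(centers)), key=lambda j: sq_dist(centers[j], x))
        for x in profiles
    ]
```

Because rivals are repelled, surplus centers tend to drift away from the data, which is why RPCL-style methods are often started with more centers than the expected number of clusters. Each resulting gene cluster could then be scored by training a classifier on the samples restricted to that cluster's genes, as the abstract describes.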
Funding: Sponsored by the "Twelfth Five-Year Plan" Program of Guangdong Provincial Philosophy and Social Sciences (GD15XLS07).
Abstract: From the perspective of historical and cultural geography, the value of the "landscape gene information chain" in the tourism development of ancient villages was verified by taking cultural landscape as the principal line, historical and cultural settlements as the carriers, and traditional buildings as the entry points. Taking Daqitou Village in Sanshui District, Foshan City as an example, the theory of the "landscape gene information chain" was applied to design a tourism planning scheme for the village.
Funding: Supported in part by the National Natural Science Foundation of China (Grant No. 60234020).
Abstract: Gene expression profiles of 14 common tumors and their counterpart normal tissues were analyzed with machine learning methods to address the problem of selecting tumor-specific genes and analyzing their differential expression in tumor tissues. First, a variation of the Relief algorithm, the "RFE_Relief algorithm", was proposed to learn the relations between genes and tissue types. Then, a support vector machine was employed to find the gene subset with the best classification performance for distinguishing cancerous tissues from their normal counterparts. After tissue-specific genes were removed, cross-validation experiments were employed to demonstrate the common deregulated expression of the selected genes in tumor tissues. The results indicate the existence of a specific expression fingerprint of these genes that is shared across different tumor tissues, and the hallmarks of the expression patterns of these genes in cancerous tissues are summarized at the end of the paper.
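To make the feature-weighting idea concrete, the following is a minimal sketch of the basic two-class Relief algorithm, on which the abstract's "RFE_Relief" variation builds. The details of that variation are not given in the abstract, so none are reproduced here. As simplifying assumptions, the sketch visits every sample once instead of drawing random samples, and uses a range-normalized Manhattan distance; the function name `relief_weights` is hypothetical.

```python
def relief_weights(X, y):
    """Basic Relief feature weights for numeric features.

    X is a list of samples (each a list of feature values) and y the class
    labels. For every sample, the weight of a feature grows with its
    difference to the nearest sample of another class (nearest miss) and
    shrinks with its difference to the nearest sample of the same class
    (nearest hit).
    """
    n, d = len(X), len(X[0])

    # Per-feature value range, used to scale differences into [0, 1].
    spans = []
    for f in range(d):
        col = [row[f] for row in X]
        spans.append((max(col) - min(col)) or 1.0)

    def diff(f, a, b):
        return abs(a[f] - b[f]) / spans[f]

    def dist(a, b):
        return sum(diff(f, a, b) for f in range(d))

    w = [0.0] * d
    for i in range(n):
        hits = [j for j in range(n) if j != i and y[j] == y[i]]
        misses = [j for j in range(n) if y[j] != y[i]]
        if not hits or not misses:
            continue  # a sample without a hit or a miss carries no signal
        h = min(hits, key=lambda j: dist(X[i], X[j]))
        m = min(misses, key=lambda j: dist(X[i], X[j]))
        for f in range(d):
            w[f] += (diff(f, X[i], X[m]) - diff(f, X[i], X[h])) / n
    return w


# Toy data: feature 0 separates the classes, feature 1 does not.
X = [[0.0, 0.3], [0.1, 0.9], [0.9, 0.2], [1.0, 0.8]]
y = [0, 0, 1, 1]
weights = relief_weights(X, y)
ranked = sorted(range(len(weights)), key=lambda f: -weights[f])
```

On the toy data, feature 0 receives a positive weight and feature 1 a negative one, so `ranked` places feature 0 first. In a gene-selection setting the features would be genes and the samples tissue profiles, with the top-ranked genes passed to a classifier such as an SVM.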
Funding: Supported by grants from the National Key R&D Program of China (2017YFA0505500), the National Natural Science Foundation of China (31930022, 12131020, 12026608, and 31771476), the Strategic Priority Research Program of the Chinese Academy of Sciences (XDB38040400), and JST Moonshot R&D (JPMJMS2021).
Abstract: Recent advances in single-cell RNA sequencing (scRNA-seq) technologies have led to extensive study of cellular heterogeneity and cell-to-cell variation. However, the high frequency of dropout events and noise in scRNA-seq data confounds the accuracy of downstream analysis, in particular clustering analysis, whose accuracy depends heavily on the selected feature genes. Here, by deriving an entropy decomposition formula, we propose a feature selection method, an intrinsic entropy (IE) model, to identify the informative genes for accurate clustering analysis. Specifically, by eliminating the ‘noisy’ fluctuation, or extrinsic entropy (EE), we extract the IE of each gene from the total entropy (TE), i.e., TE = IE + EE. We show that the IE of each gene reflects the regulatory fluctuation of this gene in a cellular process, and thus high-IE genes provide rich information for cell type or cell state analysis. To validate the performance of the high-IE genes, we conduct computational analysis on both simulated datasets and real single-cell datasets and compare with other representative methods. The results show that our IE model is not only broadly applicable and robust across different clustering and classification methods, but also sensitive to novel cell types. Our results also demonstrate that the intrinsic entropy/fluctuation of a gene serves as information rather than noise, in contrast to its total entropy/fluctuation.
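As a point of reference for the quantities named in this abstract, the sketch below estimates only the total entropy (TE) of a single gene from its expression values across cells, using a plain histogram estimator of Shannon entropy. This estimator and the bin count are assumptions made for illustration. The split of TE into IE and EE depends on the authors' decomposition formula, which the abstract does not state, so it is deliberately not attempted here. The function name `total_entropy` is hypothetical.

```python
import math


def total_entropy(values, bins=10):
    """Shannon entropy (in bits) of one gene's expression across cells.

    Values are placed into equal-width bins between their minimum and
    maximum, and the entropy of the resulting histogram is returned.
    A gene with constant expression has zero entropy.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The maximum value would fall into bin index `bins`; clamp it.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)


# Two hypothetical genes measured in four cells.
flat_gene = [3.0, 3.0, 3.0, 3.0]    # no variation across cells
bimodal_gene = [0.0, 0.0, 1.0, 1.0]  # two equally likely expression states
te_flat = total_entropy(flat_gene, bins=2)
te_bimodal = total_entropy(bimodal_gene, bins=2)
```

Here `te_flat` is 0 bits and `te_bimodal` is 1 bit. The abstract's central claim is that ranking genes by this total quantity is less useful than ranking by its intrinsic part alone, since the total also includes the extrinsic, noise-driven fluctuation.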