Funding: Supported by the National Natural Science Foundation of China (61472161, 61402195, 61502198).
Abstract: Deoxyribonucleic acid (DNA) microarray gene expression data have been widely used in functional genomics because they help in the study of cancer, cells, tissues, organisms, and so on. However, the sample sizes are relatively small compared with the number of genes, so feature selection is necessary to reduce complexity and increase the classification accuracy of samples. In this paper, a new improvement of particle swarm optimization (PSO) based on fluid mechanics is proposed for feature selection. The improvement simulates the spontaneous flow of air from high pressure to low pressure, which lets the search cover the whole solution space and prevents particles from becoming trapped in a local optimum. Experiments show that the improved algorithm substantially reduces the number of selected features while achieving high classification accuracy on 8 of the 11 datasets, and that it compares favorably with other feature selection methods.
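The abstract does not give the fluid-mechanics update rule. The sketch below therefore shows only the standard binary PSO baseline that such variants build on. Each particle is a 0/1 mask over genes, and the sigmoid of a particle's velocity gives the probability that a bit is set. The function name `binary_pso`, the toy fitness, and all parameter values are illustrative assumptions. None of them reproduce the authors' method.

```python
import math
import random


def binary_pso(fitness, n_features, n_particles=20, n_iters=50,
               w=0.7, c1=1.5, c2=1.5, v_max=4.0, seed=0):
    """Standard sigmoid-transfer binary PSO over 0/1 feature masks.

    Hypothetical baseline, not the fluid-mechanics variant.
    fitness(mask) -> float, higher is better (e.g. classifier accuracy
    minus a penalty on the number of selected genes).
    """
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-v_max, min(v_max, v))  # clamp velocity
                # The sigmoid of the velocity is the probability that bit d is 1.
                prob_one = 1.0 / (1.0 + math.exp(-vel[i][d]))
                pos[i][d] = 1 if rng.random() < prob_one else 0
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit
```

In practice, `fitness` would wrap a cross-validated classifier trained on the selected genes. A variant such as the one described above would change only the velocity and position update inside the inner loop.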
Abstract: Microarray gene expression data are analyzed by means of a Bayesian nonparametric model, with emphasis on the prediction of future observables, yielding a method for selecting differentially expressed genes and a corresponding classifier.
Funding: National Natural Science Foundation of China (Grant No. 60234020).
Abstract: Computational analysis is essential for transforming the masses of microarray data into a mechanistic understanding of cancer. Here we present a method for finding gene functional modules of cancer from microarray data and apply it to colon cancer. First, a colon cancer gene network and a normal colon tissue gene network were constructed using correlations between genes. Then, modules that tended to have a homogeneous functional composition were identified by splitting up each network. Analysis of both networks revealed that they are scale-free. Comparing the gene functional modules of colon cancer and normal tissue showed that the modules' functions changed with their structures.
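The network-construction step can be sketched as follows. Two genes are connected when the absolute Pearson correlation of their expression profiles reaches a cutoff. The degree distribution P(k) is then tabulated, because a scale-free network shows P(k) falling off roughly as a power of k. The abstract specifies neither the correlation measure nor the cutoff nor the module-splitting procedure. The names `correlation_network` and `degree_distribution` and the threshold value are hypothetical.

```python
import math
from collections import Counter


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx and syy else 0.0


def correlation_network(expr, threshold=0.9):
    """expr: dict gene -> expression values across samples.

    Returns an adjacency dict gene -> set of genes with |r| >= threshold.
    """
    genes = list(expr)
    adj = {g: set() for g in genes}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            if abs(pearson(expr[g], expr[h])) >= threshold:
                adj[g].add(h)
                adj[h].add(g)
    return adj


def degree_distribution(adj):
    """Map each degree k to the number of genes with that degree."""
    return Counter(len(neighbours) for neighbours in adj.values())
```

To check for scale-free behaviour, one would fit a line to log P(k) against log k. The module-finding step from the paper is not reproduced here.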
Funding: The authors extend their appreciation to the Deanship of Scientific Research at King Khalid University for funding this work under grant number RGP 2/42/43. This work was also supported by the Taif University Researchers Supporting Program (project number TURSP-2020/200), Taif University, Saudi Arabia.
Abstract: In bioinformatics applications, the analysis of microarray data for disease diagnosis has received significant interest. Microarray gene expression data have a massive search space, which makes the appropriate selection of genes a primary challenge. Microarray data classification draws on multiple disciplines such as bioinformatics, machine learning (ML), data science, and pattern classification. This paper designs an optimal deep neural network based microarray gene expression classification (ODNN-MGEC) model for bioinformatics applications. The ODNN-MGEC technique first normalizes the data to a uniform scale. An improved fruit fly optimization (IFFO) based feature selection technique is then used to reduce the high dimensionality of the biomedical data. Finally, a deep neural network (DNN) model classifies the microarray gene expression data, and its hyperparameters are tuned using the Symbiotic Organisms Search (SOS) algorithm. The combined use of the IFFO and SOS algorithms aims to maximize gene expression classification performance. To examine the performance of the ODNN-MGEC technique, a wide-ranging experimental analysis was carried out on benchmark datasets. An extensive comparison with recent approaches shows that the ODNN-MGEC technique achieves improved results on several measures.
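The abstract says only that the data are normalized "to a uniform scale". The sketch below assumes per-gene min-max scaling to [0, 1], one common choice for this step. The function name `min_max_normalize` is hypothetical. Constant columns are mapped to 0.0 to avoid dividing by zero.

```python
def min_max_normalize(matrix):
    """Rescale each column (gene) of a samples-by-genes matrix to [0, 1].

    Assumed normalization; the ODNN-MGEC paper may use a different scheme.
    """
    cols = list(zip(*matrix))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in matrix
    ]
```

When min-max scaling is used for classification, the column minima and maxima should be computed on the training samples only and then reused for the test samples.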
Abstract: BACKGROUND Burkitt lymphoma (BL) is an exceptionally aggressive malignant neoplasm that arises from germinal center or post-germinal center B cells. Patients with BL often present with rapid tumor growth and require high-intensity multidrug therapy combined with adequate intrathecal chemotherapy prophylaxis; however, a standard treatment program for BL has not yet been established. It is therefore important to identify biomarkers that predict the prognosis of BL and discriminate patients who might benefit from therapy. Microarray data and sequencing information from public databases offer opportunities to discover new diagnostic or therapeutic targets. AIM To identify hub genes and perform gene ontology (GO) and survival analyses in BL. METHODS Gene expression profiles and clinical traits of BL patients were collected from the Gene Expression Omnibus database. Weighted gene co-expression network analysis (WGCNA) was applied to construct gene co-expression modules, and the cytoHubba tool was used to find hub genes. The hub genes were then analyzed using GO and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses. Additionally, a protein-protein interaction network and a genetic interaction network were constructed. Prognostic candidate genes were identified through overall survival (OS) analysis. Finally, a nomogram was established to assess the predictive value of the hub genes, and drug-gene interactions were also mapped. RESULTS We obtained 8 modules through WGCNA, and the yellow module was significantly correlated with age. We then identified 10 hub genes (SRC, TLR4, CD40, STAT3, SELL, CXCL10, IL2RA, IL10RA, CCR7 and FCGR2B) using cytoHubba. Among these hub genes, survival analysis found two associated with OS (CXCL10, P = 0.029; IL2RA, P = 0.0066). We combined these two hub genes and age to build a nomogram. Moreover, drugs related to IL2RA and CXCL10 might have a potential therapeutic role in relapsed and refractory BL. CONCLUSION Using WGCNA and survival analysis, we identified CXCL10 and IL2RA as potential prognostic markers for BL.
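Overall survival analyses of this kind commonly compare a Kaplan-Meier curve for a "high expression" group with one for a "low expression" group, using a log-rank test. The abstract does not say exactly how the P values above were computed, so the sketch below is a generic illustration. It implements the Kaplan-Meier estimator and the two-group log-rank test. The function names and the way patients would be split into groups (for example, at median expression) are assumptions.

```python
import math


def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times[i]: follow-up time; events[i]: 1 if death was observed, 0 if censored.
    Returns [(t, S(t))] at each distinct time with at least one death.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value), 1 df."""
    pooled = ([(t, e, 0) for t, e in zip(times_a, events_a)]
              + [(t, e, 1) for t, e in zip(times_b, events_b)])
    obs_minus_exp = 0.0
    var = 0.0
    for t in sorted({t for t, e, _ in pooled if e}):
        n_a = sum(1 for tt, _, g in pooled if g == 0 and tt >= t)
        n = sum(1 for tt, _, _ in pooled if tt >= t)
        d_a = sum(1 for tt, e, g in pooled if g == 0 and tt == t and e)
        d = sum(1 for tt, e, _ in pooled if tt == t and e)
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            var += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if var == 0:
        return 0.0, 1.0
    chi2 = obs_minus_exp ** 2 / var
    # Upper tail of a chi-square with 1 df equals P(|Z| > sqrt(chi2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))
```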
Funding: Supported by the National Natural Science Foundation of China (No. 81271019, No. 61463046) and the Gansu Province Science Foundation for Youths (No. 145RJYA282).
Abstract: AIM: To identify and understand the relationship between co-expression patterns and clinical traits in uveal melanoma, weighted gene co-expression network analysis (WGCNA) was applied to gene expression levels and patients' clinical features. Uveal melanoma is the most common primary eye tumor in adults. Although many studies have identified important genes and pathways relevant to the progression of uveal melanoma, the systems-level relationship between co-expression and clinical traits in uveal melanoma remains unclear. In this study, we employ WGCNA to investigate the relationship between molecular features and phenotype. METHODS: Gene expression profiles of uveal melanoma and patients' clinical traits were collected from the Gene Expression Omnibus (GEO) database. Gene co-expression was calculated with the WGCNA R package, which analyzes the correlation between pairs of gene expression levels. Gene functions were annotated using gene ontology (GO). RESULTS: We identified four co-expression modules significantly correlated with clinical traits. Module blue positively correlates with radiotherapy treatment. Module purple positively correlates with tumor location (sclera) and negatively correlates with patient age. Module red positively correlates with sclera and negatively correlates with tumor thickness. Module black positively correlates with the largest tumor diameter (LTD). Additionally, we identified the hub gene (the gene with the highest connectivity to other genes) in each module. The hub genes RPS15A, PTGDS, CD53 and MSI2 might play a vital role in the progression of uveal melanoma. CONCLUSION: Using WGCNA and hub gene calculation, we identified RPS15A, PTGDS, CD53 and MSI2 as potential therapeutic targets or diagnostic markers for uveal melanoma.
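The hub-gene step ("top connectivity with other genes") can be illustrated with the unsigned soft-threshold adjacency used in WGCNA, a_ij = |cor(x_i, x_j)|^β. A gene's intramodular connectivity is k_i = Σ_{j≠i} a_ij, and the hub is the gene with the largest k_i. This is a simplified pure-Python sketch, not the WGCNA R package itself. β is normally chosen by the scale-free topology criterion, so the default of 6 below is only a placeholder, and the function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx and syy else 0.0


def wgcna_adjacency(expr, beta=6):
    """Unsigned soft-threshold adjacency a_ij = |cor(x_i, x_j)| ** beta."""
    genes = list(expr)
    adj = {g: {} for g in genes}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            a = abs(pearson(expr[g], expr[h])) ** beta
            adj[g][h] = adj[h][g] = a
    return adj


def hub_gene(adj, module):
    """Gene in `module` with the largest intramodular connectivity k_i."""
    def connectivity(g):
        return sum(adj[g][h] for h in module if h != g)
    return max(module, key=connectivity)
```

Raising |r| to the power β keeps strong correlations near 1 while shrinking weak ones toward 0. This is why soft thresholding avoids the hard cutoff of an unweighted network.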
Funding: Project supported by the Key Program of Basic Research of Science & Technology Commission of Shanghai Municipality (No. 04dz14004) and the Shanghai Natural Science Foundation (No. 03ZR14065). Dedicated to Professor Xikui Jiang on the occasion of his 80th birthday.
Abstract: Clustering is perhaps one of the most widely used tools for microarray data analysis. Proposed roles for genes of unknown function are inferred from clusters of genes that are similarly expressed across many biological conditions. However, it has not been evaluated whether functional annotation by similarity metrics is reliable, or to what extent similarity in gene expression patterns is useful for annotating gene functions. This paper comprehensively studies the correlation between the similarity of expression data and the similarity of gene functions using Gene Ontology. We found that although similarity in expression patterns and similarity in gene functions are significantly dependent on each other, the association is rather weak. In addition, among the three categories of Gene Ontology, expression similarity is more useful for cellular component annotation than for biological process and molecular function. These results are relevant to research on gene function prediction.
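The kind of comparison described above can be sketched as follows. For every gene pair, the code computes an expression similarity and a functional similarity, then measures how strongly the two lists agree. The abstract does not name the measures used. This sketch assumes Pearson correlation for expression, the Jaccard index of GO term sets for function, and Spearman rank correlation for their association. All function names are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either list is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx and syy else 0.0


def jaccard(a, b):
    """Overlap of two GO term sets (assumed functional similarity)."""
    return len(a & b) / len(a | b) if a | b else 0.0


def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def spearman(x, y):
    return pearson(ranks(x), ranks(y))


def expression_vs_function(expr, go_terms):
    """Spearman correlation between pairwise expression and GO similarity."""
    expr_sim, func_sim = [], []
    for g, h in combinations(sorted(expr), 2):
        expr_sim.append(pearson(expr[g], expr[h]))
        func_sim.append(jaccard(go_terms[g], go_terms[h]))
    return spearman(expr_sim, func_sim)
```

Running the comparison separately with biological process, molecular function, and cellular component term sets would mirror the per-category comparison mentioned in the abstract.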
Funding: Partially supported by the National Natural Science Foundation of China (Grant No. 60873036), the China Postdoctoral Science Foundation (Grant No. 20060400809), and the Science and Technology Special Foundation for Young Researchers of Heilongjiang Province of China (Grant No. QC06C022).
Abstract: Microarray data are often extremely asymmetric in dimensionality, with thousands or even tens of thousands of genes but only a few hundred samples or fewer. Such extreme asymmetry between the number of genes and the number of samples can lead to inaccurate clinical diagnosis of disease. It has been shown that selecting a small set of marker genes can improve classification accuracy. In this paper, a simple modified ant colony optimization (ACO) algorithm is proposed to select tumor-related marker genes, and a support vector machine (SVM) is used as the classifier to evaluate the performance of the extracted gene subset. Experimental results on several benchmark tumor microarray datasets show that the proposed approach achieves better recognition with fewer marker genes than many other methods, demonstrating that the modified ACO is a useful tool for selecting marker genes and mining high-dimensional data.
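The abstract does not describe its modification to ACO. The sketch below therefore shows a plain ant colony feature-selection loop. Each ant samples a fixed-size gene subset with probability proportional to per-gene pheromone. Pheromone then evaporates at rate ρ, and each subset deposits pheromone in proportion to its score. The SVM is replaced by a caller-supplied `score` function, which in practice would return cross-validated SVM accuracy. The names, parameters, and the small pheromone floor are assumptions.

```python
import random


def aco_select(score, n_features, subset_size, n_ants=10, n_iters=30,
               rho=0.2, seed=0):
    """Plain ACO feature selection (not the paper's modified variant).

    score(frozenset_of_feature_indices) -> float to maximize.
    """
    rng = random.Random(seed)
    tau = [1.0] * n_features  # pheromone level per feature
    best, best_score = None, float("-inf")
    for _ in range(n_iters):
        solutions = []
        for _ in range(n_ants):
            available = list(range(n_features))
            subset = []
            for _ in range(subset_size):
                # Pick a feature with probability proportional to its pheromone.
                weights = [tau[f] for f in available]
                pick = rng.choices(available, weights=weights)[0]
                available.remove(pick)
                subset.append(pick)
            s = score(frozenset(subset))
            solutions.append((s, subset))
            if s > best_score:
                best, best_score = sorted(subset), s
        # Evaporate (with a small floor so weights stay positive), then deposit.
        tau = [max((1 - rho) * t, 1e-6) for t in tau]
        for s, subset in solutions:
            for f in subset:
                tau[f] += max(s, 0.0)
    return best, best_score
```

Evaporation lets the colony forget early choices that led to poor subsets. The deposit step makes genes that often appear in high-scoring subsets more likely to be picked in later iterations.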
Funding: This work is supported by the National Grand Fundamental Research 973 Program of China (Grant No. 2006CB303103) and the National Natural Science Foundation of China under Grants No. 60573089, No. 60273079 and No. 60473074.
Abstract: As explored by biologists, there is a real and emerging need to identify co-regulated gene clusters, which include both positively and negatively regulated gene clusters. However, existing pattern-based and tendency-based clustering approaches are designed only for finding positively regulated gene clusters. In this paper, a new subspace clustering model called g-Cluster is proposed for gene expression data. The proposed model has the following advantages: 1) it finds both positively and negatively co-regulated genes in one pass; 2) it removes the restriction of a magnitude transformation relationship among co-regulated genes; and 3) it guarantees the quality of clusters and the significance of regulations using a novel similarity measurement, gCode, and a user-specified regulation threshold δ, respectively. No previous work accomplishes all of these. Moreover, the minimum description length (MDL) technique is introduced to avoid generating insignificant g-Clusters. A tree structure, the GS-tree, is also designed, together with two algorithms that use efficient pruning and optimization strategies to identify all qualified g-Clusters. Extensive experiments were conducted on real and synthetic datasets. The results show that 1) the algorithm finds a number of co-regulated gene clusters, potentially of high biological significance, that previous models miss, and 2) the algorithms are effective and efficient and outperform existing approaches.
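The abstract does not define gCode. As a hypothetical illustration of detecting positive and negative co-regulation without assuming a magnitude transformation, the sketch below encodes each consecutive change as +1, -1, or 0 according to a threshold δ. Two genes are treated as positively co-regulated when their codes match, and as negatively co-regulated when one code is the negation of the other. This is not the paper's gCode measure. The caller is assumed to pass profiles restricted to one subspace (an ordered subset of conditions), and the function names are invented.

```python
def change_code(profile, delta):
    """Encode each consecutive change as +1 (rise > delta),
    -1 (fall > delta) or 0 (no significant change)."""
    code = []
    for a, b in zip(profile, profile[1:]):
        diff = b - a
        code.append(1 if diff > delta else -1 if diff < -delta else 0)
    return code


def co_regulation(p, q, delta):
    """Return 'positive', 'negative' or None for two expression profiles
    measured under the same ordered conditions."""
    cp, cq = change_code(p, delta), change_code(q, delta)
    if not any(cp):
        return None  # no significant regulation to compare
    if cp == cq:
        return "positive"
    if cp == [-c for c in cq]:
        return "negative"
    return None
```

Only the direction of each change is compared, so genes whose profiles differ by any scaling or shifting still match. This loosely reflects the abstract's point about not requiring a magnitude transformation between co-regulated genes.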
Funding: the National Natural Science Foundation of China (Grant No. 60471054) and the President Foundation of Peking University.
Abstract: Microarray-based tumor diagnosis is an active topic in bioinformatics. One of its key problems is the discovery and analysis of a tumor's informative genes. Although there are many elaborate approaches to this problem, it is still difficult to select a reasonable set of informative genes for tumor diagnosis from microarray data alone. In this paper, we group the genes in the microarray data into a number of clusters via the distance sensitive rival penalized competitive learning (DSRPCL) algorithm, and then detect the informative gene cluster or set with the help of a support vector machine (SVM). Moreover, the critical or most powerful informative genes can be found through further classification and detection on the obtained informative gene clusters. Experiments on the colon, leukemia, and breast cancer datasets demonstrate that the proposed DSRPCL-SVM approach leads to a reasonable selection of informative genes for tumor diagnosis.
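The details of DSRPCL are not given in the abstract, so the sketch below shows only classic rival penalized competitive learning (RPCL), on which it is based. For each input, the nearest center (the winner) moves toward the input with learning rate `alpha_w`. The second-nearest center (the rival) is pushed slightly away with a much smaller de-learning rate `alpha_r`. The function names, learning rates, and plain squared-distance winner selection are assumptions.

```python
import random


def rpcl(points, k, alpha_w=0.05, alpha_r=0.005, epochs=50, seed=0):
    """Classic RPCL on a list of equal-length vectors, k >= 2 (not DSRPCL)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]

    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    for _ in range(epochs):
        for x in rng.sample(points, len(points)):  # one shuffled pass
            order = sorted(range(k), key=lambda j: dist2(x, centers[j]))
            win, rival = order[0], order[1]
            # Winner learns toward x; rival is pushed away from x.
            centers[win] = [c + alpha_w * (xi - c)
                            for c, xi in zip(centers[win], x)]
            centers[rival] = [c - alpha_r * (xi - c)
                              for c, xi in zip(centers[rival], x)]
    return centers


def assign(points, centers):
    """Index of the nearest center for each point."""
    return [min(range(len(centers)),
                key=lambda j: sum((x - c) ** 2 for x, c in zip(p, centers[j])))
            for p in points]
```

Penalizing the rival tends to drive surplus centers away from the data, which is how RPCL can choose the number of clusters by itself. When clustering genes, each point would be one gene's expression profile across samples. Each resulting cluster could then be scored with an SVM, as the abstract describes.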