Lung cancer remains a significant global health challenge, and identifying it at an early stage is essential for improving patient outcomes. The study focuses on developing and optimizing gene expression-based models for classifying cancer types using machine learning techniques. After applying log2 normalization to the gene expression data and conducting Wilcoxon rank sum tests, the researchers evaluated various classifiers with Incremental Feature Selection (IFS) strategies. The study culminated in two optimized models using the XGBoost classifier, comprising 10 and 74 genes respectively. The 10-gene model, due to its simplicity, is proposed for easier clinical implementation, whereas the 74-gene model exhibited superior performance in terms of specificity, AUC (area under the curve), and precision. These models were evaluated based on their sensitivity, AUC, and specificity, aiming to achieve high sensitivity and AUC while maintaining reasonable specificity.
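The preprocessing and gene-ranking steps named in this abstract can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the pseudocount of 1, the normal approximation without tie correction, and the helper names (`log2_normalize`, `rank_sum_z`, `rank_genes`) are all assumptions, and the toy expression values are invented. In an IFS setting, genes would then be added to a classifier in this ranked order.

```python
import math

def log2_normalize(values, pseudocount=1.0):
    # log2(x + pseudocount); the pseudocount is an assumption to avoid log2(0)
    return [math.log2(v + pseudocount) for v in values]

def average_ranks(values):
    # 1-based ranks, with tied values sharing their average rank
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks

def rank_sum_z(group_a, group_b):
    # Wilcoxon rank-sum statistic as a z-score (normal approximation, no tie correction)
    n1, n2 = len(group_a), len(group_b)
    ranks = average_ranks(list(group_a) + list(group_b))
    w = sum(ranks[:n1])
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (w - mean_w) / sd_w

def rank_genes(expression, labels):
    # expression: gene -> raw values per sample; labels: 1 = case, 0 = control
    scores = {}
    for gene, raw in expression.items():
        norm = log2_normalize(raw)
        cases = [v for v, y in zip(norm, labels) if y == 1]
        controls = [v for v, y in zip(norm, labels) if y == 0]
        scores[gene] = abs(rank_sum_z(cases, controls))
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical toy data: g1 separates the two groups, g2 does not
toy = {"g1": [1, 2, 3, 4, 5, 6], "g2": [5, 1, 6, 2, 4, 3]}
ranked = rank_genes(toy, [1, 1, 1, 0, 0, 0])
```

Because log2 is monotone, the transform does not change the ranks used by the test; it matters for the downstream classifiers rather than for the ranking itself.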
The continued expansion of the world population, an increasingly inconsistent climate, and shrinking agricultural resources present major challenges to crop breeding. Fortunately, the increasing ability to discover and manipulate genes creates new opportunities to develop more productive and resilient cultivars. Many genes have been described in papers as being beneficial for yield increase. However, few of them have been translated into increased yield on farms. Meanwhile, commercial breeders face "gene decidophobia", i.e., they are puzzled about which gene to choose for breeding among the many identified, which reveals a huge chasm between gene discovery and cultivar innovation. The purpose of this paper is to draw attention to the shortfalls in current gene discovery research and to emphasise the need to align it with cultivar innovation. The proposed methodology requires that genetic studies not only focus on gene discovery but also pay close attention to genetic backgrounds, experimental validation in relevant environments, appropriate crop management, and data reusability. Closing these gaps should accelerate the application of molecular studies in breeding and contribute to future global food security.
Plant morphogenesis relies on precise gene expression programs at the proper time and position, which are orchestrated by transcription factors (TFs) in intricate regulatory networks in a cell-type-specific manner. Here we introduce a comprehensive single-cell transcriptomic atlas of Arabidopsis seedlings. This atlas is the result of meticulous integration of 63 previously published scRNA-seq datasets, addressing batch effects and conserving biological variance. This integration spans a broad spectrum of tissues, including both below- and above-ground parts. Utilizing a rigorous approach for cell type annotation, we identified 47 distinct cell types or states, largely expanding our current view of plant cell compositions. We systematically constructed cell-type-specific gene regulatory networks and uncovered key regulators that act in a coordinated manner to control cell-type-specific gene expression. Taken together, our study not only offers an extensive plant cell atlas that serves as a valuable resource, but also provides molecular insights into gene-regulatory programs that vary among cell types.
Background: Systemic lupus erythematosus (SLE) is a complex chronic autoimmune disease with no known cure. However, the regulatory mechanism of immunity-related genes in SLE is not fully understood. To explore new therapeutic targets, we used bioinformatics methods to analyze a series of datasets. Methods: After downloading and processing data from the Gene Expression Omnibus database, the differentially expressed genes of SLE were analyzed. The CIBERSORT algorithm was used to analyze immune infiltration in SLE. Based on single-cell RNA-sequencing data, the role of immune-related genes in SLE and its target organ (kidney) was analyzed. Key transcription factors affecting immune-related genes were identified. Cell-cell communication networks in SLE were analyzed. Results: In total, 15 hub genes and 4 transcription factors were found in the bulk data. Monocytes and macrophages in GSE81622 (SLE) showed more infiltration. Four cell types were annotated in the scRNA sequencing dataset (GSE135779): T cells, monocytes, NK cells, and B cells. Immunity-related genes were overexpressed in monocytes. Conclusion: The present study shows that immune-related genes affect SLE through monocytes and play an important role in target organ renal injury.
A Schwann cell has regenerative capabilities and is an important cell in the peripheral nervous system. This microarray study is part of a bioinformatics study that focuses mainly on Schwann cells. Microarray data provide information on differences between microarray-based and experiment-based gene expression analyses. According to microarray data, several genes exhibit increased expression (fold change) but are weakly expressed in experimental studies (based on morphology, protein and mRNA levels). In contrast, some genes are weakly expressed in microarray data and highly expressed in experimental studies; such genes may represent future target genes in Schwann cell studies. These studies allow us to learn about additional genes that could be used to achieve targeted results from experimental studies. In the current big data study, more than 5000 scientific articles were retrieved from PubMed or NCBI, Google Scholar, and Google, and 1016 (up- and downregulated) genes were determined to be related to Schwann cells. However, no experiment was performed in the laboratory; rather, the present study is part of a big data analysis. Our study will contribute to our understanding of Schwann cell biology by aiding in the identification of genes. Based on a comparative analysis of all microarray data, we conclude that the microarray could be a good tool for predicting the expression and intensity of different genes of interest in actual experiments.
Metastasis is the greatest contributor to cancer-related death. In the era of precision medicine, it is essential to predict and to prevent the spread of cancer cells to significantly improve patient survival. Thanks to the application of a variety of high-throughput technologies, accumulating big data enables researchers and clinicians to identify aggressive tumors as well as patients with a high risk of cancer metastasis. However, there have been few large-scale gene collection studies to enable metastasis-related analyses. In the last several years, emerging efforts have identified pro-metastatic genes in a variety of cancers, providing us the ability to generate a pro-metastatic gene cluster for big data analyses. We carefully selected 285 genes with in vivo evidence of promoting metastasis reported in the literature. These genes have been investigated in different tumor types. We used two datasets downloaded from The Cancer Genome Atlas database, specifically, datasets of clear cell renal cell carcinoma and hepatocellular carcinoma, for validation tests, and excluded any genes for which elevated expression level correlated with longer overall survival in any of the datasets. Ultimately, 150 pro-metastatic genes remained in our analyses. We believe this collection of pro-metastatic genes will be helpful for big data analyses, and eventually will accelerate anti-metastasis research and clinical intervention.
Background: Meta-analysis of quantitative trait loci (QTL) is a computational technique to identify consensus QTL and refine QTL positions on the consensus map from multiple mapping studies. The combination of meta-QTL intervals, significant SNPs and transcriptome analysis has been widely used to identify candidate genes in various plants. Results: In our study, 884 QTLs associated with cotton fiber quality traits from 12 studies were used for meta-QTL analysis based on the reference genome TM-1. As a result, 74 meta-QTLs were identified, including 19 meta-QTLs for fiber length, 18 for fiber strength, 11 for fiber uniformity, 11 for fiber elongation, and 15 for micronaire. Combined with 8589 significant single nucleotide polymorphisms associated with fiber quality traits collected from 15 studies, 297 candidate genes were identified in the meta-QTL intervals, 20 of which showed high expression levels specifically in developing fibers. According to the functional annotations, some of the 20 key candidate genes are associated with fiber development. Conclusions: This study provides not only stable QTLs for marker-assisted selection, but also candidate genes to uncover the molecular mechanisms of cotton fiber development.
In this paper, a similarity measure between genes with protein-protein interactions is proposed. The ChIP-chip data are converted into the same form as gene expression data, with Pearson correlation as the similarity measure. On the basis of the similarity measures of protein-protein interaction data and ChIP-chip data, a combined dissimilarity measure is defined. The combined distance measure is introduced into the K-means method, which can be considered an improved K-means method. The improved K-means method and three other clustering methods are evaluated on a real dataset. The performance of these methods is assessed by a prediction accuracy analysis using known gene annotations. Our results show that the improved K-means method outperforms the other clustering methods. The performance of the improved K-means method is also tested by varying the tuning coefficients of the combined dissimilarity measure. The results show that it is very helpful and meaningful to incorporate heterogeneous data sources in clustering gene expression data, and that the coefficients for genome-wide or complete data sources should be given larger values when constructing the combined dissimilarity measure.
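The abstract does not give the formula for the combined dissimilarity, so the sketch below only illustrates the general idea under stated assumptions: each similarity is turned into a dissimilarity as `1 - similarity`, and the two are mixed with tuning coefficients. The function names and the default weights `w_ppi` and `w_chip` are hypothetical.

```python
import math

def pearson(x, y):
    # Pearson correlation between two equal-length profiles
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def combined_dissimilarity(sim_ppi, sim_chip, w_ppi=0.5, w_chip=0.5):
    # Assumed form: weighted sum of (1 - similarity) over the two data sources
    return w_ppi * (1 - sim_ppi) + w_chip * (1 - sim_chip)

# Hypothetical profiles for two genes across three ChIP-chip experiments
chip_similarity = pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
distance = combined_dissimilarity(sim_ppi=1.0, sim_chip=0.0)
```

A K-means variant would use such a value in place of the usual Euclidean distance when assigning genes to clusters; raising the weight of one source makes that source dominate the grouping, which is the tuning effect the abstract describes.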
Gibberellins are an important class of plant hormones. They play an important regulatory role in all stages of growth and development of higher plants. The use of mutants to study gibberellin metabolism and signal transduction pathways is currently a research hotspot. Taking rice Affymetrix chip data as an example, this article uses bioinformatics methods to study the rice SLR1 mutant and mine genes differentially expressed relative to the wild type, thus exploring the expression regulation network of genes related to the gibberellin signaling pathway.
In bioinformatics applications, the examination of microarray data has received significant interest for diagnosing diseases. Microarray gene expression data can be characterized by a massive search space, which poses a primary challenge for the appropriate selection of genes. Microarray data classification incorporates multiple disciplines such as bioinformatics, machine learning (ML), data science, and pattern classification. This paper designs an optimal deep neural network based microarray gene expression classification (ODNN-MGEC) model for bioinformatics applications. The proposed ODNN-MGEC technique performs data normalization to bring the data onto a uniform scale. In addition, an improved fruit fly optimization (IFFO) based feature selection technique is used to reduce the high dimensionality of the biomedical data. Moreover, a deep neural network (DNN) model is applied for the classification of microarray gene expression data, and the hyperparameter tuning of the DNN model is carried out using the Symbiotic Organisms Search (SOS) algorithm. The use of the IFFO and SOS algorithms paves the way for accomplishing maximum gene expression classification outcomes. To examine the improved outcomes of the ODNN-MGEC technique, a wide-ranging experimental analysis is performed on benchmark datasets. The extensive comparison study with recent approaches demonstrates the enhanced outcomes of the ODNN-MGEC technique in terms of different measures.
Gene expression data represent a condition matrix where each row represents a gene and each column represents a condition. Microarrays are used to detect gene expression in the lab for thousands of genes at a time. Genes encode proteins, which in turn dictate cell function. The production of messenger RNA and its processing are the two main stages involved in the process of gene expression. The complexity of biological networks, together with the volume of data containing imprecision and outliers, increases the challenges in dealing with them. Clustering methods are hence essential to identify the patterns present in massive gene data. Many techniques involve hierarchical, partitioning, grid-based, density-based, model-based and soft clustering approaches for dealing with gene expression data. Understanding gene regulation and other useful information from these data is possible only through effective clustering algorithms. Though many methods are discussed in the literature, we concentrate on providing a soft clustering approach for analyzing gene expression data. The population elements are grouped based on the fuzziness principle, and a degree of membership is assigned to all the elements. An improved Fuzzy clustering by Local Approximation of Memberships (FLAME) is proposed in this work, which overcomes the limitations of the other approaches in dealing with non-linear relationships and provides better segregation of biological functions.
Accurate gas viscosity determination is an important issue in the oil and gas industries. Experimental approaches for gas viscosity measurement are time-consuming, expensive and hardly possible at high pressures and high temperatures (HPHT). In this study, a number of correlations were developed to estimate gas viscosity using group method of data handling (GMDH) type neural network and gene expression programming (GEP) techniques, based on a large data set containing more than 3000 experimental data points for methane, nitrogen, and hydrocarbon gas mixtures. It is worth mentioning that, unlike many viscosity correlations, the ones proposed in this study can compute gas viscosity at pressures ranging between 34 and 172 MPa and temperatures between 310 and 1300 K. Also, a comparison was performed between the results of these established models and the results of ten well-known models reported in the literature. The average absolute relative errors of the GMDH models were 4.23%, 0.64%, and 0.61% for hydrocarbon gas mixtures, methane, and nitrogen, respectively. In addition, graphical analyses indicate that GMDH can predict gas viscosity with higher accuracy than GEP at HPHT conditions. Also, using the leverage technique, valid, suspected and outlier data points were determined. Finally, trends of the gas viscosity models at different conditions were evaluated.
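The error metric quoted in this abstract, average absolute relative error, is commonly computed as the mean of |predicted - experimental| / experimental, expressed as a percentage. A minimal sketch of that usual definition follows; the abstract does not spell out its exact formula, and the viscosity values below are invented for illustration.

```python
def average_absolute_relative_error(experimental, predicted):
    # AARE in percent: 100/N * sum(|pred - exp| / exp)
    if len(experimental) != len(predicted) or not experimental:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - e) / e for e, p in zip(experimental, predicted))
    return 100.0 * total / len(experimental)

# Hypothetical viscosity values (arbitrary units)
aare = average_absolute_relative_error([100.0, 200.0], [110.0, 190.0])
```

For these two points the relative errors are 10% and 5%, so the metric comes out at about 7.5%.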
The analysis of messenger ribonucleic acid data obtained through sequencing techniques (RNA-sequencing) is very challenging. Once technical difficulties have been sorted out, an important choice has to be made during pre-processing. Two different paths can be chosen: transform RNA-sequencing count data into a continuous variable, or continue to work with count data. For each data type, analysis tools have been developed and seem appropriate at first sight, but a deeper analysis of data distribution and structure is worth discussing. In this review, open questions regarding the nature of RNA-sequencing data are discussed and highlighted, indicating important future research topics in statistics that should be addressed for a better analysis of existing and newly generated gene expression data. Moreover, a comparative analysis of RNA-seq count data and transformed data is presented. This comparison indicates that transforming RNA-seq count data seems appropriate, at least for differential expression detection.
Objective: To explore potential genes associated with the formation and rupture of intracranial aneurysms based on the Gene Expression Omnibus (GEO) database. Methods: A total of 133 mRNA microarrays were collected from the GEO database. Differential mRNA gene analysis was performed on the data of each group in the GEO2R platform, the common differential genes were screened, and gene ontology enrichment analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis were completed. The screened differential genes were entered into the STRING online database to obtain the interactions between the proteins encoded by the differential genes. Results: Forty-two common differential genes were screened, and the main biological processes involved included the transcriptional regulation of oxidative stress, the positive regulation of chemokine production, and the positive regulation of autophagy of giant cells by RNA polymerase II promoter. Molecular functions included protein binding, RNA polymerase II transcriptional co-repressor activity, transcriptional activator activity, and protein kinase C binding. The main signaling pathways covered included the hypoxia-inducible factor-1 signaling pathway, the glucagon signaling pathway, and metabolic pathways. Conclusions: The formation and rupture of intracranial aneurysms may be preliminarily screened using amidoxime reduction component 1, tumor necrosis factor-α-inducible protein 6, haptoglobin, mast cell membrane-expressing protein 1, zipper containing kinase, phospholipase Cβ4 and blood and nervous system expression factor-1. In addition to the previously known mechanisms of intracranial aneurysms, cellular autophagy and the hypoxia-inducible factor-1 pathway may also be involved in the formation of intracranial aneurysms.
Acute leukemia is an aggressive disease that has high mortality rates worldwide. The error rate can be as high as 40% when classifying acute leukemia into its subtypes. So, there is an urgent need to support hematologists during the classification process. More than two decades ago, researchers used microarray gene expression data to classify cancer and adopted acute leukemia as a test case. The high classification accuracy they achieved confirmed that it is possible to classify cancer subtypes using microarray gene expression data. Ensemble machine learning is an effective method that combines individual classifiers to classify new samples. Ensemble classifiers are recognized as powerful algorithms with numerous advantages over traditional classifiers. Over the past few decades, researchers have focused a great deal of attention on ensemble classifiers in a wide variety of fields, including but not limited to disease diagnosis, finance, bioinformatics, healthcare, manufacturing, and geography. This paper reviews the recent ensemble classifier approaches utilized for acute leukemia gene expression data classification. Moreover, a framework for classifying acute leukemia gene expression data is proposed. The pairwise correlation gene selection method and the Rotation Forest of Bayesian Networks are both used in this framework. Experimental outcomes show that the classification accuracy achieved by the acute leukemia ensemble classifiers constructed according to the suggested framework is good compared to the classification accuracy achieved in other studies.
Objective: To explore the expression and clinical significance of the RACGAP1 gene in hepatocellular carcinoma. Methods: RACGAP1 gene data and clinicopathological data for liver cancer were retrieved from The Cancer Genome Atlas (TCGA). The relationships between RACGAP1 gene expression and clinicopathological parameters and prognosis were analyzed with R 2.15.3 software. The association between RACGAP1 gene expression and the prognosis of liver cancer patients was analyzed by Kaplan-Meier survival analysis and Cox regression analysis. Results: The TCGA database was used to collect 235 cases of liver cancer with clinicopathological parameters and their corresponding RACGAP1 expression levels. After incomplete cases and those with no detailed pathological parameters were excluded, it was found that RACGAP1 was highly expressed in liver cancer tissues. The expression of RACGAP1 in patients with liver cancer in the TCGA tumor database was then further analyzed against the matching clinical parameters. The expression level of RACGAP1 was significantly correlated with the pathological grade and T stage of liver cancer patients (all P<0.05), but was not significantly correlated with American Joint Committee on Cancer (AJCC) pathological stage or gender (P>0.05). There was a significant correlation between RACGAP1 expression level and overall survival (OS) in patients with liver cancer (P<0.05), and the overall survival time of patients with low expression was longer than that of patients with high expression (P<0.05). Cox regression was used to analyze the correlation between T stage, M stage, N stage and RACGAP1 expression in patients with hepatocellular carcinoma (HCC), and RACGAP1 was found to be an independent prognostic factor in patients with HCC (P<0.05). Conclusion: Based on the tumor-related gene information in the public TCGA database, the RACGAP1 gene is highly expressed in liver cancer tissues and is an independent prognostic factor for liver cancer; it is expected to become an important therapeutic target for drug therapy of liver cancer.
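The Kaplan-Meier survival analysis named in this abstract rests on the product-limit estimator: at each event time the survival probability is multiplied by (1 - deaths / number at risk). The sketch below shows that estimator in plain Python rather than R; it is not the authors' code, and the follow-up times and event flags are invented.

```python
def kaplan_meier(times, events):
    # times: follow-up time per patient; events: 1 = death observed, 0 = censored
    # Returns [(time, survival probability)] at each time with at least one death
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients sharing the same follow-up time
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # censored patients leave the risk set without an event
    return curve

# Hypothetical cohort: the second patient is censored at time 2
km_curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
```

Comparing high- and low-expression groups would mean computing one such curve per group and testing the difference, for example with a log-rank test, which is not shown here.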
To make better use of the nucleic acid resources of Gossypium hirsutum, a data-mining method was used to identify putative genes responsive to various abiotic stresses in G. hirsutum. Based on a compiled database of genes involved in abiotic stress response in Arabidopsis thaliana and the comprehensive analysis tool GENEVESTIGATOR v3, 826 genes significantly up-regulated or down-regulated in roots or leaves during salt or cold treatment in Arabidopsis were identified. By comparison with these 826 annotated Arabidopsis genes, 38 homologous expressed sequence tags (ESTs) from G. hirsutum were selected randomly and their expression patterns were studied using a quantitative real-time reverse transcription-polymerase chain reaction method. Among these 38 ESTs, about 55% of the genes (21 of 38) differed in their response to ABA between cotton and Arabidopsis, whereas 70% of the genes had similar responses to cold and salt treatments, and some of those that had not been characterized in Arabidopsis are now being investigated in gene function studies. According to these results, this approach of analyzing ESTs appears effective for large-scale identification of cotton genes involved in abiotic stress and might be adopted to determine gene functions in various biological processes in cotton.
BACKGROUND The objectives of this study were to identify hub genes and biological pathways involved in lung adenocarcinoma (LUAD) via bioinformatics analysis, and to investigate potential therapeutic targets. AIM To determine reliable prognostic biomarkers for early diagnosis and treatment of LUAD. METHODS To identify potential therapeutic targets for LUAD, two microarray datasets derived from the Gene Expression Omnibus (GEO) database, GSE116959 and GSE118370, were analyzed. Differentially expressed genes (DEGs) in LUAD and normal tissues were identified using the GEO2R tool. The Hiplot database was then used to generate a volcano plot of the DEGs. Weighted gene co-expression network analysis was conducted to cluster the genes in GSE116959 and GSE118370 into different modules, and to identify immune genes shared between them. A protein-protein interaction network was established using the Search Tool for the Retrieval of Interacting Genes database, then the CytoNCA and CytoHubba components of Cytoscape software were used to visualize the genes. Hub genes with high scores and co-expression were identified, and the Database for Annotation, Visualization and Integrated Discovery was used to perform enrichment analysis of these genes. The diagnostic and prognostic values of the hub genes were calculated using receiver operating characteristic curves and Kaplan-Meier survival analysis, and gene-set enrichment analysis was conducted. The University of Alabama at Birmingham Cancer data analysis portal was used to analyze relationships between the hub genes and normal specimens, as well as their expression during tumor progression. Lastly, validation of protein expression was conducted on the identified hub genes via the Human Protein Atlas database. RESULTS Three hub genes with high connectivity were identified: cellular retinoic acid binding protein 2 (CRABP2), matrix metallopeptidase 12 (MMP12), and DNA topoisomerase II alpha (TOP2A). High expression of these genes was associated with a poor LUAD prognosis, and the genes exhibited high diagnostic value. CONCLUSION Expression levels of CRABP2, MMP12, and TOP2A in LUAD were higher than those in normal lung tissue. This observation has diagnostic value, and is linked to poor LUAD prognosis. These genes may be biomarkers and therapeutic targets in LUAD, but further research is warranted to investigate their usefulness in these respects.
BACKGROUND Burkitt lymphoma (BL) is an exceptionally aggressive malignant neoplasm that arises from either germinal center or post-germinal center B cells. Patients with BL often present with rapid tumor growth and require high-intensity multidrug therapy combined with adequate intrathecal chemotherapy prophylaxis; however, a standard treatment program for BL has not yet been established. It is important to identify biomarkers for predicting the prognosis of BL and for distinguishing patients who might benefit from the therapy. Microarray data and sequencing information from public databases could offer opportunities for the discovery of new diagnostic or therapeutic targets. AIM To identify hub genes and perform gene ontology (GO) and survival analysis in BL. METHODS Gene expression profiles and clinical traits of BL patients were collected from the Gene Expression Omnibus database. Weighted gene co-expression network analysis (WGCNA) was applied to construct gene co-expression modules, and the cytoHubba tool was used to find the hub genes. Then, the hub genes were analyzed using GO and Kyoto Encyclopedia of Genes and Genomes analysis. Additionally, a protein-protein interaction network and a genetic interaction network were constructed. Prognostic candidate genes were identified through overall survival (OS) analysis. Finally, a nomogram was established to assess the predictive value of the hub genes, and drug-gene interactions were also constructed. RESULTS In this study, we obtained 8 modules through WGCNA, and there was a significant correlation between the yellow module and age. We then identified 10 hub genes (SRC, TLR4, CD40, STAT3, SELL, CXCL10, IL2RA, IL10RA, CCR7 and FCGR2B) using the cytoHubba tool. Among these hub genes, two were found to be associated with OS by survival analysis (CXCL10, P=0.029 and IL2RA, P=0.0066). Additionally, we combined these two hub genes and age to build a nomogram. Moreover, the drugs related to IL2RA and CXCL10 might have a potential therapeutic role in relapsed and refractory BL. CONCLUSION From WGCNA and survival analysis, we identified CXCL10 and IL2RA as potential prognostic markers for BL.
Funding: Supported by the Sichuan Province Science & Technology Department Crops Breeding Project (2021YFYZ0002).
Abstract: The continued expansion of the world population, an increasingly inconsistent climate, and shrinking agricultural resources present major challenges to crop breeding. Fortunately, the increasing ability to discover and manipulate genes creates new opportunities to develop more productive and resilient cultivars. Many genes have been described in papers as being beneficial for yield increase. However, few of them have been translated into increased yield on farms. Meanwhile, commercial breeders are facing gene decidophobia, i.e., they are puzzled about which gene to choose for breeding among the many identified, which reflects a huge chasm between gene discovery and cultivar innovation. The purpose of this paper is to draw attention to the shortfalls in current gene discovery research and to emphasise the need to align it with cultivar innovation. The methodology dictates that genetic studies not only focus on gene discovery but also pay close attention to genetic backgrounds, experimental validation in relevant environments, appropriate crop management, and data reusability. Closing these gaps should accelerate the application of molecular studies in breeding and contribute to future global food security.
Funding: Supported by the National Natural Science Foundation of China (No. 32070656), the Nanjing University Deng Feng Scholars Program, the Priority Academic Program Development (PAPD) of Jiangsu Higher Education Institutions, a China Postdoctoral Science Foundation funded project (No. 2022M711563), and the Jiangsu Funding Program for Excellent Postdoctoral Talent (No. 2022ZB50).
Abstract: Plant morphogenesis relies on precise gene expression programs at the proper time and position, which are orchestrated by transcription factors (TFs) in intricate regulatory networks in a cell-type specific manner. Here we introduce a comprehensive single-cell transcriptomic atlas of Arabidopsis seedlings. This atlas is the result of meticulous integration of 63 previously published scRNA-seq datasets, addressing batch effects and conserving biological variance. This integration spans a broad spectrum of tissues, including both below- and above-ground parts. Utilizing a rigorous approach for cell type annotation, we identified 47 distinct cell types or states, largely expanding our current view of plant cell compositions. We systematically constructed cell-type specific gene regulatory networks and uncovered key regulators that act in a coordinated manner to control cell-type specific gene expression. Taken together, our study not only offers an extensive plant cell atlas that serves as a valuable resource, but also provides molecular insights into gene-regulatory programs that vary among cell types.
Abstract: Background: Systemic lupus erythematosus (SLE) is a complex chronic autoimmune disease with no known cure. However, the regulatory mechanism of immunity-related genes in SLE is not fully understood. To explore new therapeutic targets, we used bioinformatics methods to analyze a series of datasets. Methods: After downloading and processing data from the Gene Expression Omnibus database, the differentially expressed genes of SLE were analyzed. The CIBERSORT algorithm was used to analyze immune infiltration in SLE. Based on single-cell RNA-sequencing data, the role of immune-related genes in SLE and its target organ (kidney) was analyzed. Key transcription factors affecting immune-related genes were identified. Cell-cell communication networks in SLE were analyzed. Results: In total, 15 hub genes and 4 transcription factors were found in the bulk data. Monocytes and macrophages in GSE81622 (SLE) showed more infiltration. Four cell types were annotated in the scRNA sequencing dataset (GSE135779): T cells, monocytes, NK cells, and B cells. Immunity-related genes were overexpressed in monocytes. Conclusion: The present study shows that immune-related genes affect SLE through monocytes and play an important role in renal injury, the target-organ damage of SLE.
Funding: Supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (2018R1D1A1B07040282 to JJ) and a grant from Kyung Hee University in 2018 (KHU-20181065 to JJ).
Abstract: The Schwann cell has regenerative capabilities and is an important cell in the peripheral nervous system. This microarray study is part of a bioinformatics study that focuses mainly on Schwann cells. Microarray data provide information on differences between microarray-based and experiment-based gene expression analyses. According to microarray data, several genes exhibit increased expression (fold change) but are weakly expressed in experimental studies (based on morphology, protein, and mRNA levels). In contrast, some genes are weakly expressed in microarray data and highly expressed in experimental studies; such genes may represent future target genes in Schwann cell studies. These studies allow us to learn about additional genes that could be used to achieve targeted results from experimental studies. In the current big data study, more than 5000 scientific articles were retrieved from PubMed or NCBI, Google Scholar, and Google, and 1016 (up- and downregulated) genes were determined to be related to Schwann cells. However, no experiment was performed in the laboratory; rather, the present study is part of a big data analysis. Our study will contribute to our understanding of Schwann cell biology by aiding in the identification of genes. Based on a comparative analysis of all microarray data, we conclude that the microarray could be a good tool for predicting the expression and intensity of different genes of interest in actual experiments.
Funding: Supported by grants from the National Natural Science Foundation of China (No. 81272340, No. 81472386, No. 81672872), the National High Technology Research and Development Program of China (863 Program) (No. 2012AA02A501), the Science and Technology Planning Project of Guangdong Province, China (No. 2014B020212017, No. 2014B050504004 and No. 2015B050501005), and the Natural Science Foundation of Guangdong Province, China (No. 2016A030311011).
Abstract: Metastasis is the greatest contributor to cancer-related death. In the era of precision medicine, it is essential to predict and to prevent the spread of cancer cells to significantly improve patient survival. Thanks to the application of a variety of high-throughput technologies, accumulating big data enables researchers and clinicians to identify aggressive tumors as well as patients with a high risk of cancer metastasis. However, there have been few large-scale gene collection studies to enable metastasis-related analyses. In the last several years, emerging efforts have identified pro-metastatic genes in a variety of cancers, providing us the ability to generate a pro-metastatic gene cluster for big data analyses. We carefully selected 285 genes with in vivo evidence of promoting metastasis reported in the literature. These genes have been investigated in different tumor types. We used two datasets downloaded from The Cancer Genome Atlas database, specifically, datasets of clear cell renal cell carcinoma and hepatocellular carcinoma, for validation tests, and excluded any genes for which elevated expression level correlated with longer overall survival in any of the datasets. Ultimately, 150 pro-metastatic genes remained in our analyses. We believe this collection of pro-metastatic genes will be helpful for big data analyses, and eventually will accelerate anti-metastasis research and clinical intervention.
Funding: This work was supported by the National Natural Science Foundation of China (31760402), Public Welfare Research Projects in the Autonomous Region (KY2019002), and Special Programs for New Varieties Cultivation of Shihezi University (YZZX201701).
Abstract: Background: Meta-analysis of quantitative trait loci (QTL) is a computational technique to identify consensus QTL and refine QTL positions on the consensus map from multiple mapping studies. The combination of meta-QTL intervals, significant SNPs, and transcriptome analysis has been widely used to identify candidate genes in various plants. Results: In our study, 884 QTLs associated with cotton fiber quality traits from 12 studies were used for meta-QTL analysis based on the reference genome TM-1. As a result, 74 meta-QTLs were identified, including 19 meta-QTLs for fiber length, 18 for fiber strength, 11 for fiber uniformity, 11 for fiber elongation, and 15 for micronaire. Combined with 8589 significant single nucleotide polymorphisms associated with fiber quality traits collected from 15 studies, 297 candidate genes were identified in the meta-QTL intervals, 20 of which showed high expression levels specifically in developing fibers. According to the functional annotations, some of the 20 key candidate genes are associated with fiber development. Conclusions: This study provides not only stable QTLs for marker-assisted selection, but also candidate genes for uncovering the molecular mechanisms of cotton fiber development.
Abstract: In this paper, a similarity measure between genes with protein-protein interactions is proposed. The ChIP-chip data are converted into the same form as gene expression data, with Pearson correlation as the similarity measure. On the basis of the similarity measures of protein-protein interaction data and ChIP-chip data, a combined dissimilarity measure is defined. The combined distance measure is introduced into the K-means method, which can be considered an improved K-means method. The improved K-means method and three other clustering methods are evaluated on a real dataset. Performance of these methods is assessed by a prediction accuracy analysis through known gene annotations. Our results show that the improved K-means method outperforms the other clustering methods. The performance of the improved K-means method is also tested by varying the tuning coefficients of the combined dissimilarity measure. The results show that it is very helpful and meaningful to incorporate heterogeneous data sources in clustering gene expression data, and that coefficients for genome-wide or complete data sources should be given larger values when constructing the combined dissimilarity measure.
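One way such a combined dissimilarity could look is sketched below. The abstract does not give the actual formula, so the weighted-sum form, the rescaling of Pearson correlation to [0, 1], the single tuning coefficient `alpha`, and the function names are all assumptions made for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length profiles; returns 0.0 if either is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0.0 or sd_y == 0.0:
        return 0.0
    return cov / (sd_x * sd_y)

def combined_dissimilarity(profile_i, profile_j, ppi_similarity, alpha=0.5):
    """Hypothetical weighted sum of two dissimilarities, each scaled to [0, 1].

    profile_i, profile_j: binding or expression profiles of two genes.
    ppi_similarity: a similarity in [0, 1] derived from interaction data.
    alpha: tuning coefficient; a larger alpha gives the profile data more weight.
    """
    d_profile = (1.0 - pearson(profile_i, profile_j)) / 2.0
    d_ppi = 1.0 - ppi_similarity
    return alpha * d_profile + (1.0 - alpha) * d_ppi

# Two perfectly correlated profiles with a strong interaction score are close.
near = combined_dissimilarity([1, 2, 3], [2, 4, 6], ppi_similarity=1.0)
# Anti-correlated profiles with no interaction evidence are maximally distant.
far = combined_dissimilarity([1, 2, 3], [3, 2, 1], ppi_similarity=0.0)
```

A clustering routine that accepts an arbitrary pairwise distance could then use this function in place of the Euclidean distance, and varying `alpha` corresponds to the coefficient tuning the abstract describes.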
Funding: Supported by the Applied Basic Research Project of Yunnan Academy of Agricultural Sciences (YJM201801), the Applied Basic Research Youth Project of Yunnan Province (2017FD015), and the Technical Innovation Talent Training Program of Yunnan Province (2015HB107).
Abstract: Gibberellins are an important class of plant hormones. They play an important regulatory role in all stages of growth and development of higher plants. The use of mutants to study gibberellin metabolism and signal transduction pathways is currently a research hotspot. Taking Affymetrix chip data from rice as an example, this article uses bioinformatics methods to study the rice SLR1 mutant and mine genes differentially expressed between the mutant and the wild type, thus exploring the expression regulation network of genes related to the gibberellin signaling pathway.
Funding: The authors extend their appreciation to the Deanship of Scientific Research at King Khalid University for funding this work under grant number (RGP 2/42/43). This work was supported by the Taif University Researchers Supporting Program (project number: TURSP-2020/200), Taif University, Saudi Arabia.
Abstract: In bioinformatics applications, the examination of microarray data has received significant interest for diagnosing diseases. Microarray gene expression data are characterized by a massive search space, which poses a primary challenge in the appropriate selection of genes. Microarray data classification incorporates multiple disciplines such as bioinformatics, machine learning (ML), data science, and pattern classification. This paper designs an optimal deep neural network based microarray gene expression classification (ODNN-MGEC) model for bioinformatics applications. The proposed ODNN-MGEC technique performs a data normalization process to bring the data onto a uniform scale. In addition, an improved fruit fly optimization (IFFO) based feature selection technique is used to reduce the high dimensionality of the biomedical data. Moreover, a deep neural network (DNN) model is applied for the classification of microarray gene expression data, and the hyperparameter tuning of the DNN model is carried out using the Symbiotic Organisms Search (SOS) algorithm. The use of the IFFO and SOS algorithms paves the way for achieving the best gene expression classification outcomes. To examine the improved outcomes of the ODNN-MGEC technique, a wide-ranging experimental analysis is carried out on benchmark datasets. The extensive comparison study with recent approaches demonstrates the enhanced outcomes of the ODNN-MGEC technique in terms of different measures.
Abstract: Gene expression data are represented as a condition matrix in which each row represents a gene and each column a condition. Microarrays are used in the lab to detect the expression of thousands of genes at a time. Genes encode proteins, which in turn dictate cell function. The production of messenger RNA and its processing are the two main stages of gene expression. The complexity of biological networks, together with the volume of data containing imprecision and outliers, increases the challenges in dealing with them. Clustering methods are hence essential to identify the patterns present in massive gene data. Many techniques involve hierarchical, partitioning, grid-based, density-based, model-based, and soft clustering approaches for dealing with gene expression data. Understanding gene regulation and other useful information from these data is possible only through effective clustering algorithms. Though many methods are discussed in the literature, we concentrate on providing a soft clustering approach for analyzing gene expression data. The population elements are grouped based on the fuzziness principle, and a degree of membership is assigned to all the elements. An improved Fuzzy clustering by Local Approximation of Memberships (FLAME) is proposed in this work, which overcomes the limitations of the other approaches in dealing with non-linear relationships and provides better segregation of biological functions.
Abstract: Accurate gas viscosity determination is an important issue in the oil and gas industries. Experimental approaches for gas viscosity measurement are time-consuming, expensive, and hardly possible at high pressures and high temperatures (HPHT). In this study, a number of correlations were developed to estimate gas viscosity using the group method of data handling (GMDH) type neural network and gene expression programming (GEP) techniques, based on a large data set containing more than 3000 experimental data points for methane, nitrogen, and hydrocarbon gas mixtures. It is worth mentioning that, unlike many viscosity correlations, the ones proposed in this study can compute gas viscosity at pressures ranging between 34 and 172 MPa and temperatures between 310 and 1300 K. Also, a comparison was performed between the results of these established models and the results of ten well-known models reported in the literature. The average absolute relative errors of the GMDH models were 4.23%, 0.64%, and 0.61% for hydrocarbon gas mixtures, methane, and nitrogen, respectively. In addition, graphical analyses indicate that GMDH can predict gas viscosity with higher accuracy than GEP at HPHT conditions. Also, using the leverage technique, valid, suspected, and outlier data points were determined. Finally, trends of the gas viscosity models at different conditions were evaluated.
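The error metric quoted in this abstract is commonly computed as the mean of |predicted − measured| / |measured|, expressed as a percentage. The sketch below assumes that usual definition; the function name and the sample values are illustrative, not taken from the study.

```python
def aare_percent(measured, predicted):
    """Average absolute relative error, in percent, under the usual definition.

    measured: experimental values (must be non-zero).
    predicted: model outputs, in the same order and units.
    """
    if len(measured) != len(predicted) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - m) / abs(m) for m, p in zip(measured, predicted))
    return 100.0 * total / len(measured)

# Hypothetical viscosities in arbitrary units: relative errors of 10% and 5%.
error = aare_percent([100.0, 200.0], [110.0, 190.0])
```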
Abstract: The analysis of messenger ribonucleic acid data obtained through sequencing techniques (RNA-sequencing) is very challenging. Once technical difficulties have been sorted out, an important choice has to be made during pre-processing between two paths: transform RNA-sequencing count data to a continuous variable, or continue to work with count data. For each data type, analysis tools have been developed and seem appropriate at first sight, but a deeper analysis of data distribution and structure is worth discussing. In this review, open questions regarding the nature of RNA-sequencing data are discussed and highlighted, indicating important future research topics in statistics that should be addressed for a better analysis of already available and newly appearing gene expression data. Moreover, a comparative analysis of RNA-seq count and transformed data is presented. This comparison indicates that transforming RNA-seq count data seems appropriate, at least for differential expression detection.
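As a concrete instance of the first path, one widely used way to turn counts into a continuous variable is a log2 counts-per-million transform. The review may discuss other transformations; the choice of this one, the `prior` offset, and the function name are assumptions for illustration only.

```python
import math

def log2_cpm(counts, prior=1.0):
    """Scale raw read counts of one sample to counts per million, then take log2.

    prior is added before the logarithm so that zero counts stay finite;
    its value here is an arbitrary choice.
    """
    library_size = sum(counts)
    if library_size == 0:
        raise ValueError("library size is zero")
    return [math.log2(c / library_size * 1e6 + prior) for c in counts]

# Hypothetical read counts for four genes in a single sample.
sample = [0, 10, 90, 900]
transformed = log2_cpm(sample)
```

The transformed values are continuous and preserve the ordering of the counts within the sample, which is what allows methods designed for continuous data to be applied afterwards.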
Funding: Supported by the National Natural Science Foundation of China (Grant No. 31200809), the Project of Shanghai Hongkou District Health and Family Planning Commission (Grant No. 1802-06), and the "Special Fund Project for Basic Scientific Research Operating Expenses of Central Universities" of Tongji University (Grant No. 22120180282).
Abstract: Objective: To explore potential genes associated with the formation and rupture of intracranial aneurysms based on the Gene Expression Omnibus (GEO) database. Methods: A total of 133 mRNA microarrays were collected from the GEO database. Differential mRNA analysis was performed on the data of each group on the GEO2R platform, the common differential genes were screened, and gene ontology enrichment analysis and Kyoto Encyclopedia of Genes and Genomes pathway enrichment analysis were completed. The screened differential genes were entered into the String online database to obtain the interactions between the proteins encoded by the differential genes. Results: Forty-two common differential genes were screened, and the main biological processes involved included the transcriptional regulation of oxidative stress, the positive regulation of chemokine production, and the positive regulation of autophagy of giant cells by RNA polymerase II promoter. Molecular functions included protein binding, RNA polymerase II transcriptional co-repressor activity, transcriptional activator activity, and protein kinase C binding. The main signaling pathways covered included the hypoxia-inducible factor-1 signaling pathway, the glucagon signaling pathway, and metabolic pathways. Conclusions: The formation and rupture of intracranial aneurysms may be initially screened with amidoxime reduction component 1, tumor necrosis factor-α-inducible protein 6, haptoglobin, mast cell membrane-expressing protein 1, zipper containing kinase, phospholipase Cβ4, and blood and nervous system expression factor-1. In addition to the previously known mechanisms of intracranial aneurysms, cellular autophagy and the hypoxia-inducible factor-1 pathway may also be involved in the formation of intracranial aneurysms.
Abstract: Acute leukemia is an aggressive disease that has high mortality rates worldwide. The error rate can be as high as 40% when classifying acute leukemia into its subtypes, so there is an urgent need to support hematologists during the classification process. More than two decades ago, researchers used microarray gene expression data to classify cancer and adopted acute leukemia as a test case. The high classification accuracy they achieved confirmed that it is possible to classify cancer subtypes using microarray gene expression data. Ensemble machine learning is an effective method that combines individual classifiers to classify new samples. Ensemble classifiers are recognized as powerful algorithms with numerous advantages over traditional classifiers. Over the past few decades, researchers have focused a great deal of attention on ensemble classifiers in a wide variety of fields, including but not limited to disease diagnosis, finance, bioinformatics, healthcare, manufacturing, and geography. This paper reviews the recent ensemble classifier approaches utilized for acute leukemia gene expression data classification. Moreover, a framework for classifying acute leukemia gene expression data is proposed. The pairwise correlation gene selection method and the Rotation Forest of Bayesian Networks are both used in this framework. Experimental outcomes show that the classification accuracy achieved by the acute leukemia ensemble classifiers constructed according to the suggested framework compares well with the classification accuracy achieved in other studies.
Abstract: Objective: To explore the expression and clinical significance of the RACGAP1 gene in hepatocellular carcinoma. Methods: Data on the RACGAP1 gene and clinicopathological data in liver cancer were retrieved from The Cancer Genome Atlas (TCGA). The relationships between RACGAP1 expression and clinicopathological parameters and prognosis were analyzed with R 2.15.3 software. The association between RACGAP1 gene expression and prognosis of liver cancer patients was analyzed by Kaplan-Meier survival function analysis and Cox regression analysis. Results: The TCGA database was used to collect 235 cases of liver cancer with clinicopathological parameters and their corresponding RACGAP1 expression levels. After incomplete cases and those with no detailed pathological parameters were excluded, it was found that RACGAP1 was highly expressed in liver cancer tissues. The expression of RACGAP1 in patients with liver cancer in the TCGA tumor database was then further analyzed against the matching clinical data parameters. The expression level of RACGAP1 was significantly correlated with the pathological grade and T stage of liver cancer patients (all P<0.05), but was not significantly correlated with American Joint Committee on Cancer (AJCC) pathological stage or gender (P>0.05). There was a significant correlation between RACGAP1 expression level and overall survival (OS) in patients with liver cancer (P<0.05), and the overall survival time of patients with low expression was longer than that of patients with high expression (P<0.05). Cox regression was used to analyze the correlation between T stage, M stage, N stage, and RACGAP1 expression in patients with hepatocellular carcinoma (HCC), and RACGAP1 was found to be an independent prognostic factor in patients with HCC (P<0.05). Conclusion: Based on the tumor-related gene information in the public database TCGA, the RACGAP1 gene is highly expressed in liver cancer tissues and is an independent prognostic factor of liver cancer, and it is expected to become an important therapeutic target of drug therapy for liver cancer.
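The Kaplan-Meier survival function used in this kind of analysis can be sketched in a few lines. This is a generic product-limit estimator under the standard definition, not the authors' R code; the function name and the toy follow-up data are assumptions.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times: follow-up time for each patient.
    events: 1 if death was observed at that time, 0 if the patient was censored.
    Returns a list of (time, survival probability) at each distinct death time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Collect everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # deaths and censored patients both leave the risk set
    return curve

# Hypothetical follow-up (months) for four patients; the second one is censored.
curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
```

To compare expression groups, patients would typically be split into high and low expression (for example at the median), with one curve estimated per group and the difference assessed by a log-rank test, which is not shown here.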
Funding: Support from the Special Fund for Agro-Scientific Research in the Public Interest in China (3-19) and the National Transgenic Plants Project of China (2008ZX08005-004) is gratefully acknowledged.
Abstract: To make better use of the nucleic acid resources of Gossypium hirsutum, a data-mining method was used to identify putative genes responsive to various abiotic stresses in G. hirsutum. Based on a compiled database of genes involved in abiotic stress response in Arabidopsis thaliana and the comprehensive analysis tool GENEVESTIGATOR v3, 826 genes significantly up-regulated or down-regulated in roots or leaves during salt or cold treatment in Arabidopsis were identified. By comparison with these 826 annotated Arabidopsis genes, 38 homologous expressed sequence tags (ESTs) from G. hirsutum were selected randomly, and their expression patterns were studied using a quantitative real-time reverse transcription-polymerase chain reaction method. Among these 38 ESTs, about 55% of the genes (21 of 38) differed in their response to ABA between cotton and Arabidopsis, whereas 70% of the genes had similar responses to cold and salt treatments; some of those that had not been characterized in Arabidopsis are now being investigated in gene function studies. According to these results, this approach of analyzing ESTs appears effective for large-scale identification of cotton genes involved in abiotic stress and might be adopted to determine gene functions in various biological processes in cotton.
Abstract: BACKGROUND The objectives of this study were to identify hub genes and biological pathways involved in lung adenocarcinoma (LUAD) via bioinformatics analysis, and to investigate potential therapeutic targets. AIM To determine reliable prognostic biomarkers for early diagnosis and treatment of LUAD. METHODS To identify potential therapeutic targets for LUAD, two microarray datasets derived from the Gene Expression Omnibus (GEO) database were analyzed, GSE116959 and GSE118370. Differentially expressed genes (DEGs) in LUAD and normal tissues were identified using the GEO2R tool. The Hiplot database was then used to generate a volcano plot of the DEGs. Weighted gene co-expression network analysis was conducted to cluster the genes in GSE116959 and GSE118370 into different modules and identify immune genes shared between them. A protein-protein interaction network was established using the Search Tool for the Retrieval of Interacting Genes database, then the CytoNCA and CytoHubba components of Cytoscape software were used to visualize the genes. Hub genes with high scores and co-expression were identified, and the Database for Annotation, Visualization and Integrated Discovery was used to perform enrichment analysis of these genes. The diagnostic and prognostic values of the hub genes were calculated using receiver operating characteristic curves and Kaplan-Meier survival analysis, and gene-set enrichment analysis was conducted. The University of Alabama at Birmingham Cancer data analysis portal was used to analyze relationships between the hub genes and normal specimens, as well as their expression during tumor progression. Lastly, validation of protein expression was conducted on the identified hub genes via the Human Protein Atlas database. RESULTS Three hub genes with high connectivity were identified: cellular retinoic acid binding protein 2 (CRABP2), matrix metallopeptidase 12 (MMP12), and DNA topoisomerase II alpha (TOP2A). High expression of these genes was associated with a poor LUAD prognosis, and the genes exhibited high diagnostic value. CONCLUSION Expression levels of CRABP2, MMP12, and TOP2A in LUAD were higher than those in normal lung tissue. This observation has diagnostic value and is linked to poor LUAD prognosis. These genes may be biomarkers and therapeutic targets in LUAD, but further research is warranted to investigate their usefulness in these respects.
Abstract: BACKGROUND Burkitt lymphoma (BL) is an exceptionally aggressive malignant neoplasm that arises from either germinal center or post-germinal center B cells. Patients with BL often present with rapid tumor growth and require high-intensity multidrug therapy combined with adequate intrathecal chemotherapy prophylaxis; however, a standard treatment program for BL has not yet been established. It is important to identify biomarkers for predicting the prognosis of BL and for discriminating patients who might benefit from the therapy. Microarray data and sequencing information from public databases could offer opportunities for the discovery of new diagnostic or therapeutic targets. AIM To identify hub genes and perform gene ontology (GO) and survival analysis in BL. METHODS Gene expression profiles and clinical traits of BL patients were collected from the Gene Expression Omnibus database. Weighted gene co-expression network analysis (WGCNA) was applied to construct gene co-expression modules, and the cytoHubba tool was used to find the hub genes. Then, the hub genes were analyzed using GO and Kyoto Encyclopedia of Genes and Genomes analysis. Additionally, a Protein-Protein Interaction network and a Genetic Interaction network were constructed. Prognostic candidate genes were identified through overall survival analysis. Finally, a nomogram was established to assess the predictive value of hub genes, and drug-gene interactions were also constructed. RESULTS In this study, we obtained 8 modules through WGCNA, and there was a significant correlation between the yellow module and age. We then identified 10 hub genes (SRC, TLR4, CD40, STAT3, SELL, CXCL10, IL2RA, IL10RA, CCR7 and FCGR2B) with the cytoHubba tool. Among these hubs, two genes were found to be associated with OS (CXCL10, P=0.029 and IL2RA, P=0.0066) by survival analysis. Additionally, we combined these two hub genes and age to build a nomogram. Moreover, the drugs related to IL2RA and CXCL10 might have a potential therapeutic role in relapsed and refractory BL. CONCLUSION From WGCNA and survival analysis, we identified CXCL10 and IL2RA as genes that might be prognostic markers for BL.