Identification of differentially expressed genes (DEGs) in time course studies is very useful for understanding gene function, and can help determine key genes during specific stages of plant development. A few existing methods focus on the detection of DEGs within a single biological group, enabling the study of temporal changes in gene expression. To utilize the rapidly increasing amount of single-group time-series expression data, we propose a two-step method that integrates the temporal characteristics of time-series data to obtain a B-spline curve fit. First, a flat gene filter based on the Ljung-Box test is used to filter out flat genes. Then, a B-spline model is used to identify DEGs. For use in biological experiments, these DEGs should be screened to determine their biological importance. To identify high-confidence, promising DEGs for specific biological processes, we propose a novel gene prioritization approach based on the partner evaluation principle. This approach utilizes existing co-expression information to rank DEGs that are likely to be involved in a specific biological process or condition. The proposed method is validated on the Arabidopsis thaliana seed germination dataset and on the rice anther development expression dataset.
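The flat gene filter described above rests on the Ljung-Box test, which checks whether a series shows any autocorrelation up to a chosen lag. The following is a minimal sketch of the test statistic only, not the authors' implementation: the function name `ljung_box_q`, the lag choice, and the treatment of constant profiles are assumptions made for illustration. Under the null hypothesis of no autocorrelation, Q is compared against a chi-square distribution with `max_lag` degrees of freedom; a small Q suggests a flat profile.

```python
def ljung_box_q(series, max_lag):
    """Ljung-Box Q statistic over lags 1..max_lag for one expression profile."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    denom = sum(c * c for c in centered)
    if denom == 0:
        # Assumption: a perfectly constant profile is treated as Q = 0 (flat).
        return 0.0
    total = 0.0
    for k in range(1, max_lag + 1):
        # Sample autocorrelation at lag k.
        r_k = sum(centered[t] * centered[t - k] for t in range(k, n)) / denom
        total += r_k * r_k / (n - k)
    return n * (n + 2) * total


# Hypothetical profiles: one steadily rising, one constant.
rising = [1, 2, 3, 4, 5, 6, 7, 8]
constant = [5, 5, 5, 5, 5, 5, 5, 5]
q_rising = ljung_box_q(rising, max_lag=2)
q_constant = ljung_box_q(constant, max_lag=2)
```

Genes whose Q falls below the chosen chi-square critical value would be filtered out as flat before any curve fitting; the significance level is a design choice the abstract does not specify.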
Lung cancer remains a significant global health challenge, and identifying lung cancer at an early stage is essential for improving patient outcomes. The study focuses on developing and optimizing gene expression-based models for classifying cancer types using machine learning techniques. After applying log2 normalization to gene expression data and conducting Wilcoxon rank sum tests, the researchers employed various classifiers and Incremental Feature Selection (IFS) strategies. The study culminated in two optimized models using the XGBoost classifier, comprising 10 and 74 genes respectively. The 10-gene model, due to its simplicity, is proposed for easier clinical implementation, whereas the 74-gene model exhibited superior performance in terms of specificity, AUC (Area Under the Curve), and precision. These models were evaluated based on their sensitivity, AUC, and specificity, aiming to achieve high sensitivity and AUC while maintaining reasonable specificity.
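The preprocessing steps named above (log2 normalization followed by a Wilcoxon rank sum test per gene) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the study's pipeline: the pseudocount of 1, the normal approximation for the p-value, and the function names `log2_normalize` and `rank_sum_test` are all choices made here for the example, and no tie correction is applied to the variance.

```python
import math
from statistics import NormalDist


def log2_normalize(values, pseudocount=1.0):
    """log2(x + pseudocount); the pseudocount is an assumption to handle zeros."""
    return [math.log2(v + pseudocount) for v in values]


def rank_sum_test(group_a, group_b):
    """Two-sided Wilcoxon rank sum test (normal approximation).

    Returns (U statistic for group_a, approximate p-value).
    """
    pooled = sorted((v, idx) for idx, v in enumerate(list(group_a) + list(group_b)))
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[pooled[k][1]] = avg_rank
        i = j + 1
    n_a, n_b = len(group_a), len(group_b)
    u_a = sum(ranks[:n_a]) - n_a * (n_a + 1) / 2
    mean_u = n_a * n_b / 2
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (u_a - mean_u) / sd_u
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return u_a, p_value


# Hypothetical expression values for one gene in two sample groups.
normal_samples = log2_normalize([1, 2, 3, 4])
tumor_samples = log2_normalize([10, 20, 30, 40])
u_stat, p = rank_sum_test(normal_samples, tumor_samples)
```

Genes ranked by such p-values could then feed an incremental feature selection loop, in which genes are added one at a time and the classifier is re-evaluated after each addition.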
AIM: To extend the knowledge of the dynamic interaction between Helicobacter pylori (H. pylori) and host mucosa. METHODS: A time-series cDNA microarray was performed to detect the temporal gene expression profiles of human gastric epithelial adenocarcinoma cells infected with H. pylori. Six time points were selected to observe the changes in the model. A differential expression profile at each time point was obtained by comparing the microarray signal value with that at 0 h. Real-time polymerase chain reaction was subsequently performed to evaluate the data quality. RESULTS: We found a diversity of gene expression patterns at different time points and identified a group of genes whose expression levels were significantly correlated with several important immune response and tumor-related pathways. CONCLUSION: Early infection may trigger some important pathways and may impact the outcome of the infection.
Accurate gas viscosity determination is an important issue in the oil and gas industries. Experimental approaches for gas viscosity measurement are time-consuming, expensive and hardly possible at high pressures and high temperatures (HPHT). In this study, a number of correlations were developed to estimate gas viscosity using group method of data handling (GMDH) type neural network and gene expression programming (GEP) techniques, based on a large data set containing more than 3000 experimental data points for methane, nitrogen, and hydrocarbon gas mixtures. Unlike many viscosity correlations, the ones proposed in this study can compute gas viscosity at pressures ranging between 34 and 172 MPa and temperatures between 310 and 1300 K. A comparison was also performed between the results of these established models and the results of ten well-known models reported in the literature. The average absolute relative errors of the GMDH models were 4.23%, 0.64%, and 0.61% for hydrocarbon gas mixtures, methane, and nitrogen, respectively. In addition, graphical analyses indicate that GMDH can predict gas viscosity with higher accuracy than GEP at HPHT conditions. Using the leverage technique, valid, suspected and outlier data points were determined. Finally, trends of the gas viscosity models at different conditions were evaluated.
In this paper, a similarity measure between genes with protein-protein interactions is proposed. The ChIP-chip data are converted into the same form as gene expression data, with Pearson correlation as the similarity measure. On the basis of the similarity measures of protein-protein interaction data and ChIP-chip data, the combined dissimilarity measure is defined. The combined distance measure is introduced into the K-means method, which can be considered an improved K-means method. The improved K-means method and three other clustering methods are evaluated on a real dataset. Performance of these methods is assessed by a prediction accuracy analysis through known gene annotations. Our results show that the improved K-means method outperforms the other clustering methods. The performance of the improved K-means method is also tested by varying the tuning coefficients of the combined dissimilarity measure. The results show that it is very helpful and meaningful to incorporate heterogeneous data sources in clustering gene expression data, and that the coefficients for the genome-wide or complete data sources should be given larger values when constructing the combined dissimilarity measure.
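The abstract does not give the formula for the combined dissimilarity measure, so the sketch below is only one plausible reading: a weighted average of three per-source dissimilarities, each taken as one minus a similarity. The weighting scheme, the function names, and the assumption that the protein-protein interaction similarity arrives as a number in [0, 1] are all hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def combined_dissimilarity(expr_a, expr_b, chip_a, chip_b, ppi_similarity,
                           weights=(1.0, 1.0, 1.0)):
    """Weighted average of (1 - similarity) over three data sources.

    weights = (expression, ChIP-chip, protein-protein interaction); these play
    the role of the tuning coefficients mentioned in the abstract (assumed form).
    """
    w_expr, w_chip, w_ppi = weights
    total = (
        w_expr * (1.0 - pearson(expr_a, expr_b))
        + w_chip * (1.0 - pearson(chip_a, chip_b))
        + w_ppi * (1.0 - ppi_similarity)
    )
    return total / (w_expr + w_chip + w_ppi)


# Hypothetical gene pair: identical profiles and a known interaction.
d_close = combined_dissimilarity([1, 2, 3], [1, 2, 3], [0, 1, 0], [0, 1, 0], 1.0)
# Hypothetical gene pair: opposite profiles and no interaction evidence.
d_far = combined_dissimilarity([1, 2, 3], [3, 2, 1], [0, 1, 2], [2, 1, 0], 0.0)
```

A K-means variant would use such a function in place of Euclidean distance when assigning genes to clusters; raising the weight of a genome-wide source corresponds to the abstract's recommendation to give complete data sources larger coefficients.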
In bioinformatics applications, examination of microarray data has received significant interest for diagnosing diseases. Microarray gene expression data can be characterized by a massive search space, which poses a primary challenge in the appropriate selection of genes. Microarray data classification incorporates multiple disciplines such as bioinformatics, machine learning (ML), data science, and pattern classification. This paper designs an optimal deep neural network based microarray gene expression classification (ODNN-MGEC) model for bioinformatics applications. The proposed ODNN-MGEC technique performs a data normalization process to bring the data onto a uniform scale. In addition, an improved fruit fly optimization (IFFO) based feature selection technique is used to reduce the high dimensionality of the biomedical data. Moreover, a deep neural network (DNN) model is applied for the classification of microarray gene expression data, and the hyperparameter tuning of the DNN model is carried out using the Symbiotic Organisms Search (SOS) algorithm. The use of the IFFO and SOS algorithms paves the way for achieving the best gene expression classification outcomes. To examine the improved outcomes of the ODNN-MGEC technique, a wide-ranging experimental analysis is carried out on benchmark datasets. The extensive comparison study with recent approaches demonstrates the enhanced outcomes of the ODNN-MGEC technique in terms of different measures.
Gene expression data represents a condition matrix where each row represents a gene and each column a condition. Microarrays are used to detect gene expression in the lab for thousands of genes at a time. Genes encode proteins, which in turn dictate cell function. The production of messenger RNA and its processing are the two main stages involved in the process of gene expression. The complexity of biological networks, together with the volume of data containing imprecision and outliers, increases the challenges in dealing with them. Clustering methods are hence essential to identify the patterns present in massive gene data. Many techniques, including hierarchical, partitioning, grid-based, density-based, model-based and soft clustering approaches, are used for dealing with gene expression data. Understanding gene regulation and other useful information from this data is possible only through effective clustering algorithms. Though many methods are discussed in the literature, we concentrate on providing a soft clustering approach for analyzing gene expression data. The population elements are grouped based on the fuzziness principle, and a degree of membership is assigned to all the elements. An improved Fuzzy clustering by Local Approximation of Memberships (FLAME) is proposed in this work, which overcomes the limitations of the other approaches in dealing with non-linear relationships and provides better segregation of biological functions.
The analysis of messenger ribonucleic acid data obtained through sequencing techniques (RNA-sequencing) is very challenging. Once technical difficulties have been sorted out, an important choice has to be made during pre-processing, where two different paths can be chosen: transform RNA-sequencing count data to a continuous variable, or continue to work with count data. For each data type, analysis tools have been developed and seem appropriate at first sight, but a deeper analysis of data distribution and structure is worth discussing. In this review, open questions regarding the nature of RNA-sequencing data are discussed and highlighted, indicating important future research topics in statistics that should be addressed for a better analysis of already available and newly appearing gene expression data. Moreover, a comparative analysis of RNA-seq count and transformed data is presented. This comparison indicates that transforming RNA-seq count data seems appropriate, at least for differential expression detection.
AIM: To study and clone a novel liver cancer related gene, and to explore the molecular basis of liver cancer genesis. METHODS: Using mRNA differential display polymerase chain reaction (DDPCR), we investigated the difference in mRNA between human hepatocellular carcinoma (HCC) and paired surrounding liver tissues, and obtained a gene probe. By screening a human placenta cDNA library and genomic homologous extension, we obtained a full-length cDNA named HCCA3. We analyzed the expression of this novel gene in 42 pairs of HCC and surrounding liver tissues, and its distribution in normal human tissues, by means of Northern blot assay. RESULTS: A full-length cDNA of the liver cancer associated gene HCCA3 has been submitted to the GenBank nucleotide sequence database (Accession No. AF276707). The positive expression rate of this gene was 78.6% (33/42) in HCC tissues, and the clinical pathological data showed that HCCA3 was closely associated with invasion of the tumor capsule (P=0.023) and adjacent small metastatic satellite nodule lesions (P=0.041). HCCA3 was widely distributed in normal human tissues; it was intensively expressed in lung, brain and colon tissues, while lowly expressed in liver tissues. CONCLUSION: A novel full-length cDNA was cloned and differentiated, which was highly expressed in liver cancer tissues. The high expression was closely related to tumor invasiveness and metastasis, which may be a late hereditary change in HCC genesis.
AIM: To investigate SBA2 expression in CRC cell lines and surgical specimens of CRC and autologous healthy mucosa. METHODS: Reverse transcription-polymerase chain reaction (RT-PCR) was used for relative quantification of SBA2 mRNA levels in 4 human CRC cell lines with different grades of differentiation and 30 clinical samples. Normalization of the results was achieved by simultaneous amplification of beta-actin as an internal control. RESULTS: In the exponential range of amplification, fairly good linearity demonstrated identical amplification efficiency for SBA2 and beta-actin (82%). Markedly lower levels of SBA2 mRNA were detectable in tumors, as compared with the coupled normal counterparts (P<0.01). SBA2 expression was significantly (0.01<P<0.05) correlated with the grade of differentiation in CRC, with relatively higher levels in well-differentiated samples and lower levels in poorly-differentiated cases. Of the 9 cases with affected lymph nodes, 78% (7/9) had reduced SBA2 mRNA expression, in contrast to 24% (5/21) of non-metastasis samples (0.01<P<0.05). CONCLUSION: The SBA2 gene might be a promising novel biomarker of cell differentiation in colorectal cancer, and its biological features need further study.
Bicoid is one of the important Drosophila maternal genes involved in the control of embryo polarity and larval segmentation. To clone and characterize rice bicoid-related genes, one cDNA clone, Rb24 (EMBL accession number: AJ2771380), was isolated by screening a rice immature seed cDNA library. Sequence analysis indicates that Rb24 contains a putative amino acid sequence that is homologous to a unique 8-amino-acid sequence within the Drosophila bicoid homeodomain (50% identity, 75% similarity) and includes a Lys-9 in putative helix 3. Northern blot analysis of rice RNA has shown that this sequence is expressed in a tissue-specific manner. The transcript was detected strongly in young panicles, but less in young leaves and roots. These results were further confirmed by paraffin-section in situ hybridization. The signal is intense in the rice globular embryo and located at the apical tip of the embryo; then, as the embryo develops, the signal is reduced and shifts to both sides of the embryo. The existence of a bicoid-related sequence in the rice embryo and the similarity of the polar distribution of bicoid and Rb24 mRNA in early embryo development may imply that a conserved maternal regulation mechanism of the body axis is present in Drosophila and in rice.
AIM: To identify and understand the relationship between co-expression patterns and clinical traits in uveal melanoma, weighted gene co-expression network analysis (WGCNA) is applied to investigate gene expression levels and patient clinical features. Uveal melanoma is the most common primary eye tumor in adults. Although many studies have identified important genes and pathways relevant to the progression of uveal melanoma, the relationship between co-expression and clinical traits at the systems level of uveal melanoma remains unclear. We employ WGCNA to investigate the relationship between molecular data and phenotype in this study. METHODS: Gene expression profiles of uveal melanoma and patient clinical traits were collected from the Gene Expression Omnibus (GEO) database. Gene co-expression is calculated with WGCNA, an R software package. The package is used to analyze the correlation between pairs of gene expression levels. The functions of the genes were annotated by gene ontology (GO). RESULTS: In this study, we identified four co-expression modules significantly correlated with clinical traits. Module blue positively correlates with radiotherapy treatment. Module purple positively correlates with tumor location (sclera) and negatively correlates with patient age. Module red positively correlates with sclera and negatively correlates with tumor thickness. Module black positively correlates with the largest tumor diameter (LTD). Additionally, we identified the hub gene (top connectivity with other genes) in each module. The hub genes RPS15A, PTGDS, CD53 and MSI2 might play a vital role in the progression of uveal melanoma. CONCLUSION: From WGCNA analysis and hub gene calculation, we identified RPS15A, PTGDS, CD53 and MSI2 as possible targets or diagnostic markers for uveal melanoma.
Acute leukemia is an aggressive disease that has high mortality rates worldwide. The error rate can be as high as 40% when classifying acute leukemia into its subtypes, so there is an urgent need to support hematologists during the classification process. More than two decades ago, researchers used microarray gene expression data to classify cancer and adopted acute leukemia as a test case. The high classification accuracy they achieved confirmed that it is possible to classify cancer subtypes using microarray gene expression data. Ensemble machine learning is an effective method that combines individual classifiers to classify new samples. Ensemble classifiers are recognized as powerful algorithms with numerous advantages over traditional classifiers. Over the past few decades, researchers have focused a great deal of attention on ensemble classifiers in a wide variety of fields, including but not limited to disease diagnosis, finance, bioinformatics, healthcare, manufacturing, and geography. This paper reviews the recent ensemble classifier approaches utilized for acute leukemia gene expression data classification. Moreover, a framework for classifying acute leukemia gene expression data is proposed. The pairwise correlation gene selection method and the Rotation Forest of Bayesian Networks are both used in this framework. Experimental outcomes show that the classification accuracy achieved by the acute leukemia ensemble classifiers constructed according to the suggested framework is good compared to the classification accuracy achieved in other studies.
AIM: To clone the cDNA of UGT1A9 from a Chinese human liver and establish a Chinese hamster lung (CHL) cell line expressing human UGT1A9. METHODS: cDNA of UGT1A9 was transcribed from mRNA by reverse transcriptase-polymerase chain reaction, and was cloned into the pGEM-T vector, which was amplified in the host bacterium E. coli DH5α. The inserted fragment, verified by DNA sequencing, was subcloned into the Hind III/Not I site of the mammalian expression vector pREP9 to construct the plasmid termed pREP9-UGT1A9. CHL cells were transfected with the resultant recombinant, pREP9-UGT1A9, and selected with G418 (400 mg·L⁻¹) for one month. The surviving clone (CHL-UGT1A9) was harvested as a pool and sub-cultured in medium containing G418 to obtain samples for UGT1A9 assays. The enzyme activity of CHL-UGT1A9 towards propranolol in the S9 protein of the cells was determined by HPLC. RESULTS: The sequence of the cloned cDNA segment, which was 1666 bp in length, was identical in the coding region to that released by GenBank (GenBank accession number: AF056188). The constructed recombinant, pREP9-UGT1A9, contains the entire coding region, along with 18 bp of the 5' and 55 bp of the 3' untranslated regions of the UGT1A9 cDNA, respectively. The established cell lines expressed the UGT1A9 protein, and the enzyme activity towards propranolol in S9 protein was found to be 101 ± 24 pmol·min⁻¹·mg⁻¹ protein (n=3), but was not detectable in parental CHL cells. CONCLUSION: The cDNA of UGT1A9 was successfully cloned from a Chinese human liver and transfected into CHL cells. The established CHL-UGT1A9 cell lines efficiently expressed the UGT1A9 protein for further enzyme studies of drug glucuronidation.
Chronic myeloid leukemia (CML) is characterized by the accumulation of active BCR-ABL protein. Imatinib is the first-line treatment for CML; however, many patients are resistant to this drug. In this study, we aimed to compare the differences in expression patterns and functions of time-series genes in imatinib-resistant CML cells under different drug treatments. GSE24946 was downloaded from the GEO database, which included 17 samples of K562-r cells with (n=12) or without (n=5) drug administration. Three drug treatment groups were considered for this study: arsenic trioxide (ATO), AMN107, and ATO+AMN107. Each group had one sample at each time point (3, 12, 24, and 48 h). Time-series genes with a ratio of standard deviation to average (coefficient of variation) >0.15 were screened, and their expression patterns were revealed using the Short Time-series Expression Miner (STEM). Then, functional enrichment analysis of time-series genes in each group was performed using DAVID, and the genes enriched in the top ten functional categories were extracted to detect their expression patterns. Different time-series genes were identified in the three groups, and most of them were enriched in the ribosome and oxidative phosphorylation pathways. Time-series genes in the three treatment groups had different expression patterns and functions. Time-series genes in the ATO group (e.g. CCNA2 and DAB2) were significantly associated with cell adhesion, those in the AMN107 group were related to cellular carbohydrate metabolic process, while those in the ATO+AMN107 group (e.g. AP2M1) were significantly related to cell proliferation and antigen processing. In imatinib-resistant CML cells, ATO could influence genes related to cell adhesion, AMN107 might affect genes involved in cellular carbohydrate metabolism, and the combination therapy might regulate genes involved in cell proliferation.
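The screening criterion above (coefficient of variation, i.e. standard deviation divided by average, greater than 0.15) can be illustrated with a short sketch. The gene names and expression values are hypothetical, the use of the sample standard deviation is an assumption (the abstract does not say whether sample or population SD was used), and genes with a zero mean are simply skipped here.

```python
import statistics


def coefficient_of_variation(values):
    """Standard deviation / mean for one gene's time-series values."""
    return statistics.stdev(values) / statistics.mean(values)


def filter_time_series_genes(profiles, threshold=0.15):
    """Return {gene: CV} for genes whose CV exceeds the threshold."""
    selected = {}
    for gene, values in profiles.items():
        if statistics.mean(values) == 0:
            continue  # assumption: CV is undefined for a zero mean, so skip
        cv = coefficient_of_variation(values)
        if cv > threshold:
            selected[gene] = cv
    return selected


# Hypothetical expression values at the 3, 12, 24 and 48 h time points.
profiles = {
    "geneA": [10.0, 10.5, 9.8, 10.2],  # nearly constant over time
    "geneB": [5.0, 8.0, 12.0, 20.0],   # changes markedly over time
}
selected = filter_time_series_genes(profiles)
```

Only genes passing this filter would be handed on to pattern clustering and enrichment analysis.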
In order to study structure-function details of TGF-beta1, the recombinant mature form of rat TGF-beta1 was expressed in bacteria. Synthesis of the 112-amino-acid carboxyl-terminal part of TGF-beta1 (amino acids 279-390) was controlled by an inducible gene expression system based on bacteriophage T7 RNA polymerase. This system allowed an active and selective synthesis of recombinant TGF-beta1. The molecular weight of the expressed TGF-beta1 monomer, determined on SDS-polyacrylamide gel under reducing conditions, was about 13 kDa. Serial detergent washes combined with a single gel-filtration purification step were sufficient to purify the expression product to homogeneity. Amino-terminal sequencing revealed that the N-terminus of the recombinant protein was identical to the published data. In Western blot analysis, the recombinant polypeptide showed excellent antigenicity against polyclonal TGF-beta1 antibody. The mature recombinant rat TGF-beta1 expressed in this study provides a useful tool for future detailed structural and functional studies.
We propose a new method for tumor classification from gene expression data, which consists of three main steps. First, the original DNA microarray gene expression data are modeled by independent component analysis (ICA). Second, the most discriminant eigenassays extracted by ICA are selected by the sequential floating forward selection technique. Finally, a support vector machine is used to classify the modeled data. To show the validity of the proposed method, we applied it to classify three DNA microarray datasets involving various human normal and tumor tissue samples. The experimental results show that the method is efficient and feasible.
There have been many skewed cancer gene expression datasets in the post-genomic era. Extraction of differentially expressed genes or construction of decision rules from these skewed datasets by traditional algorithms will seriously underestimate the performance of the minority class, leading to inaccurate diagnosis in clinical trials. This paper presents a skewed gene selection algorithm that introduces a weighted metric into the gene selection procedure. The extracted genes are paired as decision rules to distinguish both classes, and these decision rules are then integrated into an ensemble learning framework by majority voting to recognize test examples, thus avoiding tedious data normalization and classifier construction. The mining and integrating of a few reliable decision rules gave higher or at least comparable classification performance compared with many traditional class imbalance learning algorithms on four benchmark imbalanced cancer gene expression datasets.
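The ensemble step above (paired genes acting as decision rules, combined by majority voting) can be sketched as follows. The abstract does not state the exact form of a rule, so this sketch assumes a simple within-sample comparison of two genes, which is one way such a rule could avoid cross-sample normalization. The rule format, gene identifiers, and class labels are all hypothetical.

```python
from collections import Counter


def apply_pair_rule(sample, rule):
    """Assumed rule form: compare two genes within the same sample."""
    gene_i, gene_j, label_if_greater, label_otherwise = rule
    return label_if_greater if sample[gene_i] > sample[gene_j] else label_otherwise


def majority_vote(sample, rules):
    """Predict the label that most rules vote for.

    An odd number of rules avoids ties in the two-class case.
    """
    votes = Counter(apply_pair_rule(sample, rule) for rule in rules)
    return votes.most_common(1)[0][0]


# Hypothetical rules: (gene_i, gene_j, label if gene_i > gene_j, label otherwise).
rules = [
    ("g1", "g2", "tumor", "normal"),
    ("g3", "g4", "tumor", "normal"),
    ("g5", "g6", "normal", "tumor"),
]
sample = {"g1": 5.0, "g2": 1.0, "g3": 2.0, "g4": 3.0, "g5": 1.0, "g6": 4.0}
prediction = majority_vote(sample, rules)
```

The weighted metric used to select the gene pairs is not reproduced here, since the abstract gives no formula for it.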
Funding: Supported by the National Natural Science Foundation of China, No. 39870032, and Key Projects in the National Science & Technology Pillar Program in the Eleventh Five-Year Plan Period
文摘AIM: To extend the knowledge of the dynamic interaction between Helicobacter pylori (H. pylori) and host mucosa. METHODS: A time-series cDNA microarray was performed in order to detect the temporal gene expression prof iles of human gastric epithelial adenocarcinoma cells infected with H. pylori. Six time points were selected to observe the changes in the model. A differential expression prof ile at each time point was obtained by comparing the microarray signal value with that of 0 h. Real-time polymerase chain reaction was subsequently performed to evaluate the data quality. RESULTS: We found a diversity of gene expression patterns at different time points and identifi ed a group of genes whose expression levels were significantly correlated with several important immune response and tumor related pathways. CONCLUSION: Early infection may trigger some important pathways and may impact the outcome of the infection.
文摘Accurate gas viscosity determination is an important issue in the oil and gas industries.Experimental approaches for gas viscosity measurement are timeconsuming,expensive and hardly possible at high pressures and high temperatures(HPHT).In this study,a number of correlations were developed to estimate gas viscosity by the use of group method of data handling(GMDH)type neural network and gene expression programming(GEP)techniques using a large data set containing more than 3000 experimental data points for methane,nitrogen,and hydrocarbon gas mixtures.It is worth mentioning that unlike many of viscosity correlations,the proposed ones in this study could compute gas viscosity at pressures ranging between 34 and 172 MPa and temperatures between 310 and 1300 K.Also,a comparison was performed between the results of these established models and the results of ten wellknown models reported in the literature.Average absolute relative errors of GMDH models were obtained 4.23%,0.64%,and 0.61%for hydrocarbon gas mixtures,methane,and nitrogen,respectively.In addition,graphical analyses indicate that the GMDH can predict gas viscosity with higher accuracy than GEP at HPHT conditions.Also,using leverage technique,valid,suspected and outlier data points were determined.Finally,trends of gas viscosity models at different conditions were evaluated.
Abstract: In this paper, a similarity measure between genes with protein-protein interactions is proposed. The chip-chip data are converted into the same form as gene expression data, with Pearson correlation as the similarity measure. On the basis of the similarity measures of protein-protein interaction data and chip-chip data, a combined dissimilarity measure is defined. The combined distance measure is introduced into the K-means method, which can be considered an improved K-means method. The improved K-means method and three other clustering methods are evaluated on a real dataset. Performance of these methods is assessed by a prediction accuracy analysis through known gene annotations. Our results show that the improved K-means method outperforms the other clustering methods. The performance of the improved K-means method is also tested by varying the tuning coefficients of the combined dissimilarity measure. The results show that it is very helpful and meaningful to incorporate heterogeneous data sources in clustering gene expression data, and that the coefficients for genome-wide or complete data sources should be given larger values when constructing the combined dissimilarity measure.
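A combined dissimilarity of this kind can be sketched as a weighted sum of two terms. The abstract does not give the formula, so the linear form, the 1 - similarity conversion, and the names w_profile and w_ppi below are assumptions made for illustration only.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)

def combined_dissimilarity(profile_a, profile_b, ppi_similarity,
                           w_profile=0.5, w_ppi=0.5):
    """Weighted sum of a Pearson-based and a PPI-based dissimilarity.

    Assumption: each similarity s is turned into a dissimilarity as 1 - s,
    and the two are combined linearly with tuning coefficients.
    """
    d_profile = 1.0 - pearson(profile_a, profile_b)
    d_ppi = 1.0 - ppi_similarity
    return w_profile * d_profile + w_ppi * d_ppi

# Hypothetical chip-chip profiles that are perfectly correlated.
d_close = combined_dissimilarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0], ppi_similarity=1.0)
d_far = combined_dissimilarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0], ppi_similarity=0.0)
```

Raising w_ppi or w_profile corresponds to giving one data source a larger tuning coefficient, as the abstract recommends for genome-wide or complete sources.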
Funding: The authors extend their appreciation to the Deanship of Scientific Research at King Khalid University for funding this work under grant number (RGP 2/42/43). This work was supported by Taif University Researchers Supporting Program (project number: TURSP-2020/200), Taif University, Saudi Arabia.
Abstract: In bioinformatics applications, examination of microarray data has received significant interest for diagnosing diseases. Microarray gene expression data are characterized by a massive search space, which poses a primary challenge in the appropriate selection of genes. Microarray data classification incorporates multiple disciplines such as bioinformatics, machine learning (ML), data science, and pattern classification. This paper designs an optimal deep neural network based microarray gene expression classification (ODNN-MGEC) model for bioinformatics applications. The proposed ODNN-MGEC technique performs a data normalization process to bring the data to a uniform scale. In addition, an improved fruit fly optimization (IFFO) based feature selection technique is used to reduce the high dimensionality of the biomedical data. Moreover, a deep neural network (DNN) model is applied for the classification of microarray gene expression data, and the hyperparameter tuning of the DNN model is carried out using the Symbiotic Organisms Search (SOS) algorithm. The utilization of the IFFO and SOS algorithms paves the way for accomplishing maximum gene expression classification outcomes. To examine the improved outcomes of the ODNN-MGEC technique, a wide-ranging experimental analysis is made against benchmark datasets. The extensive comparison study with recent approaches demonstrates the enhanced outcomes of the ODNN-MGEC technique in terms of different measures.
Abstract: Gene expression data are represented as a matrix in which each row represents a gene and each column a condition. Microarrays are used in the laboratory to detect the expression of thousands of genes at a time. Genes encode proteins, which in turn dictate cell function. The production of messenger RNA and its processing are the two main stages involved in gene expression. The complexity of biological networks, added to the volume of data containing imprecision and outliers, increases the challenges in dealing with them. Clustering methods are hence essential to identify the patterns present in massive gene data. Many techniques involve hierarchical, partitioning, grid-based, density-based, model-based, and soft clustering approaches for dealing with gene expression data. Understanding gene regulation and other useful information from these data is possible only through effective clustering algorithms. Though many methods are discussed in the literature, we concentrate on providing a soft clustering approach for analyzing gene expression data. The population elements are grouped based on the fuzziness principle, and a degree of membership is assigned to all the elements. An improved Fuzzy clustering by Local Approximation of Memberships (FLAME) is proposed in this work, which overcomes the limitations of the other approaches in dealing with non-linear relationships and provides better segregation of biological functions.
Abstract: The analysis of messenger ribonucleic acid data obtained through sequencing techniques (RNA-sequencing) is very challenging. Once technical difficulties have been sorted out, an important choice has to be made during pre-processing. Two different paths can be chosen: transform the RNA-sequencing count data to a continuous variable, or continue to work with count data. For each data type, analysis tools have been developed and seem appropriate at first sight, but a deeper analysis of data distribution and structure is worth discussing. In this review, open questions regarding the nature of RNA-sequencing data are discussed and highlighted, indicating important future research topics in statistics that should be addressed for a better analysis of already available and newly emerging gene expression data. Moreover, a comparative analysis of RNA-seq count and transformed data is presented. This comparison indicates that transforming RNA-seq count data seems appropriate, at least for differential expression detection.
Funding: Supported by the National Natural Science Foundation of China (No. 30000077), Science Funds for Post-doctoral Studies (1999[10]), and Medical and Health Project Funds of Chinese PLA Lanzhou Command (LXH01-01).
Abstract: AIM: To study and clone a novel liver cancer related gene, and to explore the molecular basis of liver cancer genesis. METHODS: Using mRNA differential display polymerase chain reaction (DDPCR), we investigated the difference in mRNA between human hepatocellular carcinoma (HCC) and paired surrounding liver tissues, and obtained a gene probe. By screening a human placenta cDNA library and by genomic homologous extension, we obtained a full-length cDNA named HCCA3. We analyzed the expression of this novel gene in 42 pairs of HCC and surrounding liver tissues, and its distribution in normal human tissues, by means of Northern blot assay. RESULTS: A full-length cDNA of the liver cancer associated gene HCCA3 has been submitted to the GenBank nucleotide sequence database (Accession No. AF276707). The positive expression rate of this gene was 78.6% (33/42) in HCC tissues, and the clinical pathological data showed that HCCA3 was closely associated with invasion of the tumor capsule (P=0.023) and adjacent small metastatic satellite nodule lesions (P=0.041). HCCA3 was widely distributed in normal human tissues; it was intensively expressed in lung, brain, and colon tissues, and lowly expressed in liver tissues. CONCLUSION: A novel full-length cDNA was cloned and differentiated, which was highly expressed in liver cancer tissues. The high expression was closely related to tumor invasiveness and metastasis, which may be a late hereditary change in HCC genesis.
Abstract: AIM: To investigate SBA2 expression in CRC cell lines and surgical specimens of CRC and autologous healthy mucosa. METHODS: Reverse transcription-polymerase chain reaction (RT-PCR) was used for relative quantification of SBA2 mRNA levels in 4 human CRC cell lines with different grades of differentiation and 30 clinical samples. Normalization of the results was achieved by simultaneous amplification of beta-actin as an internal control. RESULTS: In the exponential range of amplification, fairly good linearity demonstrated identical amplification efficiency for SBA2 and beta-actin (82%). Markedly lower levels of SBA2 mRNA were detectable in tumors, as compared with the coupled normal counterparts (P<0.01). SBA2 expression was significantly (0.01<P<0.05) correlated with the grade of differentiation in CRC, with relatively higher levels in well-differentiated samples and lower levels in poorly differentiated cases. Of the 9 cases with lymph nodes affected, 78% (7/9) had reduced SBA2 mRNA expression, in contrast to 24% (5/21) in non-metastasis samples (0.01<P<0.05). CONCLUSION: The SBA2 gene might be a promising novel biomarker of cell differentiation in colorectal cancer, and its biological features need further study.
Abstract: Bicoid is one of the important Drosophila maternal genes involved in the control of embryo polarity and larval segmentation. To clone and characterize rice bicoid-related genes, one cDNA clone, Rb24 (EMBL accession number: AJ2771380), was isolated by screening a rice immature seed cDNA library. Sequence analysis indicates that Rb24 contains a putative amino acid sequence that is homologous to a unique 8-amino-acid sequence within the Drosophila bicoid homeodomain (50% identity, 75% similarity) and includes a lys-9 in putative helix 3. Northern blot analysis of rice RNA has shown that this sequence is expressed in a tissue-specific manner. The transcript was detected strongly in young panicles, but less in young leaves and roots. These results were further confirmed by paraffin section in situ hybridization. The signal is intense in the rice globular embryo and located at the apical tip of the embryo; then, as the embryo develops, the signal weakens and shifts to both sides of the embryo. The existence of a bicoid-related sequence in the rice embryo and the similar polar distribution of bicoid and Rb24 mRNA in early embryo development may imply that a conserved maternal mechanism regulating the body axis is present in Drosophila and in rice.
Funding: Supported by the National Natural Science Foundation of China (Nos. 81271019 and 61463046) and the Gansu Province Science Foundation for Youths (No. 145RJYA282).
Abstract: AIM: To identify and understand the relationship between co-expression patterns and clinical traits in uveal melanoma, weighted gene co-expression network analysis (WGCNA) is applied to investigate gene expression levels and patient clinical features. Uveal melanoma is the most common primary eye tumor in adults. Although many studies have identified important genes and pathways relevant to the progression of uveal melanoma, the relationship between co-expression and clinical traits at the systems level is still unclear. We employ WGCNA to investigate the relationship between molecular data and phenotype in this study. METHODS: Gene expression profiles of uveal melanoma and patient clinical traits were collected from the Gene Expression Omnibus (GEO) database. Gene co-expression was calculated with WGCNA, an R package, which was used to analyze the correlation between pairs of gene expression levels. The functions of the genes were annotated by gene ontology (GO). RESULTS: In this study, we identified four co-expression modules significantly correlated with clinical traits. Module blue positively correlates with radiotherapy treatment. Module purple positively correlates with tumor location (sclera) and negatively correlates with patient age. Module red positively correlates with sclera and negatively correlates with tumor thickness. Module black positively correlates with the largest tumor diameter (LTD). Additionally, we identified the hub gene (the gene with top connectivity to other genes) in each module. The hub genes RPS15A, PTGDS, CD53, and MSI2 might play a vital role in the progression of uveal melanoma. CONCLUSION: From WGCNA and hub gene calculation, we identified RPS15A, PTGDS, CD53, and MSI2, which might serve as targets or diagnostic markers for uveal melanoma.
Abstract: Acute leukemia is an aggressive disease that has high mortality rates worldwide. The error rate can be as high as 40% when classifying acute leukemia into its subtypes, so there is an urgent need to support hematologists during the classification process. More than two decades ago, researchers used microarray gene expression data to classify cancer and adopted acute leukemia as a test case. The high classification accuracy they achieved confirmed that it is possible to classify cancer subtypes using microarray gene expression data. Ensemble machine learning is an effective method that combines individual classifiers to classify new samples. Ensemble classifiers are recognized as powerful algorithms with numerous advantages over traditional classifiers. Over the past few decades, researchers have focused a great deal of attention on ensemble classifiers in a wide variety of fields, including but not limited to disease diagnosis, finance, bioinformatics, healthcare, manufacturing, and geography. This paper reviews the recent ensemble classifier approaches utilized for acute leukemia gene expression data classification. Moreover, a framework for classifying acute leukemia gene expression data is proposed. The pairwise correlation gene selection method and the Rotation Forest of Bayesian Networks are both used in this framework. Experimental outcomes show that the classification accuracy achieved by the acute leukemia ensemble classifiers constructed according to the suggested framework is good compared to the classification accuracy achieved in other studies.
Funding: Supported by the National Natural Science Foundation of China (C39370805), Zhejiang Provincial Natural Science Foundation (300487), and the Excellent Youth Scientist Fund of Zhejiang Province.
Abstract: AIM: To clone the cDNA of UGT1A9 from a Chinese human liver and establish a Chinese hamster lung (CHL) cell line expressing human UGT1A9. METHODS: cDNA of UGT1A9 was transcribed from mRNA by reverse transcriptase-polymerase chain reaction and was cloned into the pGEM-T vector, which was amplified in the host bacterium E. coli DH5α. The inserted fragment, verified by DNA sequencing, was subcloned into the Hind III/Not I site of the mammalian expression vector pREP9 to construct the plasmid termed pREP9-UGT1A9. CHL cells were transfected with the resultant recombinant, pREP9-UGT1A9, and selected with G418 (400 mg/L) for one month. The surviving clone (CHL-UGT1A9) was harvested as a pool and sub-cultured in medium containing G418 to obtain samples for UGT1A9 assays. The enzyme activity of CHL-UGT1A9 towards propranolol in S9 protein of the cells was determined by HPLC. RESULTS: The sequence of the cloned cDNA segment, which was 1666 bp in length, was identical in the coding region to that released by GenBank (GenBank accession number: AF056188). The constructed recombinant, pREP9-UGT1A9, contains the entire coding region, along with 18 bp of the 5' and 55 bp of the 3' untranslated region of the UGT1A9 cDNA, respectively. The established cell lines expressed the UGT1A9 protein, and the enzyme activity towards propranolol in S9 protein was found to be 101 ± 24 pmol·min⁻¹·mg⁻¹ protein (n=3), but was not detectable in parental CHL cells. CONCLUSION: The cDNA of UGT1A9 was successfully cloned from a Chinese human liver and transfected into CHL cells. The established CHL-UGT1A9 cell lines efficiently expressed the UGT1A9 protein for further enzyme studies of drug glucuronidation.
Funding: Supported by the Natural Science Foundation of Heilongjiang Province of China (No. D201252).
Abstract: Chronic myeloid leukemia (CML) is characterized by the accumulation of active BCR-ABL protein. Imatinib is the first-line treatment of CML; however, many patients are resistant to this drug. In this study, we aimed to compare the differences in expression patterns and functions of time-series genes in imatinib-resistant CML cells under different drug treatments. GSE24946 was downloaded from the GEO database, which included 17 samples of K562-r cells with (n=12) or without (n=5) drug administration. Three drug treatment groups were considered for this study: arsenic trioxide (ATO), AMN107, and ATO+AMN107. Each group had one sample at each time point (3, 12, 24, and 48 h). Time-series genes with a ratio of standard deviation to average (coefficient of variation) >0.15 were screened, and their expression patterns were revealed using Short Time-series Expression Miner (STEM). Then, functional enrichment analysis of the time-series genes in each group was performed using DAVID, and the genes enriched in the top ten functional categories were extracted to detect their expression patterns. Different time-series genes were identified in the three groups, and most of them were enriched in the ribosome and oxidative phosphorylation pathways. Time-series genes in the three treatment groups had different expression patterns and functions. Time-series genes in the ATO group (e.g. CCNA2 and DAB2) were significantly associated with cell adhesion, those in the AMN107 group were related to the cellular carbohydrate metabolic process, while those in the ATO+AMN107 group (e.g. AP2M1) were significantly related to cell proliferation and antigen processing. In imatinib-resistant CML cells, ATO could influence genes related to cell adhesion, AMN107 might affect genes involved in cellular carbohydrate metabolism, and the combination therapy might regulate genes involved in cell proliferation.
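The coefficient-of-variation screen described above can be sketched in a few lines. The abstract gives only the ratio (standard deviation / average) and the 0.15 cutoff; the use of the population standard deviation, the function names, and the expression values below are assumptions for illustration.

```python
from statistics import mean, pstdev

def coefficient_of_variation(values):
    """Standard deviation divided by the mean (0.0 when the mean is zero).

    Assumption: population standard deviation; the abstract does not say
    whether the sample or population form was used.
    """
    m = mean(values)
    if m == 0:
        return 0.0
    return pstdev(values) / abs(m)

def filter_time_series_genes(expression, threshold=0.15):
    """Keep genes whose coefficient of variation across time points exceeds the threshold."""
    return {gene: vals for gene, vals in expression.items()
            if coefficient_of_variation(vals) > threshold}

# Hypothetical expression values at the four time points (3, 12, 24, 48 h).
expression = {
    "gene_a": [10.0, 10.2, 9.9, 10.1],   # nearly flat over time
    "gene_b": [5.0, 8.0, 12.0, 6.0],     # clearly varying over time
}
selected = filter_time_series_genes(expression)
```

Here only gene_b passes the screen, because its variation relative to its mean is well above 0.15.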
Funding: Shanghai Medical Development grant No. ZD99001 and a grant (SFB-542) from the Deutsche Forschungsgemeinschaft.
Abstract: In order to study structure-function details of TGF-beta1, the recombinant mature form of rat TGF-beta1 was expressed in bacteria. Synthesis of the 112-amino-acid carboxyl-terminal part of TGF-beta1 (amino acids 279-390) was controlled by an inducible gene expression system based on bacteriophage T7 RNA polymerase. This system allowed an active and selective synthesis of recombinant TGF-beta1. The molecular weight of the expressed TGF-beta1 monomer determined on SDS-polyacrylamide gel under reducing conditions was about 13 kD. Serial detergent washes combined with a single gel-filtration purification step were sufficient to purify the expression product to homogeneity. Amino-terminal sequencing revealed that the N-terminus of the recombinant protein was identical to the published data. In Western blot analysis, the recombinant polypeptide showed excellent antigenicity against polyclonal TGF-beta1 antibody. The mature recombinant rat TGF-beta1 expressed in this study provides a useful tool for future detailed structural and functional studies.
Funding: the National Natural Science Foundation of China (No. 30700161); the National High-Tech Research and Development Program (863 Program) of China (Nos. 2007AA01Z167 and 2006AA02Z309); China Postdoctoral Science Foundation (No. 20070410223); Doctor Scientific Research Startup Foundation of Qufu Normal University (No. Bsqd2007036).
Abstract: We propose a new method for tumor classification from gene expression data, which consists of three main steps. First, the original DNA microarray gene expression data are modeled by independent component analysis (ICA). Second, the most discriminant eigenassays extracted by ICA are selected by the sequential floating forward selection technique. Finally, a support vector machine is used to classify the modeled data. To show the validity of the proposed method, we applied it to classify three DNA microarray datasets involving various human normal and tumor tissue samples. The experimental results show that the method is efficient and feasible.
Funding: Supported by the National Natural Science Foundation of China (No. 61105057) and the Ph.D Foundation of Jiangsu University of Science and Technology (Nos. 35301002 and 35211104).
Abstract: There have been many skewed cancer gene expression datasets in the post-genomic era. Extracting differentially expressed genes or constructing decision rules from these skewed datasets with traditional algorithms seriously underestimates the performance of the minority class, leading to inaccurate diagnosis in clinical trials. This paper presents a skewed gene selection algorithm that introduces a weighted metric into the gene selection procedure. The extracted genes are paired as decision rules to distinguish both classes, and these decision rules are then integrated into an ensemble learning framework by majority voting to recognize test examples, thus avoiding tedious data normalization and classifier construction. The mining and integration of a few reliable decision rules gave higher or at least comparable classification performance compared with many traditional class imbalance learning algorithms on four benchmark imbalanced cancer gene expression datasets.
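The idea of combining paired-gene decision rules by majority voting can be illustrated with a minimal sketch. The abstract does not define the rule form, so the sketch assumes each rule compares the expression of two genes within a sample; the gene names, labels, and values are hypothetical, and the weighted gene selection step is not reproduced.

```python
from collections import Counter

def apply_rule(sample, rule):
    """Apply one paired-gene rule to a sample.

    Assumption: a rule is (gene_i, gene_j, label_if_greater, label_otherwise)
    and votes label_if_greater when gene_i is expressed above gene_j.
    """
    gene_i, gene_j, label_if_greater, label_otherwise = rule
    return label_if_greater if sample[gene_i] > sample[gene_j] else label_otherwise

def majority_vote(sample, rules):
    """Return the label that receives the most votes across all rules."""
    votes = Counter(apply_rule(sample, rule) for rule in rules)
    return votes.most_common(1)[0][0]

# Hypothetical rules (an odd number avoids ties) and one test sample.
rules = [
    ("g1", "g2", "tumor", "normal"),
    ("g3", "g4", "tumor", "normal"),
    ("g5", "g6", "normal", "tumor"),
]
sample = {"g1": 5.0, "g2": 2.0, "g3": 1.0, "g4": 3.0, "g5": 0.5, "g6": 4.0}
prediction = majority_vote(sample, rules)
```

Because such rules depend only on the ordering of two values within one sample, they need no cross-sample normalization, which is consistent with the abstract's remark about avoiding tedious data normalization.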