27 articles found
Hybrid Gene Selection Methods for High-Dimensional Lung Cancer Data Using Improved Arithmetic Optimization Algorithm
1
Author: Mutasem K. Alsmadi. Computers, Materials & Continua (SCIE, EI), 2024, Issue 6, pp. 5175-5200 (26 pages)
Lung cancer is among the most frequent cancers in the world, with over one million deaths per year. Classification is required for lung cancer diagnosis and therapy to be effective, accurate, and reliable. Gene expression microarrays have made it possible to find genetic biomarkers for cancer diagnosis and prediction in a high-throughput manner. Machine Learning (ML) has been widely used to diagnose and classify lung cancer, where the performance of ML methods is evaluated to identify the appropriate technique. Identifying and selecting gene expression patterns can help in lung cancer diagnosis and classification. Normally, microarrays include several genes and may cause confusion or false prediction. Therefore, the Arithmetic Optimization Algorithm (AOA) is used to identify the optimal gene subset and reduce the number of selected genes, which can allow the classifiers to yield the best performance for lung cancer classification. In addition, we propose a modified version of AOA that can work effectively on high-dimensional datasets. In the modified AOA, the features are ranked by their weights and are used to initialize the AOA population. The exploitation process of AOA is then enhanced by developing a local search algorithm based on two neighborhood strategies. Finally, the efficiency of the proposed methods was evaluated on gene expression datasets related to lung cancer using stratified 4-fold cross-validation. The method’s efficacy in selecting the optimal gene subset is underscored by its ability to maintain feature proportions between 10% and 25%. Moreover, the approach significantly enhances lung cancer prediction accuracy. For instance, Lung_Harvard1 achieved an accuracy of 97.5%, the Lung_Harvard2 and Lung_Michigan datasets both achieved 100%, Lung_Adenocarcinoma obtained an accuracy of 88.2%, and Lung_Ontario achieved an accuracy of 87.5%. In conclusion, the results indicate the potential promise of the proposed modified AOA approach in classifying microarray cancer data.
Keywords: lung cancer; gene selection; improved arithmetic optimization algorithm; machine learning
An Intelligent Hybrid Ensemble Gene Selection Model for Autism Using DNN
2
Authors: G. Anurekha, P. Geetha. Intelligent Automation & Soft Computing (SCIE), 2023, Issue 3, pp. 3049-3064 (16 pages)
Autism Spectrum Disorder (ASD) is a complicated neurodevelopmental disorder that is often identified in toddlers. Microarray data is used as a diagnostic tool to identify the genetics of the disorder. However, microarray data is large and has a high volume. Consequently, it suffers from the problem of dimensionality. In microarray data, the sample size and variance of the gene expression will lead to overfitting and misclassification. Identifying the autism gene (feature) subset from microarray data is an important and challenging research area. It has to be efficiently addressed to improve gene feature selection and classification. To overcome the challenges, a novel Intelligent Hybrid Ensemble Gene Selection (IHEGS) model is proposed in this paper. The proposed model integrates the intelligence of different feature selection techniques over the data partitions. In this model, the initial gene selection is carried out by data perturbation, and the final autism gene subset is obtained by functional perturbation, which reduces the problem of dimensionality in microarray data. The functional perturbation module employs three meta-heuristic swarm intelligence-based techniques for gene selection. The obtained gene subset is validated by the Deep Neural Network (DNN) model. The proposed model is implemented using Python with six National Center for Biotechnology Information (NCBI) gene expression datasets. From the comparative study with other existing state-of-the-art systems, the proposed model provides stable results in terms of feature selection and classification accuracy.
Keywords: autism spectrum disorder; feature selection; ensemble gene selection; microarray; gene expression; deep neural network; meta-heuristic
Hybrid Feature Selection Method for Predicting Alzheimer’s Disease Using Gene Expression Data
3
Authors: Aliaa El-Gawady, BenBella S. Tawfik, Mohamed A. Makhlouf. Computers, Materials & Continua (SCIE, EI), 2023, Issue 3, pp. 5559-5572 (14 pages)
Gene expression (GE) classification is a research trend, as it has been used to diagnose and prognose many diseases. Employing machine learning (ML) in the prediction of many diseases based on GE data has been a flourishing research area. However, some diseases, like Alzheimer’s disease (AD), have not received considerable attention, probably owing to data scarcity obstacles. In this work, we shed light on the accurate prediction of AD from GE data using ML. Our approach consists of four phases: preprocessing, gene selection (GS), classification, and performance validation. In the preprocessing phase, gene columns are preprocessed identically. In the GS phase, a hybrid filtering method and an embedded method are used. In the classification phase, three ML models are implemented using the bare minimum of the chosen genes obtained from the previous phase. The final phase is to validate the performance of these classifiers using different metrics. The crux of this article is to select the most informative genes from the hybrid method, and the best ML technique to predict AD using this minimal set of genes. Five different datasets are used to achieve our goal. We predict AD with impressive values for the MultiLayer Perceptron (MLP) classifier, which has the best performance metrics in four datasets, while the Support Vector Machine (SVM) achieves the highest performance values in only one dataset. We assessed the classifiers using seven metrics and received impressive results, allowing for a credible performance rating. The metric values we obtain in our study lie in the range [0.97, 0.99] for accuracy (Acc), [0.97, 0.99] for F1-score, [0.94, 0.98] for the kappa index, [0.97, 0.99] for area under the curve (AUC), [0.95, 1] for precision, [0.98, 0.99] for sensitivity (recall), and [0.98, 1] for specificity. With these results, the proposed approach outperforms recent interesting results.
Keywords: gene expression; gene selection; machine learning; classification; Alzheimer’s disease
Genetic mechanism of body size variation in groupers:Insights from phylotranscriptomics
4
Authors: Wei-Wei Zhang, Zhuo-Ying Weng, Xi Wang, Yang Yang, Duo Li, Le Wang, Xiao-Chun Liu, Zi-Ning Meng. Zoological Research (SCIE, CSCD), 2024, Issue 2, pp. 314-328 (15 pages)
Animal body size variation is of particular interest in evolutionary biology, but the genetic basis remains largely unknown. Previous studies have shown the presence of two parallel evolutionary genetic clusters within the fish genus Epinephelus with evident divergence in body size, providing an excellent opportunity to investigate the genetic basis of body size variation in vertebrates. Herein, we performed phylotranscriptomic analysis and reconstructed the phylogeny of 13 epinephelids originating from the South China Sea. Two genetic clades with an estimated divergence time of approximately 15.4 million years ago were correlated with large and small body size, respectively. A total of 180 rapidly evolving genes and two positively selected genes were identified between the two groups. Functional enrichment analyses of these candidate genes revealed distinct enrichment categories between the two groups. These pathways and genes may play important roles in body size variation in groupers through complex regulatory networks. Based on our results, we speculate that the ancestors of the two divergent groups of groupers may have adapted to different environments through habitat selection, leading to genetic variations in metabolic patterns, organ development, and lifespan, resulting in body size divergence between the two locally adapted populations. These findings provide important insights into the genetic mechanisms underlying body size variation in groupers and species differentiation.
Keywords: phylotranscriptomics; grouper; body size; rapidly evolving genes (REGs); positively selected genes (PSGs)
A Modified Ant Colony Optimization Algorithm for Tumor Marker Gene Selection (cited by: 7)
5
Authors: Hualong Yu, Guochang Gu, Haibo Liu, Jing Shen, Jing Zhao. Genomics, Proteomics & Bioinformatics (SCIE, CAS, CSCD), 2009, Issue 4, pp. 200-208 (9 pages)
Microarray data are often extremely asymmetric in dimensionality, with thousands or even tens of thousands of genes but only a few hundred samples or fewer. Such extreme asymmetry between the dimensionality of genes and samples can lead to inaccurate diagnosis of disease in the clinic. It has been shown that selecting a small set of marker genes can lead to improved classification accuracy. In this paper, a simple modified ant colony optimization (ACO) algorithm is proposed to select tumor-related marker genes, and a support vector machine (SVM) is used as the classifier to evaluate the performance of the extracted gene subset. Experimental results on several benchmark tumor microarray datasets showed that the proposed approach produces better recognition with fewer marker genes than many other methods. It has been demonstrated that the modified ACO is a useful tool for selecting marker genes and mining high-dimensional data.
Keywords: microarray data; ant colony optimization; marker gene selection; support vector machine
Hybrid Method Based on Information Gain and Support Vector Machine for Gene Selection in Cancer Classification (cited by: 5)
6
Authors: Lingyun Gao, Mingquan Ye, Xiaojie Lu, Daobin Huang. Genomics, Proteomics & Bioinformatics (SCIE, CAS, CSCD), 2017, Issue 6, pp. 389-395 (7 pages)
It remains a great challenge to achieve sufficient cancer classification accuracy with the entire set of genes, due to the high dimensionality, small sample size, and heavy noise of gene expression data. We thus proposed a hybrid gene selection method, Information Gain-Support Vector Machine (IG-SVM), in this study. IG was initially employed to filter irrelevant and redundant genes. Then, further removal of redundant genes was performed using SVM to eliminate the noise in the datasets more effectively. Finally, the informative genes selected by IG-SVM served as the input for the LIBSVM classifier. Compared to other related algorithms, IG-SVM showed the highest classification accuracy and superior performance as evaluated using five cancer gene expression datasets based on a few selected genes. As an example, IG-SVM achieved a classification accuracy of 90.32% for colon cancer, which is difficult to classify accurately, based on only three genes: CSRP1, MYL9, and GUCA2B.
Keywords: gene selection; cancer classification; information gain; support vector machine; small sample size with high dimension
Gene selection in class space for molecular classification of cancer (cited by: 3)
7
Authors: ZHANG Junying, Yue Joseph WANG, Javed KHAN, Robert CLARKE. Science in China (Series F), 2004, Issue 3, pp. 301-314 (14 pages)
Gene selection (feature selection) is generally performed in gene space (feature space), where a very serious curse of dimensionality problem always exists because the number of genes is much larger than the number of samples in gene space (G-space). This results in difficulty in modeling the data set in this space and low confidence in the result of gene selection. How to find a gene subset in this case is a challenging subject. In this paper, the above G-space is transformed into its dual space, referred to as class space (C-space), such that the number of dimensions is the very number of classes of the samples in G-space and the number of samples in C-space is the number of genes in G-space. It is obvious that the curse of dimensionality does not exist in C-space. A new gene selection method, which is based on the principle of separating different classes as far as possible, is presented with the help of Principal Component Analysis (PCA). The experimental results on gene selection for a real data set are evaluated with the Fisher criterion, the weighted Fisher criterion, and leave-one-out cross validation, showing that the method presented here is effective and efficient.
Keywords: feature space (gene space); class space; feature selection (gene selection); PCA
Gene Selection for Classifications Using Multiple PCA with Sparsity
8
Authors: Yanwei Huang, Liqing Zhang. Tsinghua Science and Technology (SCIE, EI, CAS), 2012, Issue 6, pp. 659-665 (7 pages)
A gene selection algorithm was developed using Multiple Principal Component Analysis with Sparsity (MSPCA). The MSPCA algorithm is used to analyze normal and disease gene expression samples and to set component loadings to zero if they are smaller than a threshold, yielding sparse solutions. Next, genes with zero loadings across all samples (both normal and disease) are removed before extracting feature genes. Feature genes are genes that contribute differentially to variations in normal and disease samples and, thus, can be used for classification. The MSPCA is applied to three microarray datasets to select feature genes, with a linear support vector machine used to evaluate its performance. This method is compared with several previous gene selection results to show that the MSPCA gene selection algorithm has good classification accuracy and model stability.
Keywords: microarray; gene expression; gene selection; Multiple Principal Component Analysis with Sparsity (MSPCA); sparse
Identification of metastasis-associated genes in colorectal cancer through an integrated genomic and transcriptomic analysis (cited by: 2)
9
Authors: Xiaobo Li, Sihua Peng. Chinese Journal of Cancer Research (SCIE, CAS, CSCD), 2013, Issue 6, pp. 623-636 (14 pages)
Objective: Identification of colorectal cancer (CRC) metastasis genes is one of the most important issues in CRC research. For the purpose of mining CRC metastasis-associated genes, an integrated analysis of microarray data was performed, combined with evidence acquired from comparative genomic hybridization (CGH) data. Methods: Gene expression profile data of CRC samples were obtained from the Gene Expression Omnibus (GEO) website. The 15 important chromosomal aberration sites detected using CGH technology were used for integrated genomic and transcriptomic analysis. Significant Analysis of Microarray (SAM) was used to detect significantly differentially expressed genes across the whole genome. The overlapping genes were selected in their corresponding chromosomal aberration regions and analyzed using the Database for Annotation, Visualization and Integrated Discovery (DAVID). Finally, the SVM-T-RFE gene selection algorithm was applied to identify metastasis-associated genes in CRC. Results: A minimum gene set was obtained with the minimum number (14) of genes and the highest classification accuracy (100%) in both the PRI and META datasets. A fraction of the selected genes are associated with CRC or its metastasis. Conclusions: Our results demonstrated that integrated analysis is an effective strategy for mining cancer-associated genes.
Keywords: colorectal cancer; metastasis; integrated analysis; comparative genomic hybridization (CGH); Significant Analysis of Microarray (SAM); Database for Annotation, Visualization and Integrated Discovery (DAVID); SVM-T-RFE gene selection algorithm
Determination of Internal Controls for Quantitative Gene Expression of Isochrysis zhangjiangensis at Nitrogen Stress Condition (cited by: 1)
10
Authors: WU Shuang, ZHOU Jiannan, CAO Xupeng, XUE Song. Journal of Ocean University of China (SCIE, CAS), 2016, Issue 1, pp. 137-144 (8 pages)
Isochrysis zhangjiangensis is a potential marine microalga for biodiesel production that accumulates lipids under nitrogen limitation conditions, but the mechanism at the molecular level remains unclear. Quantitative real-time polymerase chain reaction (qPCR) makes it possible to investigate gene expression levels, and a valid reference for data normalization is an essential prerequisite for carrying out the analysis. In this study, five housekeeping genes, actin (ACT), α-tubulin (TUA), β-tubulin (TUB), ubiquitin (UBI), and 18S rRNA (18S), and one target gene, diacylglycerol acyltransferase (DGAT), were used for determining the reference. Analysis of the stabilities, based on calculation of the stability index and on two software tools, geNorm and BestKeeper, showed that the reference genes widely used in higher plants and microalgae, such as UBI, TUA and 18S, were not the most stable ones in nitrogen-stressed I. zhangjiangensis, and thus are not suitable for exploring mRNA expression levels under these experimental conditions. Our results show that ACT together with TUB is the most feasible internal control for investigating gene expression under nitrogen-stressed conditions. Our findings will contribute not only to future qPCR studies of I. zhangjiangensis, but also to verification of comparative transcriptomics studies of the microalgae under similar conditions.
Keywords: Chrysophyta; Isochrysis zhangjiangensis; nitrogen deficiency; lipid accumulation; reference gene selection
Positively selected genes of the Chinese tree shrew (Tupaia belangeri chinensis) locomotion system (cited by: 2)
11
Authors: Yu FAN, Dan-Dan YU, Yong-Gang YAO. Zoological Research (CAS, CSCD, PKU Core), 2014, Issue 3, pp. 240-248 (9 pages)
While the recent release of the Chinese tree shrew (Tupaia belangeri chinensis) genome has made the tree shrew an increasingly viable experimental animal model for biomedical research, further study of the genome may facilitate new insights into the applicability of this model. For example, though the tree shrew moves rapidly and has a strong jumping ability, there are limited studies on its locomotion ability. In this study, we used the available Chinese tree shrew genome information and compared the evolutionary pattern of 407 locomotion system related orthologs among five mammals (human, rhesus monkey, mouse, rat and dog) and the Chinese tree shrew. Our analyses identified 29 genes with significantly high ω (Ka/Ks ratio) values, and 48 amino acid sites in 14 genes showed significant evidence of positive selection in the Chinese tree shrew. Some of these positively selected genes, e.g. HOXA6 (homeobox A6) and AVP (arginine vasopressin), play important roles in muscle contraction or skeletal morphogenesis. These results provide important clues for understanding the genetic bases of locomotor adaptation in the Chinese tree shrew.
Keywords: Chinese tree shrew; locomotion system; positively selected genes
Pyramiding of Pi46 and Pita to improve blast resistance and to evaluate the resistance effect of the two R genes (cited by: 6)
12
Authors: XIAO Wu-ming, LUO Li-xin, WANG Hui, GUO Tao, LIU Yong-zhu, ZHOU Ji-yong, ZHU Xiao-yuan, YANG Qi-yun, CHEN Zhi-qiang. Journal of Integrative Agriculture (SCIE, CAS, CSCD), 2016, Issue 10, pp. 2290-2298 (9 pages)
Utilization of R (resistance) genes to develop resistant cultivars is an effective strategy to combat rice blast disease. In this study, the R genes Pi46 and Pita in a resistant accession, H4, were introgressed into an elite restorer line, Hang-Hui-179 (HH179), using the marker-assisted backcross breeding (MABB) procedure. As a result, three improved lines (R1791 carrying Pi46 alone, R1792 carrying Pita alone, and R1793 carrying both Pi46 and Pita) were developed. The three improved lines had significant genetic similarities with the recurrent parent HH179. Thus, they and HH179 could be recognized as near isogenic lines (NILs). The resistance spectrum of the three improved lines, which was tested at the seedling stage, reached 91.1, 64.7 and 97.1%, respectively. This was markedly broader than that of HH179 (23.5%). Interestingly, R1793 showed resistance to panicle blast, but neither R1791 nor R1792 exhibited resistance at two natural blast nurseries. The results implied that the stacking of Pi46 and Pita resulted in enhanced resistance, which was unachievable by either R gene alone. Further comparison indicated that the three improved lines were similar to HH179 in multiple agronomic traits, including plant height, tillers per plant, panicle length, spikelet fertility, and 1,000-grain weight. Thus, the three improved lines with different R genes can be used as new sources of resistance for developing varieties. There is a complementary effect between the two R genes Pi46 and Pita.
Keywords: rice blast; resistance gene; improvement; marker-assisted selection
A Survey on Acute Leukemia Expression Data Classification Using Ensembles
13
Authors: Abdel Nasser H. Zaied, Ehab Rushdy, Mona Gamal. Computer Systems Science & Engineering (SCIE, EI), 2023, Issue 11, pp. 1349-1364 (16 pages)
Acute leukemia is an aggressive disease that has high mortality rates worldwide. The error rate can be as high as 40% when classifying acute leukemia into its subtypes. So, there is an urgent need to support hematologists during the classification process. More than two decades ago, researchers used microarray gene expression data to classify cancer and adopted acute leukemia as a test case. The high classification accuracy they achieved confirmed that it is possible to classify cancer subtypes using microarray gene expression data. Ensemble machine learning is an effective method that combines individual classifiers to classify new samples. Ensemble classifiers are recognized as powerful algorithms with numerous advantages over traditional classifiers. Over the past few decades, researchers have focused a great deal of attention on ensemble classifiers in a wide variety of fields, including but not limited to disease diagnosis, finance, bioinformatics, healthcare, manufacturing, and geography. This paper reviews the recent ensemble classifier approaches utilized for acute leukemia gene expression data classification. Moreover, a framework for classifying acute leukemia gene expression data is proposed. The pairwise correlation gene selection method and the Rotation Forest of Bayesian Networks are both used in this framework. Experimental outcomes show that the classification accuracy achieved by the acute leukemia ensemble classifiers constructed according to the suggested framework is good compared to the classification accuracy achieved in other studies.
Keywords: leukemia; classification; ensemble; rotation forest; pairwise correlation; Bayesian networks; gene expression data; microarray; gene selection
Comparative transcriptomes reveal the disjunction adaptive strategy of Thuja species in East Asia and North America (cited by: 1)
14
Authors: Ermei Chang, Xue Liu, Jiahui Chen, Jingyi Sun, Shaowei Yang, Jianfeng Liu. Journal of Forestry Research (SCIE, CAS, CSCD), 2023, Issue 6, pp. 1963-1974 (12 pages)
The genus Thuja is ideal for investigating the genetic basis of the East Asia-North America disjunction. The biogeographical background of the genus is debatable and an adaptive strategy is lacking. Through the analysis and mining of comparative transcriptomes, species differentiation and positively selected genes (PSGs) were identified to provide information for understanding the environmental adaptation strategies of the genus Thuja. De novo assembly yielded 44,397-74,252 unigenes of the five Thuja species, with contig N50 length ranging from 1,559 to 1,724 bp. Annotations revealed a similar distribution of functional categories among them. Based on the phylogenetic trees constructed using the transcriptome data, T. sutchuenensis diverged first, followed by T. plicata and T. occidentalis. The final differentiation of T. koraiensis and T. standishii formed a clade. Enrichment analysis indicated that the PSGs of the North American Thuja species were involved in the plant hormone signal transduction and carbon fixation in photosynthetic organisms pathways. The PSGs of East Asian Thuja were related to phenolic, alkaloid, and terpenoid synthesis, are important stress-resistance genes, and could increase plant resistance to external environmental stresses. This study discovered numerous aroma synthesis-related PSGs, including terpene synthase (TPS) genes and lipid phosphate phosphatase 2 (LPP2), associated with the synthetic aroma of T. sutchuenensis. Physiological indicators, such as the contents of soluble sugars, total chlorophyll, total phenolics, and total flavonoids, were determined and are consistent with the PSG enrichment pathways associated with adaptive strategies in the five Thuja species. The results of this study provide an important basis for future studies on conservation genetics.
Keywords: Thuja species; comparative transcriptomes; East Asia-North America disjunction; specific gene; positively selected gene
Distinguishing Rectal Cancer from Colon Cancer Based on the Support Vector Machine Method and RNA-sequencing Data (cited by: 1)
15
Authors: Yan ZHANG, Yuan WU, Zi-ying GONG, Hai-dan YE, Xiao-kai ZHAO, Jie-yi LI, Xiao-mei ZHANG, Sheng LI, Wei ZHU, Mei WANG, Ge-yu LIANG, Yun LIU, Xin GUAN, Dao-yun ZHANG, Bo SHEN. Current Medical Science (SCIE, CAS), 2021, Issue 2, pp. 368-374 (7 pages)
Colorectal cancer (CRC) is the third most commonly diagnosed cancer worldwide. Several studies have indicated that rectal cancer is significantly different from colon cancer in terms of treatment, prognosis, and metastasis. Recently, the differential mRNA expression of colon cancer and rectal cancer has received a great deal of attention. The current study aimed to identify significant differences between colon cancer and rectal cancer based on RNA sequencing (RNA-seq) data via support vector machines (SVM). Here, 393 CRC samples from The Cancer Genome Atlas (TCGA) database were investigated, including 298 patients with colon cancer and 95 with rectal cancer. Following random forest (RF) analysis of the mRNA expression data, 96 genes such as HOXB13, PRAC, and BCLAF1 were identified and utilized to build the SVM classification model with the Leave-One-Out Cross-validation (LOOCV) algorithm. In the training (n=196) and validation (n=197) cohorts, the accuracy (82.1% and 82.2%, respectively) and the AUC (0.87 and 0.91, respectively) indicated that the established optimal SVM classification model distinguished colon cancer from rectal cancer reasonably well. However, additional experiments are required to validate the predicted gene expression levels and functions.
Keywords: colon cancer; rectal cancer; support vector machine; classification; gene selection
The allopolyploid B. juncea genome uncovered differential homoeolog gene expression influencing selection
16
Science Foundation in China (CAS), 2016, Issue 4, p. 55 (1 page)
With long-term support from the National Natural Science Foundation of China, the Ministry of Agriculture, and the Science and Technology Department of Zhejiang Province, the research team led by Prof. Zhang Mingfang (张明方) at Zhejiang University assembled an allopolyploid B. juncea genome and uncovered differential homoeolog gene expression influencing selection, which was published in Nature.
Keywords: gene Zhang The allopolyploid B juncea genome uncovered differential homoeolog gene expression influencing selection
Chloroplast DNA Underwent Independent Selection from Nuclear Genes during Soybean Domestication and Improvement
17
Authors: Chao Fang, Yanming Ma, Lichai Yuan, Zheng Wang, Rui Yang, Zhengkui Zhou, Tengfei Liu, Zhixi Tian. Journal of Genetics and Genomics (SCIE, CAS, CSCD), 2016, Issue 4, pp. 217-221 (5 pages)
The chloroplast is one of the most important organelles in plants because of its essential role in photosynthesis. Studies have shown that the chloroplast was once a free-living cyanobacterium and was integrated into the host species through endosymbiosis (Goksoyr, 1967), after which a large number of its genes were donated to the host nuclear genome (Heins and Soll, 1998).
Keywords: gene Chloroplast DNA Underwent Independent selection from Nuclear genes during Soybean Domestication and Improvement than DNA
Positive Selection of CAG Repeats of the ATXN2 Gene in Chinese Ethnic Groups
18
Authors: Xiao-Chen Chen, Hao Sun, Chang-Jun Zhang, Ying Zhang, Ke-Qin Lin, Liang Yu, Lei Shi, Yu-Fen Tao, Xiao-Qin Huang, Jia-You Chu, Zhao-Qing Yang. Journal of Genetics and Genomics (SCIE, CAS, CSCD), 2013, Issue 10, pp. 543-548 (6 pages)
The ataxin-2 (ATXN2) gene is located on human chromosome 12q24.1. In normal individuals, the coding region in exon 1 of this gene has fewer than 31 CAG repeats (Yu et al., 2005; Laffita-Mesa et al., 2012). However, an abnormal expansion of CAG trinucleotide repeats results in the aggregation of polyglutamine (polyQ), which causes spinocerebellar ataxia type 2 (SCA2) (Pulst et al., 1996). The expanded alleles have more than 32 repeats in affected individuals, and generally there is an inverse correlation between CAG repeat length and age of onset (Pulst et al., 1996). SCA2 is an autosomal dominant neurodegenerative disease whose major clinical feature is progressive cerebellar ataxia. Atrophy of the brainstem and frontal lobe has frequently been detected by magnetic resonance imaging (MRI) (Yamamoto-Watanabe et al., 2010). This disease has a strong effect on sensory and motor control.
Keywords: CAG; gene; Positive Selection of CAG Repeats of the ATXN2 Gene in Chinese Ethnic Groups
Identification of differential gene expression for microarray data using recursive random forest (cited by: 8)
19
Authors: WU Xiao-yan, WU Zhen-yu, LI Kang. 《Chinese Medical Journal》 SCIE CAS CSCD, 2008, Issue 24, pp. 2492-2496 (5 pages)
Background: The major difficulty in the analysis of DNA microarray data is the large number of genes relative to the small number of samples, together with the complex data structure. Random forest has received much attention recently; its primary characteristic is that it can build a classification model from high-dimensional data. However, it cannot obtain optimal results for gene selection because it is still affected by undifferentiated genes. We propose recursive random forest analysis and apply it to gene selection.
Methods: Recursive random forest, an improvement on random forest, obtains the optimal differentiated genes by dropping, step by step, the genes that, according to a certain algorithm, have no effect on classification. The method retains the advantages of random forest and also provides a gene importance scale. The area under the curve (AUC) of the receiver operating characteristic (ROC) curve, which combines sensitivity and specificity, is adopted as the key criterion for evaluating the performance of the method. The focus of the paper is to validate the effectiveness of gene selection using recursive random forest through the analysis of five microarray datasets: colon, prostate, leukemia, breast and skin data.
Results: Five microarray datasets were analyzed, and better classification results were attained using only a few genes after gene selection. The biological information of the genes selected from the breast and skin data was confirmed against the National Center for Biotechnology Information (NCBI). The results show that disease-associated genes can be effectively retained by recursive random forest.
Conclusions: Recursive random forest can be effectively applied to microarray data analysis and gene selection. The genes retained in the optimal model provide important information for clinical diagnosis and for research into the biological mechanisms of diseases.
Keywords: microarray; gene selection; recursive random forest
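The abstract above does not spell out its elimination rule ("according to a certain algorithm"), so the following is only a minimal sketch of the general shape of the idea: drop the least important gene step by step and keep the subset with the best AUC. As a loudly labeled substitution, a single-gene AUC stands in for random forest importance and an in-sample signed-sum score stands in for refitting a forest; all function names and the toy data are hypothetical and not taken from the paper.

```python
from typing import List, Sequence, Tuple


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    # A tie between a positive and a negative sample counts as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def gene_auc(X, labels, gene: int) -> float:
    """AUC obtained when one gene's expression is used as the score."""
    return auc([row[gene] for row in X], labels)


def importance(X, labels, gene: int) -> float:
    """ASSUMPTION: stand-in for random forest importance (0 = useless, 1 = perfect)."""
    return abs(gene_auc(X, labels, gene) - 0.5) * 2.0


def subset_auc(X, labels, genes: List[int]) -> float:
    """ASSUMPTION: stand-in for refitting a forest; in-sample AUC of a signed sum."""
    signs = {g: 1.0 if gene_auc(X, labels, g) >= 0.5 else -1.0 for g in genes}
    scores = [sum(signs[g] * row[g] for g in genes) for row in X]
    return auc(scores, labels)


def recursive_elimination(X, labels, min_genes: int = 1) -> Tuple[List[int], float]:
    """Drop the weakest gene one at a time; return the best subset and its AUC."""
    genes = list(range(len(X[0])))
    best_genes, best_auc = list(genes), subset_auc(X, labels, genes)
    while len(genes) > min_genes:
        weakest = min(genes, key=lambda g: importance(X, labels, g))
        genes.remove(weakest)
        current = subset_auc(X, labels, genes)
        if current >= best_auc:  # on ties, prefer the smaller gene set
            best_genes, best_auc = list(genes), current
    return best_genes, best_auc


# Toy data: genes 0 and 1 separate the classes, genes 2 and 3 are noise.
X = [
    [1, 9, 50, 4],
    [2, 8, 10, 6],
    [3, 7, 40, 2],
    [7, 3, 20, 5],
    [8, 2, 60, 1],
    [9, 1, 30, 3],
]
labels = [0, 0, 0, 1, 1, 1]
selected, selected_auc = recursive_elimination(X, labels, min_genes=2)
print(selected, selected_auc)
```

On this toy data the large-scale noise gene lowers the AUC of the full gene set, and removing it restores perfect separation. Because the score is computed in-sample, it is optimistic; a real analysis would use out-of-bag or cross-validated AUC.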
Biomarker Identification of Rat Liver Regeneration via Adaptive Logistic Regression (cited by: 2)
20
Authors: Liu-Yuan Chen, Jie Yang, Guo-Guo Xu, Yun-Qing Liu, Jun-Tao Li, Cun-Shuan Xu. 《International Journal of Automation and Computing》 EI CSCD, 2016, Issue 2, pp. 191-198 (8 pages)
This paper identifies biomarkers of rat liver regeneration via adaptive logistic regression. By combining the adaptive elastic net penalty with the logistic regression loss, adaptive logistic regression is proposed to adaptively identify important genes in groups. Furthermore, by improving the pathwise coordinate descent algorithm, a fast algorithm is developed for computing the regularization paths of the adaptive logistic regression. Results from experiments on microarray data of rat liver regeneration illustrate the effectiveness of the proposed method and verify the biological rationality of the selected biomarkers.
Keywords: adaptive logistic regression; gene selection; microarray classification; grouping effect; rat liver regeneration
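The abstract above does not give the objective function, so the sketch below assumes a commonly used adaptive elastic net form: the mean logistic loss plus `lam1 * sum(w_j * |beta_j|) + lam2 * sum(beta_j ** 2)`, with weights `w_j` derived from an initial estimate. It uses plain proximal gradient descent, not the improved pathwise coordinate descent described in the paper, and it omits the intercept. All function names, parameter values, and the toy data are hypothetical.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(z: float, t: float) -> float:
    """Proximal operator of t * |.|: shrink z toward zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def adaptive_weights(initial_beta: Sequence[float], gamma: float = 1.0,
                     eps: float = 1e-8) -> List[float]:
    """ASSUMPTION: w_j = 1 / (|initial beta_j| + eps) ** gamma."""
    return [1.0 / (abs(b) + eps) ** gamma for b in initial_beta]


def fit_adaptive_enet_logistic(X, y, weights, lam1: float, lam2: float,
                               step: float = 0.5, iters: int = 500) -> List[float]:
    """Proximal gradient descent on the assumed objective (no intercept)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        # Residuals p_i - y_i of the current logistic model.
        resid = [sigmoid(sum(b * x for b, x in zip(beta, row))) - yi
                 for row, yi in zip(X, y)]
        new_beta = []
        for j in range(p):
            # Gradient of the smooth part: logistic loss plus ridge term.
            grad = sum(row[j] * r for row, r in zip(X, resid)) / n
            grad += 2.0 * lam2 * beta[j]
            # Gradient step, then weighted L1 shrinkage.
            new_beta.append(
                soft_threshold(beta[j] - step * grad, step * lam1 * weights[j])
            )
        beta = new_beta
    return beta


# Toy data: feature 0 determines the class, feature 1 is weakly related noise.
X = [[-1.0, 0.5], [-1.0, -1.0], [1.0, 1.0], [1.0, -0.5]]
y = [0, 0, 1, 1]
# Hypothetical initial estimate: feature 1 looked weak, so it gets a large weight.
w = adaptive_weights([1.0, 0.1])
beta = fit_adaptive_enet_logistic(X, y, w, lam1=0.05, lam2=0.01)
print(beta)
```

The adaptive weights make the L1 threshold for the weak feature about ten times larger than for the strong one, so its coefficient is held at exactly zero while the informative coefficient grows until the penalties balance the loss gradient.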