8 articles found
Classifying Genomic Sequences by Sequence Feature Analysis (Cited by: 1)
1
Authors: Zhi-Hua Liu, Dian Jia, Xiao Sun. Genomics, Proteomics & Bioinformatics (SCIE, CAS, CSCD), 2005, Issue 4, pp. 201-205 (5 pages)
Traditional sequence analysis depends on sequence alignment. In this study, we analyzed various functional regions of the human genome based on sequence features, including word frequency, dinucleotide relative abundance, and base-base correlation. We analyzed human chromosome 22 and classified the upstream, exon, intron, downstream, and intergenic regions by principal component analysis and discriminant analysis of these features. The results show that the functional regions of the genome can be classified using sequence features and discriminant analysis.
Keywords: genome; sequence feature analysis; BBC; PCA; discriminant analysis
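One of the features named in the abstract above, dinucleotide relative abundance, can be illustrated with a minimal sketch. The abstract does not give the exact formula the authors used, so this assumes the common single-strand odds-ratio form, rho_xy = f_xy / (f_x * f_y); the function name is hypothetical.

```python
from collections import Counter


def dinucleotide_relative_abundance(seq):
    """Return {dinucleotide: rho} where rho_xy = f_xy / (f_x * f_y).

    Assumption: single-strand odds-ratio definition; the paper may use a
    different (e.g. strand-symmetrized) variant.
    """
    seq = seq.upper()
    bases = "ACGT"
    # Count single bases, skipping ambiguous symbols such as N.
    mono = Counter(ch for ch in seq if ch in bases)
    # Count overlapping dinucleotides made of two unambiguous bases.
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)
             if seq[i] in bases and seq[i + 1] in bases]
    di = Counter(pairs)
    n_mono = sum(mono.values())
    n_di = len(pairs)

    rho = {}
    for x in bases:
        for y in bases:
            fx = mono[x] / n_mono if n_mono else 0.0
            fy = mono[y] / n_mono if n_mono else 0.0
            fxy = di[x + y] / n_di if n_di else 0.0
            # Report 0.0 when a base is absent, to avoid division by zero.
            rho[x + y] = fxy / (fx * fy) if fx and fy else 0.0
    return rho


if __name__ == "__main__":
    profile = dinucleotide_relative_abundance("ACGTACGTTTGCA")
    for dinuc in sorted(profile):
        print(dinuc, round(profile[dinuc], 3))
```

The resulting 16-value profile is the kind of fixed-length feature vector that could then be passed to principal component analysis or discriminant analysis.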
Cloning and Sequence Analysis of Y-box Binding Protein Gene in Min Pig
2
Authors: Zhang Dong-jie, Liu Di, Wang Liang, He Xin-miao, Wang Wen-tao. Journal of Northeast Agricultural University (English Edition) (CAS), 2014, Issue 1, pp. 52-55 (4 pages)
To study the Y-box binding protein (YB-1) gene of the Min pig, its complete coding sequence was cloned by RT-PCR, and the sequence features were analyzed with software and online tools. The complete CDS of Min pig YB-1 was 975 bp long, encoding 324 amino acids. It contained a conserved cold shock domain and several phosphorylation sites but no transmembrane domains, consistent with a protein found in the cytoplasm. The Min pig YB-1 nucleotide sequence shared high similarity (61.37%-97.66%) with those of other mammals.
Keywords: Min pig; Y-box binding protein; sequence feature
Relation Classification via Sequence Features and Bi-Directional LSTMs (Cited by: 6)
3
Authors: REN Yuanfang, TENG Chong, LI Fei, CHEN Bo, JI Donghong. Wuhan University Journal of Natural Sciences (CAS, CSCD), 2017, Issue 6, pp. 489-497 (9 pages)
Structure features need complicated pre-processing and are probably domain-dependent. To reduce the time cost of pre-processing, we propose a novel neural network architecture, a bi-directional long short-term memory recurrent neural network (Bi-LSTM-RNN) model based on low-cost sequence features such as words and part-of-speech (POS) tags, to classify the relation between two entities. First, the model performs bi-directional recurrent computation along the tokens of a sentence. Then, the sequence is divided into five parts and standard pooling functions are applied over the token representations of each part. Finally, the token representations are concatenated and fed into a softmax layer for relation classification. We evaluate our model on two standard benchmark datasets in different domains, namely SemEval-2010 Task 8 and BioNLP-ST 2016 Task BB3. On SemEval-2010 Task 8, the performance of our model matches that of the state-of-the-art models, achieving an F1 of 83.0%. On BioNLP-ST 2016 Task BB3, our model obtains an F1 of 51.3%, which is comparable with that of the best system. Moreover, we find that the context between the two target entities plays an important role in relation classification and can replace the shortest dependency path.
Keywords: Bi-LSTM-RNN; relation classification; sequence features; structure features
Predicting potential cancer genes by integrating network properties, sequence features and functional annotations (Cited by: 1)
4
Authors: LIU Wei, XIE HongWei. Science China (Life Sciences) (SCIE, CAS), 2013, Issue 8, pp. 751-757 (7 pages)
The discovery of novel cancer genes is one of the main goals in cancer research. Bioinformatics methods can be used to accelerate cancer gene discovery, which may help in the understanding of cancer and the development of drug targets. In this paper, we describe a classifier to predict potential cancer genes that we have developed by integrating multiple lines of biological evidence, including protein-protein interaction network properties, and sequence and functional features. We detected 55 features that were significantly different between cancer genes and non-cancer genes. Fourteen cancer-associated features were chosen to train the classifier. Four machine learning methods, logistic regression, support vector machines (SVMs), BayesNet and decision tree, were explored in the classifier models to distinguish cancer genes from non-cancer genes. The prediction power of the different models was evaluated by 5-fold cross-validation. The area under the receiver operating characteristic curve for the logistic regression, SVM, BayesNet and J48 tree models was 0.834, 0.740, 0.800 and 0.782, respectively. Finally, the logistic regression classifier with multiple biological features was applied to the genes in the Entrez database, and 1976 cancer gene candidates were identified. We found that the integrated prediction model performed much better than the models based on individual biological evidence, and that the network and functional features had stronger predictive power than the sequence features in predicting cancer genes.
Keywords: cancer gene; logistic regression; network property; sequence feature; functional annotation
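The evaluation protocol mentioned in the abstract above (5-fold cross-validation scored by the area under the ROC curve) can be sketched as follows. This is only an illustration of the two ingredients, not the authors' pipeline: the helper names are hypothetical, the folds are simple contiguous splits without shuffling or stratification, and the AUC uses the pairwise-ranking definition.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half (pairwise-ranking definition)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def k_fold_indices(n, k=5):
    """Split range(n) into k contiguous folds of near-equal size.

    Assumption: no shuffling or stratification, for brevity.
    """
    folds = []
    start = 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


if __name__ == "__main__":
    # Toy labels and classifier scores, made up purely for illustration.
    labels = [1, 1, 0, 0]
    scores = [0.9, 0.4, 0.5, 0.1]
    print(roc_auc(labels, scores))
    print(k_fold_indices(7, k=5))
```

In a full evaluation, each fold would serve once as the held-out test set while a classifier is trained on the remaining folds, and the per-fold AUC values would be averaged.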
Constrained query of order-preserving submatrix in gene expression data (Cited by: 2)
5
Authors: Tao JIANG, Zhanhuai LI, Xuequn SHANG, Bolin CHEN, Weibang LI, Zhilei YIN. Frontiers of Computer Science (SCIE, EI, CSCD), 2016, Issue 6, pp. 1052-1066 (15 pages)
Order-preserving submatrices (OPSMs) have become important in modelling biologically meaningful subspace clusters, capturing the general tendency of gene expression across a subset of conditions. With advances in microarray and analysis techniques, large volumes of gene expression datasets and OPSM mining results are produced. OPSM queries can efficiently retrieve relevant OPSMs from these large collections. However, improving OPSM query relevancy remains difficult in real-life exploratory data analysis. First, it is hard to capture subjective aspects of interestingness, e.g., the analyst's expectations given her/his domain knowledge. Second, even when these expectations can be declaratively specified, it is still challenging to use them during the computation of OPSM queries. To the best of our knowledge, existing methods mainly focus on batch OPSM mining, while few works address OPSM queries. To solve these problems, the paper proposes two constrained OPSM query methods, which exploit user-defined constraints to search for relevant results in two kinds of indices introduced in the paper. Extensive experiments on real datasets demonstrate that queries based on the multi-dimension index (cIndex) and the enumerating sequence index (esIndex) perform better than brute-force search.
Keywords: gene expression data; OPSM; constrained query; brute-force search; feature sequence; cIndex
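The brute-force baseline mentioned in the abstract above can be illustrated with a small sketch. It assumes the usual OPSM definition (a set of rows whose values strictly increase under a common ordering of a subset of columns); the function names and the `min_rows` parameter are hypothetical, and the paper's cIndex and esIndex structures are not reproduced here.

```python
def supports_order(row, column_order):
    """True if the row's values strictly increase along column_order."""
    values = [row[c] for c in column_order]
    return all(a < b for a, b in zip(values, values[1:]))


def brute_force_opsm_query(matrix, column_order, min_rows=1):
    """Scan every row (gene) and keep those that follow the queried
    column (condition) order. Returns [] if fewer than min_rows match.

    Assumption: the query is a single column ordering; real constrained
    queries may combine several kinds of constraints.
    """
    rows = [i for i, row in enumerate(matrix)
            if supports_order(row, column_order)]
    return rows if len(rows) >= min_rows else []


if __name__ == "__main__":
    # Toy expression matrix: rows are genes, columns are conditions.
    expression = [
        [1.0, 3.0, 2.0],
        [5.0, 9.0, 7.0],
        [4.0, 1.0, 2.0],
    ]
    # Which genes rise from condition 0 to condition 2 to condition 1?
    print(brute_force_opsm_query(expression, [0, 2, 1]))
```

Each query costs one pass over all rows, which is the cost an index-based method aims to avoid.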
Bagging with CTD – A Novel Signature for the Hierarchical Prediction of Secreted Protein Trafficking in Eukaryotes (Cited by: 1)
6
Authors: Geetha Govindan, Achuthsankar S. Nair. Genomics, Proteomics & Bioinformatics (SCIE, CAS, CSCD), 2013, Issue 6, pp. 385-390 (6 pages)
Protein trafficking or protein sorting in eukaryotes is a complicated process and is carried out based on the information contained in the protein. Many methods have been reported for predicting the subcellular location of proteins from sequence information. However, most of these prediction methods use a flat structure or parallel architecture to perform prediction. In this work, we introduce ensemble classifiers with features that are extracted directly from full-length protein sequences to predict locations in the protein-sorting pathway hierarchically. Sequence driven features, sequence mapped features and sequence autocorrelation features were tested with ensemble learners and their performances were compared. When evaluated by independent data testing, ensemble-based bagging algorithms with the sequence feature composition, transition and distribution (CTD) successfully classified two datasets with accuracies greater than 90%. We compared our results with similar published methods, and our method performed on par with the others at two levels in the secreted pathway. This study shows that the CTD feature extracted from protein sequences is effective in capturing biological features among compartments in secreted pathways.
Keywords: Sequence driven features; Sequence mapped features; Autocorrelation; Ensemble classifier; Protein sorting
Predicting protein subchloroplast locations: the 10th anniversary (Cited by: 1)
7
Authors: Jian SUN, Pu-Feng DU. Frontiers of Computer Science (SCIE, EI, CSCD), 2021, Issue 2, pp. 1-11 (11 pages)
The chloroplast is a type of subcellular organelle in green plants and algae. It is the main subcellular organelle for conducting the photosynthetic process. The proteins that localize within the chloroplast are responsible for the photosynthetic process at the molecular level. The chloroplast can be further divided into several compartments. Proteins in different compartments are related to different steps in the photosynthetic process. Since the molecular function of a protein is highly correlated with its exact cellular localization, pinpointing the subchloroplast location of a chloroplast protein is an important step towards understanding its role in the photosynthetic process. Experimental determination of protein subchloroplast location is costly and time consuming. Therefore, computational approaches were developed to predict protein subchloroplast locations from primary sequences. Over the last decades, more than a dozen studies have tried to predict protein subchloroplast locations with machine learning methods. Various sequence features and machine learning algorithms have been introduced in this research topic. In this review, we collected comprehensive information on all existing studies regarding the prediction of protein subchloroplast locations. We compare these studies in terms of benchmarking datasets, sequence features, machine learning algorithms, predictive performance, and implementation availability. We summarize the progress and current status of this research topic. We also try to identify the most likely directions for future work in predicting protein subchloroplast locations. We hope this review not only lists all existing works, but also serves readers as a useful resource for quickly grasping the big picture of this research topic. We also hope this review can be a starting point for future methodology studies regarding the prediction of protein subchloroplast locations.
Keywords: subchloroplast locations; sequence features; performance measures; online services; machine learning
The clinical and genetic characteristics in children with mitochondrial disease in China (Cited by: 3)
8
Authors: Fang Fang, Zhimei Liu, Hezhi Fang, Jian Wu, Danmin Shen, Suzhen Sun, Changhong Ding, Tongli Han, Yun Wu, Junlan Lv, Lei Yang, Shufang Li, Jianxin Lv, Ying Shen. Science China (Life Sciences) (SCIE, CAS, CSCD), 2017, Issue 7, pp. 746-757 (12 pages)
Mitochondrial disease is a clinically and genetically heterogeneous group of diseases, which makes diagnosis very difficult for clinicians. Our objective was to analyze the clinical and genetic characteristics of children with mitochondrial disease in China. We tested 141 candidate patients suspected of mitochondrial disorders using targeted next-generation sequencing (NGS), and summarized the clinical and genetic data of gene-confirmed cases from the Neurology Department, Beijing Children's Hospital, Capital Medical University, from October 2012 to January 2015. The study included 40 gene-confirmed cases covering eight types of mitochondrial disease, among which Leigh syndrome was the most common, followed by mitochondrial encephalomyopathy, lactic acidosis, and stroke-like episodes (MELAS). The age of onset varied among the mitochondrial diseases, but early onset was common. Of the 40 gene-confirmed cases, 25 (62.5%) had a mitochondrial DNA (mtDNA) mutation and 15 (37.5%) had a nuclear DNA (nDNA) mutation. m.3243A>G (n=7) accounted for a large proportion of the mtDNA mutations. The nDNA mutations were in SURF1 (n=7), PDHA1 (n=2), and NDUFV1, NDUFAF6, SUCLA2, SUCLG1, RRM2B, and C12orf65 (one case each).
Keywords: mitochondrial disease; targeted next generation sequencing; clinical features; gene