Abstract: The amino acid composition and the biased auto-correlation function are taken as features, and a BP neural network algorithm is used to combine them. The prediction accuracy of the method is verified on an independent database of non-homologous proteins. In the resubstitution test, the average absolute errors are 0.070 and 0.068, with standard deviations of 0.049 and 0.047, for predicting the α-helix and β-sheet content, respectively. In the cross-validation test, the average absolute errors are 0.075 and 0.070, with standard deviations of 0.050 and 0.049, respectively. Compared with other currently available methods, the BP neural network method combined with the amino acid composition and biased auto-correlation function features can effectively improve the prediction accuracy.
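As a rough illustration of the two feature types named in this abstract, the sketch below computes a 20-dimensional amino acid composition and an auto-correlation profile for a protein sequence. Several things here are assumptions rather than details from the paper: the residue property (the Kyte-Doolittle hydropathy scale), the reading of "biased" as the signal-processing estimator that divides by the full sequence length, the maximum lag, and the example sequence. The neural network itself is not reproduced.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy values. ASSUMPTION: the abstract does not say
# which residue property the auto-correlation is computed over.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def composition(seq):
    """Fraction of each of the 20 standard amino acids in seq."""
    if not seq:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq)
    return [counts[a] / len(seq) for a in AMINO_ACIDS]


def autocorrelation(seq, max_lag):
    """Auto-correlation of the property profile for lags 1..max_lag.

    ASSUMPTION: "biased" is taken to mean dividing by the full length n
    instead of by the number of overlapping terms (n - lag).
    """
    if not seq:
        raise ValueError("sequence must not be empty")
    values = [HYDROPATHY[a] for a in seq]
    n = len(values)
    return [
        sum(values[i] * values[i + lag] for i in range(n - lag)) / n
        for lag in range(1, max_lag + 1)
    ]


def feature_vector(seq, max_lag=10):
    """Concatenate composition and auto-correlation features."""
    return composition(seq) + autocorrelation(seq, max_lag)


# Hypothetical sequence, used only to show the shape of the output.
features = feature_vector("MKVLAAGIVGLLLAQ", max_lag=5)
print(len(features))
```

A vector of this kind would be the input to a regression model that outputs the α-helix and β-sheet fractions; the average absolute error quoted above is then the mean of |predicted - observed| over the test proteins.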
Abstract: The aim of this study was to obtain unusual mutations, termed conditional mutations, which manifest in some but not all representatives of a species. Collections of these mutations on chromosomes X, 2, and 3 of Drosophila melanogaster were established. The sex of the fly or a chromosomal rearrangement was the condition that determined whether these mutations manifested. The mutations differ from ordinary ones in a set of properties. In addition to conditional manifestation, the salient differences include dependence of manifestation on the spatial arrangement of chromosomal material in the genome, parental effects (maternal or paternal) of the mutant, and the capacity to shift the genome from a stable to an unstable state. It is suggested that conditional mutations are mutant variants of Drosophila regulatory genes belonging to the large Genomic Regulatory Network of Drosophila. Thus, genes of this category can be detected using special breeding procedures, and mutations of these genes have unusual manifestation.
Abstract: AIM: To detect significant clusters of co-expressed genes associated with tumorigenesis that might help to predict stomach adenocarcinoma (SA) prognosis. METHODS: The Cancer Genome Atlas database was used to obtain RNA sequences as well as complete clinical data of SA and adjacent normal tissues from patients. Weighted gene co-expression network analysis (WGCNA) was used to investigate the meaningful module along with hub genes. Expression of hub genes was analyzed in 362 paraffin-embedded SA biopsy tissues by immunohistochemical staining. Patients were classified into two groups according to expression of hub genes: weak expression and over-expression groups. Correlation of biomarkers with clinicopathological factors indicated patient survival. RESULTS: Whole genome expression level screening identified 6,231 differentially expressed genes. Twenty-four co-expressed gene modules were identified using WGCNA. Pearson's correlation analysis showed that the tan module was the most relevant to tumor stage (r = 0.24, P = 7 × 10⁻⁶). In addition, we detected sorting nexin (SNX)10 as the hub gene of the tan module. SNX10 expression was linked to T category (P = 0.042, χ² = 8.708), N category (P = 0.000, χ² = 18.778), TNM stage (P = 0.001, χ² = 16.744) as well as tumor differentiation (P = 0.000, χ² = 251.930). Patients with high SNX10 expression tended to have longer disease-free survival (DFS; 44.97 mo vs 33.85 mo, P = 0.000) as well as overall survival (OS; 49.95 mo vs 40.84 mo, P = 0.000) in univariate analysis. Multivariate analysis showed that dismal prognosis could be precisely predicted clinicopathologically using SNX10 [DFS: P = 0.014, hazard ratio (HR) = 0.698, 95% confidence interval (CI): 0.524-0.930; OS: P = 0.017, HR = 0.704, 95% CI: 0.528-0.940]. CONCLUSION: This study provides a new technique for screening prognostic biomarkers of SA. Weak expression of SNX10 is linked to poor prognosis, and SNX10 is a suitable prognostic biomarker of SA.
Funding: Project supported by the National Science and Technology Support Plan (No. 2013BAH21B02-01), the Beijing Natural Science Foundation (No. 4153058), and the Shanghai Key Laboratory of Intelligent Information Processing (No. IIPL-2014-004)
Abstract: Most classic network entity sorting algorithms are implemented in a homogeneous network and are not applicable to a heterogeneous network. Registered patent history data record the innovations and achievements in different research fields. In this paper, we present an iterative algorithm called inventor-ranking to rank the influence of patent inventors in heterogeneous networks constructed from their patent data. The approach is a flexible rule-based method that makes full use of the features of network topology. We rank the inventors and patents by a set of rules, and the algorithm iterates until it meets a convergence condition. We also give a detailed analysis of the topics of interest of influential inventors using a latent Dirichlet allocation (LDA) model. Compared with traditional methods such as PageRank, our approach takes full advantage of the information in the heterogeneous network, including the relationships between inventors and the relationships between inventors and patents. Experimental results show that our method can effectively identify the inventors with high influence in patent data, and that it converges faster than PageRank.
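The abstract does not state the ranking rules, so the following is only a generic sketch of iterating scores over an inventor-patent network until convergence. It uses a simple mutual-reinforcement update (a patent's score is the sum of its inventors' scores, and an inventor's score is the sum of their patents' scores), which is a swapped-in stand-in and not the inventor-ranking rule set. The function name, tolerance, and toy data are all hypothetical.

```python
def rank_bipartite(patent_inventors, tol=1e-9, max_iter=1000):
    """Iteratively score inventors and patents on a bipartite network.

    patent_inventors maps a patent id to the list of its inventors.
    ASSUMPTION: the update rule below is a generic mutual-reinforcement
    scheme chosen for illustration, not the rules used in the paper.
    """
    inventors = sorted({i for invs in patent_inventors.values() for i in invs})
    if not inventors:
        return {}, {}
    inv_score = {i: 1.0 / len(inventors) for i in inventors}
    pat_score = {}
    for _ in range(max_iter):
        # A patent inherits the summed influence of its inventors.
        pat_score = {
            p: sum(inv_score[i] for i in invs)
            for p, invs in patent_inventors.items()
        }
        # An inventor accumulates the scores of the patents they hold.
        new_score = {i: 0.0 for i in inventors}
        for p, invs in patent_inventors.items():
            for i in invs:
                new_score[i] += pat_score[p]
        # Normalize so the inventor scores sum to 1.
        total = sum(new_score.values())
        new_score = {i: s / total for i, s in new_score.items()}
        # Convergence condition: largest change below the tolerance.
        delta = max(abs(new_score[i] - inv_score[i]) for i in inventors)
        inv_score = new_score
        if delta < tol:
            break
    return inv_score, pat_score


# Toy data with hypothetical names.
patents = {"P1": ["alice", "bob"], "P2": ["alice"], "P3": ["alice", "carol"]}
inventor_scores, patent_scores = rank_bipartite(patents)
print(sorted(inventor_scores, key=inventor_scores.get, reverse=True))
```

In this toy network the inventor listed on all three patents ends up with the highest score, and the two inventors in symmetric positions receive equal scores.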
Funding: Supported by the National Natural Science Foundation of China (Nos. 10105007, 10334020, 90103035, 10574088)
Abstract: A network of 3719 tRNA gene sequences was constructed using simplest alignment. Its topology, degree distribution and clustering coefficient were studied. The behavior of the network shifts from a fluctuating distribution to a scale-free distribution as the similarity degree of the tRNA gene sequences increases. The tRNA gene sequences with the same anticodon identity are more self-organized than those with different anticodon identities and form local clusters in the network. Some vertices of a local cluster are highly connected to other local clusters, and a probable reason is given. Moreover, a network constructed from the same number of random tRNA sequences was used for comparison. The relationships between the properties of the tRNA similarity network and the characteristics of tRNA evolutionary history are discussed.
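To make the network quantities in this abstract concrete, here is a minimal sketch that links sequences whose similarity reaches a threshold and then reports the degree distribution and a local clustering coefficient. It assumes that "simplest alignment" can be approximated by an ungapped, position-by-position comparison; that reading, the threshold value, and the toy sequences are assumptions, not details from the paper.

```python
from collections import Counter
from itertools import combinations


def similarity(a, b):
    """Fraction of matching positions in an ungapped comparison.

    ASSUMPTION: stands in for the paper's "simplest alignment".
    """
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / max(len(a), len(b))


def build_network(seqs, threshold):
    """Connect two sequences when their similarity reaches the threshold."""
    adj = {name: set() for name in seqs}
    for u, v in combinations(seqs, 2):
        if similarity(seqs[u], seqs[v]) >= threshold:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def degree_distribution(adj):
    """Map each degree k to the number of vertices with that degree."""
    return Counter(len(neighbors) for neighbors in adj.values())


def clustering_coefficient(adj, node):
    """Fraction of pairs of a vertex's neighbors that are themselves linked."""
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return 2 * links / (k * (k - 1))


# Toy sequences, not real tRNA genes.
toy = {
    "a": "GCGGAUUU",
    "b": "GCGGAUUA",
    "c": "GCGGAUAA",
    "d": "AAAACCCC",
}
network = build_network(toy, threshold=0.75)
print(degree_distribution(network))
print(clustering_coefficient(network, "a"))
```

Raising the threshold removes edges, which is the knob the abstract refers to when it describes how the degree distribution changes with the similarity degree.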
Funding: Supported by the National Natural Science Foundation of China (31000591, 31000587, 31171266)
Abstract: The discovery of novel cancer genes is one of the main goals in cancer research. Bioinformatics methods can be used to accelerate cancer gene discovery, which may help in the understanding of cancer and the development of drug targets. In this paper, we describe a classifier for predicting potential cancer genes that we developed by integrating multiple lines of biological evidence, including protein-protein interaction network properties and sequence and functional features. We detected 55 features that were significantly different between cancer genes and non-cancer genes. Fourteen cancer-associated features were chosen to train the classifier. Four machine learning methods, logistic regression, support vector machines (SVMs), BayesNet and decision tree, were explored in the classifier models to distinguish cancer genes from non-cancer genes. The predictive power of the different models was evaluated by 5-fold cross-validation. The area under the receiver operating characteristic curve for the logistic regression, SVM, BayesNet and J48 tree models was 0.834, 0.740, 0.800 and 0.782, respectively. Finally, the logistic regression classifier with multiple biological features was applied to the genes in the Entrez database, and 1976 cancer gene candidates were identified. We found that the integrated prediction model performed much better than the models based on individual types of biological evidence, and that the network and functional features had greater predictive power than the sequence features in predicting cancer genes.
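The evaluation protocol mentioned in this abstract, 5-fold cross-validation scored by the area under the ROC curve, can be sketched without any particular classifier. The code below computes AUC as the fraction of (positive, negative) pairs that are ranked correctly, with ties counted as half, and generates train/test index splits. The interleaved fold assignment and the example labels and scores are assumptions for illustration; the paper's features and models are not reproduced.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive example
    receives a higher score than a randomly chosen negative one.
    """
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def k_fold_indices(n, k=5):
    """Yield (train, test) index lists for k-fold cross-validation.

    ASSUMPTION: folds are assigned by interleaving indices; in practice
    the data would usually be shuffled (and often stratified) first.
    """
    folds = [list(range(i, n, k)) for i in range(k)]
    for test in folds:
        held_out = set(test)
        train = [i for i in range(n) if i not in held_out]
        yield train, test


# Hypothetical labels (1 = cancer gene) and classifier scores.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
print(roc_auc(labels, scores))

for train, test in k_fold_indices(10, k=5):
    print(test)
```

In a full evaluation, a model would be fitted on each training split and scored on the matching test split, and the fold-level AUC values would then be averaged or pooled.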
Abstract: Evolutionary studies have been of prime importance to life scientists since ancient times. Advances in technology have made massive amounts of genomic data available. This abundance of genomic data poses new challenges for biologists, computer scientists and mathematicians to develop approaches for discovering new relationships in data and evolutionary networks. In this work, nucleotide sequences are converted into binary sequences to explore the network among different species. A new approach based on binary sequences is proposed to reconstruct an accurate phylogenetic network. The algorithm is validated by comparing its results with those obtained by an existing method of network construction. A program was also written in C and run on an Intel Core i3 Dell Inspiron machine to obtain the evolutionary network. The new approach also provides fast solutions, as there is no need to align the sequences.
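The abstract does not give the binary encoding or the way binary sequences are compared, so the sketch below only illustrates the general idea of turning nucleotides into bits. The two-bit code (A=00, C=01, G=10, T=11) and the Hamming-distance comparison of equal-length bit strings are both assumptions, and the example is in Python rather than the C used by the authors.

```python
# ASSUMPTION: a fixed two-bit code per nucleotide; the paper's actual
# encoding is not stated in the abstract.
ENCODING = {"A": "00", "C": "01", "G": "10", "T": "11"}


def to_binary(seq):
    """Convert a nucleotide sequence into a string of bits."""
    return "".join(ENCODING[base] for base in seq.upper())


def hamming(a, b):
    """Number of differing bits between two equal-length bit strings.

    ASSUMPTION: used here as a simple stand-in dissimilarity measure.
    """
    if len(a) != len(b):
        raise ValueError("bit strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


print(to_binary("ACGT"))
print(hamming(to_binary("ACGT"), to_binary("ACGA")))
```

Pairwise distances of this kind could then feed a network-construction step, though how the paper builds its phylogenetic network from the binary sequences is not specified here.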