Abstract: The Z curve is a useful method for visualizing and analyzing DNA sequences. It is a three-dimensional space curve that constitutes a unique representation of a given DNA sequence. In recent years, the study of non-coding regions has become increasingly important. In this paper, 15 disease-related ncRNAs and some snoRNA and miRNA sequences related to Alzheimer's disease are selected from the NONCODE database and analyzed with the Z curve method. The Z curves of the studied ncRNA sequences are plotted and compared, and the statistical features of the Z curves are obtained. These features indicate that ncRNA sequences playing the same role in cellular processes have almost the same Z curves, and that the base content of these sequences is also almost the same.
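The construction the abstract refers to can be sketched as follows. This is a minimal illustration of the standard Z curve definition, not the authors' code: the function name `z_curve` and the choice to treat U as T for RNA input are assumptions made here. Each base moves the curve one step along three axes: purine (A, G) versus pyrimidine (C, T), amino (A, C) versus keto (G, T), and weak (A, T) versus strong (G, C) hydrogen bonding.

```python
def z_curve(seq):
    """Return the Z curve of a sequence as a list of (x, y, z) points.

    After n bases, with cumulative counts A, C, G, T:
      x = (A + G) - (C + T)   purine vs. pyrimidine
      y = (A + C) - (G + T)   amino vs. keto
      z = (A + T) - (G + C)   weak vs. strong hydrogen bonds
    """
    x = y = z = 0
    points = []
    # Assumption for illustration: RNA input is handled by mapping U to T.
    for base in seq.upper().replace("U", "T"):
        if base not in ("A", "C", "G", "T"):
            raise ValueError(f"unsupported base: {base!r}")
        x += 1 if base in "AG" else -1
        y += 1 if base in "AC" else -1
        z += 1 if base in "AT" else -1
        points.append((x, y, z))
    return points


if __name__ == "__main__":
    for point in z_curve("ACGU"):
        print(point)
```

Because the three coordinates together determine which base was added at every step, the original sequence can be recovered from the curve, which is the sense in which the representation is unique.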
Funding: the National Natural Science Foundation of China under Grant Nos. 60673167 and 90412011; the National Basic Research Program of China (973 Program) under Grant No. 2005CB321801
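Abstract: Given the "growth", "autonomy", and "diversity" of network resources, recent work has proposed publishing and querying network resource information through a general-purpose DHT (distributed hash table) information service. However, existing resource information services still fall short in generality, ease of use, and adaptability. To meet the resource aggregation requirements of the Internet-based virtual computing environment (iVCE), this paper proposes a scalable distributed resource information service (SDIRIS). First, an adaptive DHT substrate (adaptive FissionE, A-FissionE for short) is proposed, which adapts to different system scales and degrees of stability in a way that is transparent to upper-layer applications. Second, an efficient multiple-attribute range search algorithm (multiple-attribute range FissionE, MR-FissionE for short) is proposed on top of the adaptive DHT. Theoretical analysis and simulation results show that SDIRIS can publish and query resource information efficiently.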
Abstract: In gene prediction, Fisher discriminant analysis (FDA) is used to separate protein-coding regions (exons) from non-coding regions (introns). Usually, the positive and negative data sets are of the same size when enough data are available. When the data are insufficient or the two sets are unequal in size, however, the threshold used in FDA may have an important influence on the prediction results. This paper presents a study on the selection of the threshold. The eigenvalue of each exon/intron sequence is computed using the Z-curve method with 69 variables. The experimental results suggest that the size of the data sets, their standard deviation, and the threshold are the three key elements to consider in order to improve the prediction results.
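To make the role of the threshold concrete, here is a small sketch of a Fisher discriminant with two candidate thresholds. It is an illustration under stated assumptions, not the paper's procedure: it uses two made-up features instead of the 69 Z-curve variables, the sample values are invented, and the spread-weighted rule (placing the threshold at equal distance from both class means when distance is measured in standard deviations) is one common heuristic chosen here for comparison. All function names are hypothetical.

```python
from statistics import mean, pstdev


def fisher_direction(pos, neg):
    """Fisher direction w = Sw^-1 (m_pos - m_neg) for 2-feature samples."""
    def centroid(rows):
        return [mean(r[0] for r in rows), mean(r[1] for r in rows)]

    def scatter(rows, m):
        sxx = sum((r[0] - m[0]) ** 2 for r in rows)
        sxy = sum((r[0] - m[0]) * (r[1] - m[1]) for r in rows)
        syy = sum((r[1] - m[1]) ** 2 for r in rows)
        return sxx, sxy, syy

    mp, mn = centroid(pos), centroid(neg)
    a1, b1, d1 = scatter(pos, mp)
    a2, b2, d2 = scatter(neg, mn)
    # Within-class scatter matrix Sw = [[a, b], [b, d]].
    a, b, d = a1 + a2, b1 + b2, d1 + d2
    det = a * d - b * b
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")
    dx, dy = mp[0] - mn[0], mp[1] - mn[1]
    # Closed-form 2x2 inverse applied to the difference of class means.
    return [(d * dx - b * dy) / det, (a * dy - b * dx) / det]


def project(rows, w):
    """Project each sample onto the discriminant direction w."""
    return [w[0] * r[0] + w[1] * r[1] for r in rows]


def midpoint_threshold(pos_scores, neg_scores):
    """Threshold halfway between the projected class means."""
    return (mean(pos_scores) + mean(neg_scores)) / 2


def spread_weighted_threshold(pos_scores, neg_scores):
    """Threshold equally many standard deviations from both class means."""
    sp, sn = pstdev(pos_scores), pstdev(neg_scores)
    if sp + sn == 0:
        return midpoint_threshold(pos_scores, neg_scores)
    return (sn * mean(pos_scores) + sp * mean(neg_scores)) / (sp + sn)


# Invented toy data: 4 "exon" samples and 3 "intron" samples (unequal sizes).
exons = [(2.0, 2.0), (3.0, 2.5), (2.5, 3.0), (3.5, 3.5)]
introns = [(0.0, 0.0), (0.5, 1.0), (1.0, 0.5)]

w = fisher_direction(exons, introns)
exon_scores, intron_scores = project(exons, w), project(introns, w)
t_mid = midpoint_threshold(exon_scores, intron_scores)
t_spread = spread_weighted_threshold(exon_scores, intron_scores)
```

A sample is then called coding when its projected score exceeds the chosen threshold. In this toy data the intron scores are less spread out than the exon scores, so the spread-weighted threshold sits closer to the intron mean than the midpoint does, which shows how the standard deviation of each set can shift the decision boundary.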
Funding: supported by the National Key R&D Program of China (Grant No. 2018YFA0903700) and the National Natural Science Foundation of China (Grant Nos. 21621004 and 31571358).
Abstract: The replication of DNA is a complex biological process that is essential for life. Bacterial DNA replication is initiated at genomic loci referred to as replication origins (oriCs). Integrating the Z-curve method, DnaA box distribution, and comparative genomic analysis, we developed a web server called Ori-Finder in 2008 to predict bacterial oriCs, which helps to clarify the characteristics of bacterial oriCs. The oriCs of hundreds of sequenced bacterial genomes have been annotated in genome reports using Ori-Finder, and the predicted results have been deposited in DoriC, a manually curated database of oriCs. This has facilitated large-scale data mining of functional elements in oriCs and strand-biased analysis. Here, we describe Ori-Finder 2022, with an updated prediction framework, an interactive visualization module, a new analysis module, and a user-friendly interface. More species-specific indicator genes and functional elements of oriCs are integrated into the updated framework, which has also been redesigned to predict oriCs in draft genomes. The interactive visualization module displays more genomic information related to oriCs and their functional elements. The analysis module includes regulatory protein annotation, repeat sequence discovery, homologous oriC search, and strand-biased analyses. The redesigned interface provides additional customization options for oriC prediction. Ori-Finder 2022 is freely available at http://tubic.tju.edu.cn/Ori-Finder/ and https://tubic.org/Ori-Finder/.