Abstract
We propose an unbiased method of feature extraction and correlation-based classification for the structural analysis of DNA sequences. Statistical and correlation features are extracted from the base codes and segment codes of raw DNA sequence data, and the mean correlation coefficient between a sample sequence and each known class is calculated. If the maximal mean correlation coefficient exceeds the mean correlation coefficient of the corresponding class, the sample is grouped into that class. Otherwise, it is grouped into a new class. Using a set of sample DNA sequences, we demonstrate that the method is suitable for the analysis of any DNA sequence data without a priori knowledge of biological or functional information. Such an approach should be useful in discovering conserved sequence elements and other hidden information in the human genome.
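The classification rule described in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify the exact features or the correlation measure, so everything below is an assumption made for illustration: dinucleotide frequencies stand in for the "segment code" features, Pearson correlation is used as the correlation coefficient, a class's own mean correlation is taken to be the mean pairwise correlation among its members, and `singleton_threshold` is a hypothetical parameter for classes with only one member. All function names are hypothetical and are not taken from the paper.

```python
import math
from itertools import product
from statistics import mean


def kmer_features(seq, k=2):
    """Relative frequency of each length-k segment over the alphabet ACGT.

    Assumption: a stand-in for the paper's base/segment-code features.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = {km: 0 for km in kmers}
    for i in range(len(seq) - k + 1):
        sub = seq[i:i + k]
        if sub in counts:  # skip segments containing other symbols
            counts[sub] += 1
    total = max(1, sum(counts.values()))
    return [counts[km] / total for km in kmers]


def pearson(x, y):
    """Pearson correlation coefficient; 0.0 if either vector is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def class_mean_correlation(members, singleton_threshold=0.9):
    """Mean pairwise correlation inside a class.

    Assumption: a one-member class has no pairs, so a fixed,
    hypothetical threshold is used instead.
    """
    pairs = [pearson(a, b)
             for i, a in enumerate(members)
             for b in members[i + 1:]]
    return mean(pairs) if pairs else singleton_threshold


def classify(sample, classes):
    """Assign a sequence to an existing class or open a new one.

    `classes` is a list of lists of feature vectors and is updated in place.
    Returns the index of the class the sample ends up in.
    """
    feats = kmer_features(sample)
    best_idx, best_corr = None, -2.0
    for idx, members in enumerate(classes):
        # Mean correlation of the sample to all members of this class.
        corr = mean([pearson(feats, m) for m in members])
        if corr > best_corr:
            best_idx, best_corr = idx, corr
    # Join the best class only if the sample's mean correlation exceeds
    # that class's own mean correlation; otherwise start a new class.
    if best_idx is not None and best_corr > class_mean_correlation(classes[best_idx]):
        classes[best_idx].append(feats)
        return best_idx
    classes.append([feats])
    return len(classes) - 1
```

Starting from `classes = []` and calling `classify` on each sequence in turn builds the grouping incrementally. Because the rule compares against each class's own internal correlation, no global similarity cutoff is needed beyond the assumed singleton threshold.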
Source
《四川大学学报(自然科学版)》
CAS
CSCD
Peking University Core Journals (北大核心)
2006, No. 2, pp. 334-340 (7 pages)
Journal of Sichuan University (Natural Science Edition)