Abstract
This paper studies the discriminant classification of DNA sequences. The base (A, T, C, G) content of 20 artificial DNA sequences of known class is analyzed statistically, and, drawing on genetics, so are the distribution and content of the 20 amino acids encoded by the bases in each sequence. From these statistics, features that separate classes A and B are extracted and then reduced to two sets of statistical features: base content and the content of the main amino acids. Using the distance discriminant method, an ATCG classification model and a main-amino-acid classification model are built and applied to DNA sequences of unknown class. The examples show that the misclassification probability of both models is 5%.
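As a rough illustration of distance discrimination on base content, the sketch below computes A/T/C/G frequencies for each sequence and assigns an unknown sequence to the class whose mean frequency vector is nearest. The abstract does not state which distance is used, so Euclidean distance to the class centroid is an assumption here; the function names and the toy training sequences are hypothetical and are not the paper's data.

```python
from collections import Counter
import math

BASES = "ATCG"

def base_freqs(seq):
    """Return the fractions of A, T, C, G in a sequence (other symbols ignored)."""
    counts = Counter(ch for ch in seq.upper() if ch in BASES)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A/T/C/G bases")
    return [counts[b] / total for b in BASES]

def centroid(seqs):
    """Mean base-frequency vector of a list of sequences."""
    vecs = [base_freqs(s) for s in seqs]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def classify(seq, centroids):
    """Assign seq to the class with the nearest centroid (Euclidean distance, assumed)."""
    x = base_freqs(seq)
    return min(centroids, key=lambda label: math.dist(x, centroids[label]))

# Hypothetical toy training data, not taken from the paper.
train = {
    "A": ["GGCGGAGGCGGTGGA", "GGAGGCGGGGGAGGC"],
    "B": ["TTATTTATAAATTTA", "ATTTTAATTTATTAA"],
}
centroids = {label: centroid(seqs) for label, seqs in train.items()}

print(classify("GGGCGGAGGAGGCGG", centroids))  # G-rich sequence
print(classify("TTTATAATTTTAAAT", centroids))  # A/T-rich sequence
```

A Mahalanobis distance, which accounts for the spread of each class, is a common alternative when enough training sequences are available to estimate a covariance matrix.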
Source
《装备指挥技术学院学报》
2004, No. 4, pp. 101-104 (4 pages)
Journal of the Academy of Equipment Command & Technology