Abstract
When determining full-length genes from human cDNA sequences, it is necessary to decide whether the first ATG codon is the true start codon or an in-frame methionine codon with no initiation function. This paper summarizes the differences between these two kinds of codons in terms of the consensus sequence of the codon context, periodicity, codon base bias, and the hydrophobicity of the encoded amino acids. From a pattern recognition viewpoint, the two kinds of codons are treated as two pattern classes in the multidimensional feature space formed by these characteristics. A classifier was designed using the Fisher linear discriminant, and its error rate was estimated; the results indicate that the two kinds of codons can be distinguished under these features. Using this classifier to separate true start codons from non-initiating methionine codons, the accuracy reaches about 75%.
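As an illustration of the classifier design step, the following is a minimal sketch of a two-class Fisher linear discriminant with a held-out error estimate. It is not the authors' implementation: the four features, the synthetic Gaussian data, the midpoint threshold, and the train/test split are all assumptions made for the example, and the abstract does not say how the error rate was actually estimated.

```python
import random

def mean_vector(rows):
    """Component-wise mean of a list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def scatter_matrix(rows, mean):
    """Within-class scatter: sum of outer products of deviations from the mean."""
    d = len(mean)
    s = [[0.0] * d for _ in range(d)]
    for r in rows:
        diff = [x - m for x, m in zip(r, mean)]
        for i in range(d):
            for j in range(d):
                s[i][j] += diff[i] * diff[j]
    return s

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def fit_fisher(class0, class1):
    """Return the Fisher direction w = Sw^-1 (m1 - m0) and a midpoint threshold."""
    m0, m1 = mean_vector(class0), mean_vector(class1)
    s0, s1 = scatter_matrix(class0, m0), scatter_matrix(class1, m1)
    sw = [[a + b for a, b in zip(r0, r1)] for r0, r1 in zip(s0, s1)]
    w = solve(sw, [a - b for a, b in zip(m1, m0)])
    threshold = (dot(w, m0) + dot(w, m1)) / 2  # assumed decision rule
    return w, threshold

def predict(w, threshold, x):
    """1 = hypothetical 'start codon' class, 0 = 'non-initiating Met' class."""
    return 1 if dot(w, x) > threshold else 0

def error_rate(w, threshold, class0, class1):
    wrong = sum(predict(w, threshold, x) != 0 for x in class0)
    wrong += sum(predict(w, threshold, x) != 1 for x in class1)
    return wrong / (len(class0) + len(class1))

# Synthetic stand-in data: 4 hypothetical features per ATG site
# (e.g. context score, periodicity, base bias, hydrophobicity).
random.seed(0)
def sample(center, n):
    return [[random.gauss(center, 1.0) for _ in range(4)] for _ in range(n)]

train0, train1 = sample(0.0, 200), sample(1.0, 200)
test0, test1 = sample(0.0, 100), sample(1.0, 100)

w, threshold = fit_fisher(train0, train1)
err = error_rate(w, threshold, test0, test1)
print("held-out error rate:", err)
```

Projecting onto a single direction keeps the decision rule to one threshold comparison, which is why a held-out error rate is straightforward to compute once `w` is fixed.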
Source
《复旦学报(自然科学版)》
CAS
CSCD
PKU Core
2000, No. 6, pp. 675-679 (5 pages)
Journal of Fudan University: Natural Science
Funding
Supported by the Shanghai Modern Biology and New Drug Industry Development Fund (98431912 1)