Abstract
For long DNA sequences, the large number of operations required by the discrete Fourier transform (DFT) reduces the efficiency of gene identification. To address this, we propose a signal-to-noise ratio (SNR) formula under the Voss mapping that depends only on how often each base occurs at the three different positions. The derivation shows that the spectra under the Voss and Z-curve mappings differ by a scaling factor of 4, and simulation experiments confirm this result. Because threshold selection has been subjective and empirical, we use simulation experiments to determine thresholds for different classes of genes, and we evaluate the test results under different thresholds in terms of sensitivity, specificity, and accuracy. We also propose an optimal-threshold algorithm based on Bootstrap resampling and use it to predict the optimal thresholds for different classes of genes; for human and murine genes the optimal threshold is 1.930. An analysis of the validity and feasibility of the algorithm shows that its accuracy reaches 92.8%.
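The abstract does not reproduce the formulas, so the following is only a minimal sketch of the two spectral claims, under stated assumptions. It assumes the SNR is the total Voss power at k = N/3 divided by the mean total power over all N frequencies, that the "three positions" are sequence indices modulo 3, and that the Z-curve mapping is the usual set of three ±1 sequences (purine/pyrimidine, amino/keto, weak/strong). The function names are illustrative and do not come from the paper. Under these assumptions, the DFT value at N/3 reduces to position counts, and Parseval's theorem makes the mean power equal to the number of bases, so no DFT is needed.

```python
import numpy as np

BASES = "ATGC"

def voss_indicators(seq):
    """Voss mapping: one 0/1 indicator array per base."""
    s = np.frombuffer(seq.upper().encode("ascii"), dtype=np.uint8)
    return {b: (s == ord(b)).astype(float) for b in BASES}

def voss_power(seq):
    """Total power spectrum: sum over bases of |DFT(indicator)|^2."""
    u = voss_indicators(seq)
    return sum(np.abs(np.fft.fft(u[b])) ** 2 for b in BASES)

def snr_dft(seq):
    """SNR via the full DFT (assumed definition: P[N/3] / mean(P))."""
    n = len(seq)
    if n % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    p = voss_power(seq)
    return p[n // 3] / p.mean()

def snr_counts(seq):
    """Same SNR from position counts only, with no DFT.

    For each base, x, y, z count its occurrences at indices 0, 1, 2 mod 3.
    |x + y*w + z*w^2|^2 with w = exp(-2j*pi/3) equals
    x^2 + y^2 + z^2 - x*y - y*z - x*z, and by Parseval the mean total
    power equals the total number of A/T/G/C bases.
    """
    u = voss_indicators(seq)
    peak = 0.0
    n_bases = 0.0
    for b in BASES:
        x, y, z = (u[b][i::3].sum() for i in range(3))
        peak += x * x + y * y + z * z - x * y - y * z - x * z
        n_bases += x + y + z
    return peak / n_bases

def zcurve_power(seq):
    """Power spectrum of the three +/-1 Z-curve component sequences."""
    u = voss_indicators(seq)
    a, t, g, c = u["A"], u["T"], u["G"], u["C"]
    x = (a + g) - (c + t)  # purine vs pyrimidine
    y = (a + c) - (g + t)  # amino vs keto
    z = (a + t) - (g + c)  # weak vs strong hydrogen bonding
    return sum(np.abs(np.fft.fft(v)) ** 2 for v in (x, y, z))
```

For a sequence containing only A, T, G, and C, `snr_counts` agrees with `snr_dft` up to floating-point error, and `zcurve_power(seq)[1:]` equals `4 * voss_power(seq)[1:]`. The factor of 4 holds for every frequency except k = 0, because the four indicator sequences sum to a constant whose DFT vanishes there.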
Source
《数学的实践与认识》
CSCD
Peking University Core Journal (北大核心)
2013, No. 14, pp. 114-122 (9 pages)
Mathematics in Practice and Theory