
Quantification of the Differences between the Distributions of the Sequences in a Genome
Abstract. Objective: To address the sequence distribution problem in DNA word frequency analysis by examining whether the differences between the distributions of different sequences within a genome can be quantified. Methods: Numerical simulation was used to compare two indexes: the statistic of the Kolmogorov-Smirnov test and the centroid of the figure under the cumulative probability curve. Results: As the sample size increased, the dispersion of both indexes gradually decreased, while their central tendency was not noticeably affected, and different distributions concentrated at different locations. At a sample size of 100, the smallest difference that could be discriminated was about 0.1 for the statistic and about 0.02 for the centroid. With the statistic, two reference distributions were needed to separate the five test distributions, whereas the centroid separated the five test distributions directly. Conclusion: Both indexes can serve as quantitative indicators of the differences between distributions, but in most cases the sample size should be greater than 100. When different distributions need to be represented in the same coordinate system, the centroid may be the better choice.
Source: Chinese Journal of Health Statistics (《中国卫生统计》), CSCD, Peking University Core Journal, 2014, No. 4, pp. 554-558 (5 pages)
Funding: National Natural Science Foundation of China (31071156)
Keywords: sequences in genome; differences of distributions; Kolmogorov-Smirnov test; centroid; numerical simulation
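
The comparison described in the abstract can be illustrated with a small simulation. The following is only a minimal sketch under stated assumptions, not the authors' procedure: the abstract does not define the centroid precisely, so the sketch assumes it is the centroid of the region bounded by the empirical cumulative distribution step function, the x-axis, and a fixed interval [lo, hi]. The Uniform(0, 1) reference distribution, the Beta candidate distributions, the median absolute deviation as the dispersion measure, and all function names are hypothetical choices made for illustration.

```python
import random
import statistics


def uniform_cdf(x):
    """CDF of Uniform(0, 1), used here as an assumed reference distribution."""
    return min(1.0, max(0.0, x))


def ks_statistic(sample, cdf):
    """One-sample Kolmogorov-Smirnov statistic D against a reference CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i-1)/n to i/n at x, so check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def ecdf_centroid(sample, lo, hi):
    """Centroid (x, y) of the region under the empirical CDF on [lo, hi].

    Assumption: the region is bounded by the step-function ECDF, the x-axis,
    and the line x = hi; all sample values lie in [lo, hi]. Left of the
    smallest value the ECDF is 0, so that part contributes no area.
    """
    xs = sorted(sample)
    n = len(xs)
    edges = xs + [hi]
    area = mx = my = 0.0
    for i in range(1, n + 1):
        left, right = edges[i - 1], edges[i]
        h = i / n                      # ECDF height between the two edges
        width = right - left
        area += h * width
        mx += h * width * (left + right) / 2   # first moment about the y-axis
        my += h * width * h / 2                # first moment about the x-axis
    if area == 0.0:
        raise ValueError("region under the ECDF has zero area")
    return mx / area, my / area


def mad(values):
    """Median absolute deviation from the median."""
    m = statistics.median(values)
    return statistics.median([abs(v - m) for v in values])


def simulate(draw, n, reps, rng):
    """Collect D and the centroid x-coordinate over repeated samples of size n."""
    ds, cxs = [], []
    for _ in range(reps):
        sample = [draw(rng) for _ in range(n)]
        ds.append(ks_statistic(sample, uniform_cdf))
        cxs.append(ecdf_centroid(sample, 0.0, 1.0)[0])
    return ds, cxs


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical candidate distributions on [0, 1]; not those of the paper.
    candidates = {
        "uniform": lambda r: r.random(),
        "beta(2,5)": lambda r: r.betavariate(2, 5),
        "beta(5,2)": lambda r: r.betavariate(5, 2),
    }
    for name, draw in candidates.items():
        for n in (25, 100, 400):
            ds, cxs = simulate(draw, n, 200, rng)
            print(name, n,
                  "D median/MAD:", round(statistics.median(ds), 3), round(mad(ds), 3),
                  "centroid-x median/MAD:", round(statistics.median(cxs), 3), round(mad(cxs), 3))
```

Printing the median and the dispersion for several sample sizes gives a rough way to check the qualitative pattern reported in the Results (shrinking dispersion, stable location). Because the centroid yields a coordinate pair without reference to a second distribution, it can be plotted directly for several distributions at once, whereas the statistic D is always relative to the chosen reference.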

