Abstract
Zipf's approach in linguistics is used to analyze the statistical features of the frequency and correlation of the 16 nearest neighboring nucleotides (AA, AC, AG, …, TT) in 12 human chromosomes (Y, 22, 21, 20, 19, 18, 17, 16, 15, 14, 13, and 12). Two statistical features of nearest neighboring nucleotides in the human genome are found: (i) the frequency distribution is a linear function, and (ii) the correlation distribution is an inverse function. The coefficients of the linear function and the inverse function depend on the GC content. This work proposes the correlation distribution of nearest neighboring nucleotides for the first time and extends the descriptors of nearest neighboring nucleotides.
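As a rough illustration of the frequency part of this analysis, the following minimal Python sketch counts the 16 overlapping nearest-neighbor pairs in a sequence, ranks them by frequency in the Zipf manner, and fits a straight line to frequency against rank. It assumes that the "linear function" is a function of rank, which the abstract does not state explicitly. The function names and the toy sequence are hypothetical and are not taken from the paper; the correlation distribution is not sketched because the abstract does not define it.

```python
from collections import Counter
from itertools import product

# All 16 nearest-neighbor pairs: AA, AC, AG, ..., TT
PAIRS = ["".join(p) for p in product("ACGT", repeat=2)]


def dinucleotide_frequencies(seq):
    """Relative frequencies of the 16 overlapping pairs in seq.

    Pairs containing any other symbol (for example N) are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = sum(counts[p] for p in PAIRS)
    if total == 0:
        raise ValueError("sequence contains no valid A/C/G/T pairs")
    return {p: counts[p] / total for p in PAIRS}


def rank_frequencies(freqs):
    """Zipf-style ordering: (pair, frequency) sorted from most to least frequent."""
    return sorted(freqs.items(), key=lambda kv: kv[1], reverse=True)


def linear_fit(ys):
    """Least-squares fit y = a + b * rank, with rank = 1..len(ys). Returns (a, b)."""
    n = len(ys)
    xs = range(1, n + 1)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def gc_content(seq):
    """Fraction of G and C among the A/C/G/T symbols of seq."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt


if __name__ == "__main__":
    toy = "ACGTTGCAAGGCTTACGATCGGATCCA"  # made-up sequence, not real genomic data
    ranked = rank_frequencies(dinucleotide_frequencies(toy))
    intercept, slope = linear_fit([f for _, f in ranked])
    print(ranked[:3], intercept, slope, gc_content(toy))
```

Running the same functions per chromosome would give one (intercept, slope) pair per chromosome, which could then be compared against each chromosome's GC content.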
Funding
This work was supported by the National Natural Science Foundation of China (No. 20173023 and No. 90203012) and the Specialized Research Fund for the Doctoral Program of Higher Education of China.
Keywords
Zipf's law, Nearest neighboring nucleotide, Frequency distribution, Correlation distribution, Chromosome, Human genome