Abstract
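For a long time, multivariate statistical analyses of multidimensional gene polymorphism data, such as the cluster analysis used to compute genetic distances and the principal component analysis, factor analysis and canonical correlation analysis used to study population genetic structure, have relied on classical multivariate linear methods designed for unconstrained data, without regard to the problems caused by the 'closure effect' of gene polymorphism data. Starting from the distribution and structural features of gene polymorphism data, this paper points out that gene polymorphism distributions have the characteristics of 'closed data' and analyzes the difficulties that the closure effect creates when classical multivariate linear methods are applied to population genetic structure. Based on the theory and methods of compositional data analysis, a basic multivariate nonlinear approach to analyzing the population genetic structure of gene polymorphisms is proposed. Taking principal component analysis as an example, the results of classical linear principal component analysis and 'logratios' nonlinear principal component analysis are compared on a worked example, showing that 'logratios' nonlinear principal component analysis is a good method for studying the population genetic structure of gene polymorphisms, with the advantages of specificity and sensitivity.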
The distribution and structure of allelic polymorphism data are analyzed, and it is pointed out that such data have the characteristics of closed data (also called compositional data or constant-sum data). The correlation structure of allelic polymorphism data contains null correlations introduced by 'closure', and the statistical distribution of the data is not normal because of the constant row sum. These properties cause great difficulties when the data are analyzed with traditional multivariate linear statistical methods such as principal component analysis, factor analysis, cluster analysis and canonical correlation analysis. Based on the theory of compositional data analysis proposed by Aitchison in 1982, a multivariate nonlinear statistical method originating from the 'logratios' approach to the statistical analysis of compositional data is put forward in this paper. As an example, the 'logratios' method was used to analyze the genetic structure of the TH01 polymorphic locus in the Chinese population, and the results were compared with those of multivariate linear methods such as principal component analysis. It is concluded that 'logratios' multivariate nonlinear principal component analysis is a better method, with the virtues of sensitivity and specificity, for analyzing the genetic structure of populations from allelic polymorphism data.
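The contrast between classical linear principal component analysis and a logratio-based analysis can be illustrated with a minimal Python sketch. It makes several assumptions that are not taken from the paper: the centered log-ratio (clr) transform is used as one common form of Aitchison's logratio approach (the abstract does not say which transform the authors applied), the function names `clr` and `pca` are hypothetical, all frequencies are assumed to be strictly positive, and the allele frequencies are invented for illustration rather than drawn from the TH01 data.

```python
import numpy as np


def clr(freqs):
    """Centered log-ratio transform of rows that each sum to 1.

    Assumes every entry is strictly positive (zeros need separate handling).
    """
    logs = np.log(freqs)
    # Subtracting the row mean of the logs equals dividing by the geometric mean.
    return logs - logs.mean(axis=1, keepdims=True)


def pca(data):
    """PCA via SVD of column-centered data.

    Returns the component scores and the fraction of variance per component.
    """
    centered = data - data.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u * s
    explained = s**2 / np.sum(s**2)
    return scores, explained


# Hypothetical allele frequencies (rows: populations, columns: alleles).
# Illustrative values only; these are NOT data from the paper.
freqs = np.array([
    [0.10, 0.25, 0.15, 0.45, 0.05],
    [0.12, 0.22, 0.18, 0.40, 0.08],
    [0.08, 0.30, 0.10, 0.47, 0.05],
    [0.20, 0.20, 0.20, 0.30, 0.10],
])

# Closure: every row has a constant sum of 1.
assert np.allclose(freqs.sum(axis=1), 1.0)

# Classical linear PCA on the raw (closed) frequencies.
linear_scores, linear_var = pca(freqs)

# Logratio PCA: the same PCA applied after the clr transform.
logratio_scores, logratio_var = pca(clr(freqs))

print("linear PCA variance fractions:  ", np.round(linear_var, 3))
print("logratio PCA variance fractions:", np.round(logratio_var, 3))
```

In this sketch the only difference between the two analyses is the transform applied before the decomposition, which is what makes the logratio variant nonlinear in the original frequencies. Note that clr-transformed rows sum to zero, so their covariance matrix is still singular by one dimension; other logratio transforms handle this differently.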
Funding
Supported by the National Natural Science Foundation of China (No. 30170527)
Keywords
allelic polymorphism
genetic structure of populations
multivariate nonlinear statistical method