Abstract
Objective To compare the results of two classification methods, principal component analysis and cluster analysis, when applied to 13 ethnic populations. Methods Both numerical taxonomy methods were applied to biallelic marker frequency data for 12 Y-chromosome haplotypes in order to classify 13 populations, including the Korean ethnic group (朝鲜族), to analyze the relationships among the populations, and to clarify the origins of the ethnic groups. Results The two methods gave results that were not entirely the same. Principal component analysis can reduce the influence of irrelevant variables, but it may lose information while simplifying the data and reducing its dimensionality. Cluster analysis makes full use of the information in the original data, but it cannot exclude the "noise" introduced by irrelevant variables. Conclusion Principal component analysis and cluster analysis are both suitable for classifying complex multidimensional data, but in practice the results of the two methods should be combined with domain knowledge to reach an objective and reasonable conclusion.
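The contrast drawn in the abstract can be illustrated with a minimal sketch. The frequency table below is synthetic and the population and marker layout is hypothetical; it is not the paper's data, and the code is not the authors' procedure. It projects each population onto the first principal component (found by power iteration) and, separately, runs average-linkage agglomerative clustering on the raw frequencies, so the two groupings can be compared.

```python
import math

# Synthetic allele-frequency table (rows: hypothetical populations,
# columns: hypothetical markers). These are NOT the paper's data.
# The fourth column barely varies and plays the role of an irrelevant marker.
FREQS = {
    "pop_A": [0.80, 0.10, 0.05, 0.50],
    "pop_B": [0.75, 0.15, 0.10, 0.48],
    "pop_C": [0.78, 0.12, 0.08, 0.52],
    "pop_D": [0.20, 0.70, 0.60, 0.51],
    "pop_E": [0.25, 0.65, 0.55, 0.49],
    "pop_F": [0.22, 0.72, 0.58, 0.50],
}


def center(rows):
    """Subtract each column mean so that PCA works on deviations."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    return [[r[j] - means[j] for j in range(m)] for r in rows]


def first_pc_scores(rows, iters=200):
    """Project rows onto the leading principal component (power iteration)."""
    x = center(rows)
    n, m = len(x), len(x[0])
    # Sample covariance matrix of the centered columns.
    cov = [[sum(x[i][a] * x[i][b] for i in range(n)) / (n - 1)
            for b in range(m)] for a in range(m)]
    v = [1.0] * m
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(m)) for a in range(m)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    return [sum(x[i][j] * v[j] for j in range(m)) for i in range(n)]


def average_linkage(rows, k):
    """Merge the two closest clusters (mean Euclidean distance) until k remain."""
    clusters = [[i] for i in range(len(rows))]

    def dist(c1, c2):
        total = sum(math.dist(rows[i], rows[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        pairs = ((p, q) for p in range(len(clusters))
                 for q in range(p + 1, len(clusters)))
        a, b = min(pairs, key=lambda pq: dist(clusters[pq[0]], clusters[pq[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


names = list(FREQS)
rows = [FREQS[n] for n in names]

# Grouping 1: split populations by the sign of their first-component score.
scores = first_pc_scores(rows)
pca_groups = sorted([
    sorted(n for n, s in zip(names, scores) if s < 0),
    sorted(n for n, s in zip(names, scores) if s >= 0),
])

# Grouping 2: two clusters from average-linkage clustering on raw frequencies.
cluster_groups = sorted(
    sorted(names[i] for i in c) for c in average_linkage(rows, 2)
)

print(pca_groups)
print(cluster_groups)
```

On this deliberately clean toy table the two groupings coincide. With real data they may differ, as the abstract reports: the first component gives little weight to low-variance markers, whereas the distance-based clustering lets every marker contribute to each pairwise distance.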
Source
《中国卫生统计》
CSCD
Peking University Core Journals (北大核心)
2002, No. 4, pp. 201-203 (3 pages)
Chinese Journal of Health Statistics
Funding
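This work was supported by the National Natural Science Foundation of China (Grant No. 39970397) and the Heilongjiang Provincial Science and Technology Research Project.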