Funding: Project supported by the National Natural Science Foundation of China (No. 30270759), the Cooperation Project in Science and Technology between China and Poland Governments (No. 32-38), and the Scientific Research Foundation for Doctors in Shandong Academy of Agricultural Sciences (No. [2007]20), China.
Abstract: One hundred and sixty-eight cotton genotypes from the same growing region were used as a germplasm group to study the validity of different genetic distances for constructing a cotton core subset. A mixed linear model approach was employed to obtain unbiased predictions of the genotypic values of 20 traits, thereby eliminating environmental effects. Six commonly used genetic distances (Euclidean, standardized Euclidean, Mahalanobis, city block, cosine, and correlation distances) were each combined with four commonly used hierarchical clustering methods (single linkage, complete linkage, unweighted pair-group average, and Ward's methods). These combinations were used in the least distance stepwise sampling (LDSS) method to construct different core subsets. Analyses of variance (ANOVA) of the evaluation parameters showed that cosine and correlation distances were less valid than Euclidean, standardized Euclidean, Mahalanobis, and city block distances. Standardized Euclidean distance was slightly more effective than Euclidean, Mahalanobis, and city block distances. Principal component analysis confirmed the validity of standardized Euclidean distance in constructing practical core subsets. When Mahalanobis distance was used at low sampling percentages, the covariance matrix of the accessions could be ill-conditioned, which biased the construction of small core subsets. Standardized Euclidean distance is therefore recommended for core subset construction with the LDSS method.
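As a rough illustration of the distance measures compared above, the Python sketch below computes five of them on genotypic-value vectors. It also includes a toy version of stepwise least-distance sampling. It is a sketch under stated assumptions, not the authors' implementation. All function names are hypothetical. Standardized Euclidean distance is assumed to scale each trait by its sample standard deviation across accessions. `ldss_sketch` assumes that each step simply drops one member, chosen at random, of the currently closest pair; it ignores the hierarchical clustering method that the study combines with LDSS. Mahalanobis distance is left out because it needs the inverse of the trait covariance matrix. With p traits and n <= p accessions, the sample covariance matrix has rank at most n - 1 and cannot be inverted at all, an extreme case of the ill-conditioning described for low sampling percentages.

```python
import math
import random
from itertools import combinations
from statistics import mean, stdev


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def standardized_euclidean(x, y, sd):
    # Each trait difference is divided by that trait's standard deviation,
    # so traits measured on large scales do not dominate the distance.
    return math.sqrt(sum(((a - b) / s) ** 2 for a, b, s in zip(x, y, sd)))


def city_block(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))


def cosine(x, y):
    # 1 - cos(angle): depends only on the direction of the trait profiles.
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / norm


def correlation(x, y):
    # 1 - Pearson correlation between the two accessions' trait profiles.
    mx, my = mean(x), mean(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    num = sum(a * b for a, b in zip(dx, dy))
    return 1.0 - num / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


def trait_sds(data):
    # Sample standard deviation of each trait (column) across all accessions.
    return [stdev(col) for col in zip(*data)]


def ldss_sketch(data, fraction, dist, seed=0):
    """Toy stepwise least-distance sampling (hypothetical simplification).

    Repeatedly finds the closest remaining pair under `dist` and removes one
    member at random until round(fraction * n) accessions remain.
    """
    rng = random.Random(seed)
    keep = list(range(len(data)))
    target = max(2, round(fraction * len(data)))
    while len(keep) > target:
        i, j = min(combinations(keep, 2),
                   key=lambda p: dist(data[p[0]], data[p[1]]))
        keep.remove(rng.choice((i, j)))
    return keep


# Example: four accessions with two genotypic values each; 0 and 1 are near-duplicates.
genotypes = [[0.0, 0.0], [0.0, 0.01], [5.0, 5.0], [10.0, 0.0]]
sd = trait_sds(genotypes)
core = ldss_sketch(genotypes, 0.75, lambda x, y: standardized_euclidean(x, y, sd))
print(core)  # keeps accessions 2 and 3 plus one of the near-duplicates 0 and 1
```

Passing the distance as a function makes it easy to swap measures and compare core subsets, which loosely mirrors the comparison design described in the abstract.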
Funding: Supported by the National Natural Science Foundation of China (30270759).
Abstract: In the present study, a strategy was proposed for constructing plant core subsets by clustering on a combination of continuous data (genotypic values) and discrete data (molecular marker information). A mixed linear model approach was used to predict genotypic values, eliminating environmental effects. A "mixed genetic distance" was designed to solve the difficult problem of combining continuous and discrete data when constructing a core subset by clustering. Four commonly used genetic distances for continuous data (Euclidean, standardized Euclidean, city block, and Mahalanobis distances) were used to assess the validity of the continuous-data part of the mixed genetic distance. Three commonly used genetic distances for discrete data (cosine, correlation, and Jaccard distances) were used to assess the validity of its discrete-data part. A rice germplasm group with eight quantitative traits and information for 60 molecular markers was used to evaluate the validity of the new strategy. The results suggest that the validity of each part of the mixed genetic distance is equal to or higher than that of the common genetic distances. The core subset constructed from combined genotypic values and molecular marker information was more representative than those constructed from genotypic values or molecular marker information alone. Moreover, the combined-data strategy was able to handle dominant marker information and could combine any other continuous and discrete data for clustering to construct a plant core subset.
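The abstract does not give the formula for the mixed genetic distance. The following Python sketch therefore only illustrates the general idea of merging continuous and discrete data into one distance matrix. Everything in it is an assumption and should not be read as the authors' definition:

- It combines a standardized Euclidean distance on genotypic values with a Jaccard distance on 0/1 marker scores.
- It rescales each part by its largest pairwise value.
- It mixes the two parts with a hypothetical weight `w`.

Jaccard distance is used here because it ignores loci where both accessions lack a band. This is one common way of treating dominant markers, for which a shared absence carries little information.

```python
import math
from itertools import combinations
from statistics import stdev


def jaccard_distance(u, v):
    # u, v: 0/1 band scores; loci where both accessions lack the band are ignored.
    both = sum(1 for a, b in zip(u, v) if a and b)
    either = sum(1 for a, b in zip(u, v) if a or b)
    return 0.0 if either == 0 else 1.0 - both / either


def std_euclidean(x, y, sd):
    return math.sqrt(sum(((a - b) / s) ** 2 for a, b, s in zip(x, y, sd)))


def mixed_distance_matrix(traits, markers, w=0.5):
    """Hypothetical mixed distance: w * continuous part + (1 - w) * discrete part.

    Both parts are divided by their maximum over all pairs so that they lie
    in [0, 1] before weighting. Returns a symmetric n x n list of lists.
    """
    n = len(traits)
    # A trait with zero variance gets sd 1.0 to avoid dividing by zero.
    sd = [stdev(col) or 1.0 for col in zip(*traits)]
    pairs = list(combinations(range(n), 2))
    cont = {p: std_euclidean(traits[p[0]], traits[p[1]], sd) for p in pairs}
    disc = {p: jaccard_distance(markers[p[0]], markers[p[1]]) for p in pairs}
    cmax = max(cont.values()) or 1.0
    dmax = max(disc.values()) or 1.0
    d = [[0.0] * n for _ in range(n)]
    for i, j in pairs:
        d[i][j] = d[j][i] = w * cont[(i, j)] / cmax + (1 - w) * disc[(i, j)] / dmax
    return d


# Example: three accessions, two genotypic values and four dominant marker bands each.
traits = [[1.0, 2.0], [1.5, 2.5], [4.0, 0.5]]
markers = [[1, 0, 1, 1], [1, 0, 1, 0], [0, 1, 0, 1]]
for row in mixed_distance_matrix(traits, markers):
    print([round(v, 3) for v in row])
```

The resulting matrix can be fed to any hierarchical clustering routine that accepts precomputed distances. Rescaling both parts keeps either data type from dominating simply because of its units.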
Funding: National Natural Science Foundation of China (No. 30270759), the Natural Science Foundation of Shandong Province, China (No. ZR2010CQ016), and the Scientific Research Foundation for PhDs in Shandong Academy of Agricultural Sciences (No. 2007YBS005).