A Maximum Likelihood-Based Dynamic Clustering Method and Its Application

Cited by: 2

Abstract: Combining conventional dynamic clustering analysis with discriminant analysis, this paper introduces a maximum likelihood-based dynamic clustering method. The method estimates the cluster parameters by maximum likelihood, implemented via the EM algorithm, and assigns individuals to clusters by the corresponding Bayesian posterior probabilities. Simulation studies show that the method generally estimates the cluster parameters without bias and can also identify the optimal number of clusters. It is more robust than centroid-based dynamic clustering and dynamic clustering by the minimum within-group sum of squares. Raising the discrimination criterion also lowers the misclassification rate. The feasibility of the method was verified on Fisher's Iris data, and it was successfully applied to identifying the major-gene genotypes of individuals in a rice F2 population.

Clustering analysis aims to determine the intrinsic grouping in a set of unlabeled data. A cluster is a collection of objects that are similar to one another and dissimilar to the objects belonging to other clusters. However, current clustering techniques do not address all requirements adequately. For instance, handling a large number of dimensions and a large number of data items can be problematic because of time complexity. The effectiveness of distance-based clustering methods depends on the definition of distance; if no obvious distance measure exists, one must be defined, which is not always easy, especially in multi-dimensional spaces. In addition, choosing the optimal number of clusters is in practice impossible. Thus, choosing the correct number of clusters and the best clustering method remains an open question. To address these problems, this paper introduces a maximum likelihood-based dynamic clustering method that combines conventional dynamic clustering with discrimination analysis. The cluster parameters were estimated by maximum likelihood, implemented via the expectation-maximization (EM) algorithm, and the objects were classified by Bayesian posterior probability. This classification approach can increase the posterior confidence of the classified individuals. Simulation studies showed that the proposed method not only estimated the cluster parameters without bias but also identified the optimal number of clusters by the Bayesian information criterion (BIC). Compared with the K-means method and the minimum within-group sum of squares (MinSSw) method, the proposed method was more robust and had almost the same clustering accuracy. Moreover, the misclassified rate (MR) could be reduced by raising the discrimination criterion, although doing so increased the unclassified rate (UR). The user can therefore choose a compromise discrimination criterion that keeps both MR and UR low. The method was validated on a real dataset, and the result indicated that it had a significant advantage in clustering accuracy over the K-means and MinSSw methods. Plant height and tiller number in the F2 population of the rice cross Duonieai × Zhonghua 11 were used as an illustration. The results indicated that the genetic difference in these two traits in this cross involves only one pleiotropic major gene. The additive and dominance effects of the major gene were estimated as -24.57 cm and 57.12 cm for plant height, and 23.01 and -25.89 for tiller number, respectively. The major gene shows overdominance for plant height and near-complete dominance for tiller number.
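The three ingredients named in the abstract (EM-based maximum likelihood estimation, classification by posterior probability under a discrimination criterion, and choice of the number of clusters by BIC) can be sketched as follows. This is only a minimal illustration, not the authors' implementation: it assumes a univariate Gaussian mixture with cluster-specific variances, whereas the rice example is bivariate. The function names (fit_gmm_em, bic, classify), the quantile-based initialisation, the sign convention for BIC (lower is better), and the simulated data are all assumptions made for the example.

```python
import numpy as np

def e_step(x, w, mu, var):
    """Posterior cluster probabilities and log-likelihood at the current parameters."""
    dens = w * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
    total = dens.sum(axis=1)
    return dens / total[:, None], np.log(total).sum()

def fit_gmm_em(x, k, n_iter=200, tol=1e-8):
    """EM for a univariate k-component Gaussian mixture (illustrative sketch)."""
    # Assumed initialisation: means at evenly spaced quantiles, equal weights,
    # and the overall sample variance for every cluster.
    mu = np.quantile(x, (np.arange(k) + 0.5) / k)
    var = np.full(k, x.var())
    w = np.full(k, 1.0 / k)
    prev = -np.inf
    for _ in range(n_iter):
        post, loglik = e_step(x, w, mu, var)
        if loglik - prev < tol:  # stop once the log-likelihood no longer improves
            break
        prev = loglik
        # M-step: weighted maximum likelihood updates of the cluster parameters.
        nk = np.maximum(post.sum(axis=0), 1e-12)
        w = nk / x.size
        mu = (post * x[:, None]).sum(axis=0) / nk
        var = np.maximum((post * (x[:, None] - mu) ** 2).sum(axis=0) / nk, 1e-6)
    post, loglik = e_step(x, w, mu, var)  # posteriors at the final parameters
    return w, mu, var, post, loglik

def bic(loglik, k, n):
    """BIC = -2 log L + p ln n with p = 3k - 1 free parameters (lower is better)."""
    return -2.0 * loglik + (3 * k - 1) * np.log(n)

def classify(post, threshold=0.5):
    """Assign each object to its most probable cluster; -1 marks 'unclassified'."""
    labels = post.argmax(axis=1)
    labels[post.max(axis=1) < threshold] = -1
    return labels

# Simulated data: two well-separated clusters (assumed values, not from the paper).
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0.0, 1.0, 150), rng.normal(8.0, 1.0, 150)])

fits = {k: fit_gmm_em(x, k) for k in (1, 2, 3)}
bics = {k: bic(fits[k][4], k, x.size) for k in fits}
w, mu, var, post, _ = fits[2]
labels_loose = classify(post, threshold=0.5)
labels_strict = classify(post, threshold=0.99)
```

Raising the threshold passed to classify can only move objects from a cluster label to -1. That is the trade-off the abstract describes: a stricter discrimination criterion leaves fewer doubtful objects to be misclassified, at the cost of a higher unclassified rate.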

Authors: Xiao Jing (肖静), Hu Zhiqiu (胡治球), Wang Xuefeng (王学枫), Xu Chenwu (徐辰武)

Affiliation: Jiangsu Provincial Key Laboratory of Crop Genetics and Physiology, Yangzhou University

Source: Acta Agronomica Sinica (《作物学报》), 2007, No. 1, pp. 70-76 (7 pages). Indexed in CAS, CSCD, and the Peking University Core Journals list.

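Funding: National Natural Science Foundation of China (30270724, 30370758); Program for New Century Excellent Talents in University, Ministry of Education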

Keywords: Cluster analysis; Posterior probability; Bayesian information criterion; Discrimination analysis

Classification code: S11 [Agricultural Sciences: Basic Agricultural Sciences]
