
Minimum spanning tree classification algorithm based on the granular space (cited by: 1)
Abstract Building on granular space theory, this paper studies a minimum spanning tree classification algorithm based on a normalized distance. First, drawing on the properties of intra-class and inter-class deviation and on existing algorithms for representing and generating granular spaces, the authors introduce the minimum spanning tree together with a new optimal clustering index, present the resulting classification algorithm, and establish an optimal clustering model. Second, the model is applied to 898 HA and NA protein sequences of avian influenza viruses, downloaded from NCBI, that occurred worldwide between 1902 and 2015 and are confirmed to be able to infect humans. The sequences cover 8 subtypes: H5N1, H5N2, H7N2, H7N3, H7N7, H9N2, H10N7, and the recent H7N9. A first run of the algorithm divides the 898 viruses into two classes containing 842 and 56 closely related sequences, respectively. Given the complexity of the evolutionary tree structure, a signature representative virus is selected for each class of the optimal clustering to make the study and discussion of the new method more effective. To further study the nature of the avian influenza virus, each of the two classes is then analyzed separately by running the algorithm again. Using the nearest-to-center principle, 6 representative virus sequences are selected, the optimal hierarchical structure is obtained, and a phylogenetic tree is constructed. Finally, a comparison with results in the literature shows that factors such as the region and the time of an outbreak have an important influence on the variation of avian influenza viruses. These findings are consistent with previous studies, indicating that the proposed algorithm is effective. In searching for the optimal clustering on a granular space, the minimum spanning tree classification algorithm has lower complexity than the original algorithm. These conclusions provide a new approach to information processing based on big data.
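The abstract does not give the algorithm's details, so the following is only a minimal sketch of the general idea it describes: build a minimum spanning tree over normalized distances, cut its heaviest edges to form classes, and pick the partition that scores best on an index comparing intra-class and inter-class deviation. Every function name here is hypothetical, the Euclidean distance stands in for whatever sequence distance the paper uses, and `deviation_ratio` is an assumed stand-in for the paper's optimal clustering index, which the abstract does not define.

```python
import math
from itertools import combinations


def normalized_distances(points):
    """Pairwise Euclidean distances scaled into [0, 1] by the largest one."""
    n = len(points)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = math.dist(points[i], points[j])
    dmax = max(max(row) for row in d) or 1.0  # avoid dividing by zero
    return [[x / dmax for x in row] for row in d]


def minimum_spanning_tree(d):
    """Prim's algorithm on a dense matrix; returns edges as (weight, i, j)."""
    n = len(d)
    in_tree = [False] * n
    best = [(math.inf, -1)] * n  # (cheapest link to the tree, tree endpoint)
    best[0] = (0.0, -1)
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i][0])
        in_tree[u] = True
        if best[u][1] >= 0:
            edges.append((best[u][0], best[u][1], u))
        for v in range(n):
            if not in_tree[v] and d[u][v] < best[v][0]:
                best[v] = (d[u][v], u)
    return edges


def clusters_after_cut(n, edges, k):
    """Drop the k-1 heaviest MST edges; return the connected components."""
    kept = sorted(edges)[: len(edges) - (k - 1)]
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for _, i, j in kept:
        parent[find(i)] = find(j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def deviation_ratio(d, clusters):
    """Assumed index (not the paper's): mean intra-class distance divided by
    mean inter-class distance. Smaller means tighter, better separated classes."""
    label = {i: c for c, members in enumerate(clusters) for i in members}
    intra, inter = [], []
    for i, j in combinations(range(len(d)), 2):
        (intra if label[i] == label[j] else inter).append(d[i][j])
    if not intra or not inter:
        return math.inf  # degenerate partition: one class, or all singletons
    return (sum(intra) / len(intra)) / (sum(inter) / len(inter))


def best_clustering(points, k_max):
    """Try k = 2..k_max classes and keep the partition with the lowest index."""
    d = normalized_distances(points)
    edges = minimum_spanning_tree(d)
    candidates = [clusters_after_cut(len(points), edges, k) for k in range(2, k_max + 1)]
    return min(candidates, key=lambda c: deviation_ratio(d, c))
```

For example, calling `best_clustering` on two well-separated groups of 2-D points with `k_max=4` should return the two groups as index lists. The tree is built once and reused for every candidate number of classes, which is one plausible reading of why an MST-based search can be cheaper than regenerating the clustering for each granularity level.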
Authors Sun Mengmeng, Tang Xuqing (School of Science, Jiangnan University, Wuxi, 214122, China; Wuxi Engineering Research Center for Biocomputing, Jiangnan University, Wuxi, 214122, China)
Source Journal of Nanjing University (Natural Science), 2017, No. 5, pp. 963-971 (9 pages). Indexed in CAS, CSCD, and the Peking University Core Journals list.
Funding National Natural Science Foundation of China (11371174); International Science and Technology Cooperation Research Project (2011DFR70500)
Keywords granular space; intra-class deviation; inter-class deviation; minimum spanning tree; optimal clustering