Abstract
To address the special properties of co-regulated genes and the shortcomings of existing clustering algorithms for them, a clustering model based on generalized similarity, the g-Cluster, is proposed. In this model, positively and negatively co-regulated genes are grouped into the same co-regulated gene cluster because they share the same code. A tree-based clustering algorithm, FBTD (first breadth then depth), is further proposed; it searches breadth-first and then depth-first to mine all maximal g-Clusters that satisfy the conditions, applying efficient pruning rules and optimization strategies along the way. The algorithm was applied to real datasets. Both theoretical analysis and experimental results show that it is practical and effective.
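The abstract does not spell out how genes are coded, so the following is only an illustrative sketch of the idea that positively and negatively co-regulated genes can share one code. It assumes a hypothetical sign-based encoding (the direction of change between consecutive conditions, flipped so the first non-zero change is positive); the names pattern_code and group_by_code and the toy expression values are assumptions for illustration, not the g-Cluster definition or the FBTD algorithm from the paper.

```python
from collections import defaultdict

def pattern_code(values, eps=0.0):
    """Hypothetical code (an assumption, not the paper's definition):
    signs of consecutive changes, canonicalized so that a pattern and
    its mirror image (the negated pattern) receive the same code."""
    signs = []
    for a, b in zip(values, values[1:]):
        d = b - a
        signs.append(1 if d > eps else -1 if d < -eps else 0)
    # Flip the whole pattern if its first non-zero change is a decrease,
    # so negatively co-regulated genes collapse onto the same code.
    first = next((s for s in signs if s != 0), 0)
    if first < 0:
        signs = [-s for s in signs]
    return tuple(signs)

def group_by_code(expr):
    """Group gene names by their canonical pattern code."""
    clusters = defaultdict(list)
    for gene, values in expr.items():
        clusters[pattern_code(values)].append(gene)
    return dict(clusters)

# Toy expression profiles (made up for illustration).
expr = {
    "g1": [1.0, 3.0, 2.0, 5.0],   # up, down, up
    "g2": [2.0, 6.0, 4.0, 10.0],  # same shape as g1, scaled
    "g3": [5.0, 3.0, 4.0, 1.0],   # down, up, down: mirror of g1
    "g4": [1.0, 2.0, 3.0, 4.0],   # monotone increase
}
clusters = group_by_code(expr)
```

In this toy example g1, g2 and g3 land in the same group even though g3 moves in the opposite direction, which is the behaviour the abstract attributes to shared codes. A real implementation would also have to search over subsets of conditions, which this sketch does not attempt.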
Source
Journal of Northeastern University (Natural Science)
EI
CAS
CSCD
PKU Core
2009, Issue 11, pp. 1558-1561 (4 pages)
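Funding
National Natural Science Foundation of China (60803026, 60873011, 60773219)
Ministry of Education Doctoral Program Fund for New Teachers (20070145112)
Ministry of Education Major Cultivation Project (706016)
National Key Basic Research Development Program of China (2007AA01Z192)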
Keywords
co-regulated genes
clustering
pattern similarity
gene ontology