Abstract
Biclustering of gene expression data is an important problem in bioinformatics. By biclustering the expression data of genes measured under different experimental conditions, the gene functions and transcriptional regulatory elements shared by genes of the same class can be analyzed and identified. This paper studies the biclustering of gene expression data and presents a parallel biclustering algorithm. Exploiting the anti-monotonicity of bicluster quality with respect to data set size, the algorithm starts from the smallest data sets, those consisting of every two rows and every two columns of the data matrix, and gradually adds rows or columns until it finds all biclusters that satisfy the condition. Experimental results show that the algorithm is fast, produces high-quality clusters, and clearly outperforms other similar algorithms.
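The bottom-up search described in the abstract can be illustrated with a minimal sketch. The abstract does not name the quality measure or explain how the work is parallelized, so everything specific here is an assumption: the score is taken to be the mean squared residue (lower means more coherent), `delta` is a hypothetical quality threshold, the function names `msr`, `grow`, and `find_biclusters` are invented for illustration, and the loop runs sequentially. It is not the authors' algorithm, only one way to seed with every two rows and two columns and then grow greedily.

```python
from itertools import combinations


def msr(data, rows, cols):
    """Mean squared residue of the submatrix (assumed quality score)."""
    sub = [[data[r][c] for c in cols] for r in rows]
    n, m = len(rows), len(cols)
    row_mean = [sum(row) / m for row in sub]
    col_mean = [sum(sub[i][j] for i in range(n)) / n for j in range(m)]
    total = sum(row_mean) / n
    return sum(
        (sub[i][j] - row_mean[i] - col_mean[j] + total) ** 2
        for i in range(n)
        for j in range(m)
    ) / (n * m)


def grow(data, rows, cols, delta):
    """Add one row or column at a time while the score stays <= delta."""
    rows, cols = list(rows), list(cols)
    changed = True
    while changed:
        changed = False
        for r in range(len(data)):
            if r not in rows and msr(data, rows + [r], cols) <= delta:
                rows.append(r)
                changed = True
        for c in range(len(data[0])):
            if c not in cols and msr(data, rows, cols + [c]) <= delta:
                cols.append(c)
                changed = True
    return frozenset(rows), frozenset(cols)


def find_biclusters(data, delta):
    """Seed with every 2x2 submatrix that passes the threshold, then grow it."""
    found = set()
    for row_pair in combinations(range(len(data)), 2):
        for col_pair in combinations(range(len(data[0])), 2):
            if msr(data, row_pair, col_pair) <= delta:
                found.add(grow(data, row_pair, col_pair, delta))
    return found
```

Because each seed is grown independently of the others, the outer loops are a natural place to split work across processes, which is one plausible reading of the "parallel" claim; the abstract itself gives no detail on this.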
Source
《小型微型计算机系统》
CSCD
Peking University Core Journals (北大核心)
2009, Issue 4, pp. 683-689 (7 pages)
Journal of Chinese Computer Systems
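Funding
Supported by the National Natural Science Foundation of China (60473012)
Supported by the National Key Technologies R&D Program of China (2003BA614A-14)
Supported by the Natural Science Foundation of Jiangsu Province (BK2005047)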