DENGENE: A High-Accuracy Density-Based Clustering Algorithm for Gene Expression Data

Published English title: DENGENE: High Accurate Density-based Clustering Algorithm for Gene Expression Data

Cited by: 1

Abstract: Based on the characteristics of gene expression data, this paper proposes DENGENE, a high-accuracy density-based clustering algorithm. DENGENE defines a homogeneity test and introduces peak points to improve the search direction, which makes the algorithm better suited to gene expression data. To evaluate its performance, the algorithm was tested on two widely used budding yeast (Saccharomyces cerevisiae) gene expression data sets. The experimental results show that, compared with five model-based clustering algorithms, CAST, and K-means clustering, DENGENE achieves a marked improvement in noise filtering and clustering accuracy.

Authors: Sun Liang (孙亮), Zhao Fang (赵芳), Wang Yongji (王永吉)

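Affiliations: Laboratory for Internet Software Technologies, Institute of Software, Chinese Academy of Sciences; Biometrics Research Centre, Department of Computing, The Hong Kong Polytechnic University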

Source: Application Research of Computers (《计算机应用研究》), indexed in CSCD and the Peking University Core Journals list; 2007, No. 4, pp. 58-61 (4 pages)

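Funding: National Natural Science Foundation of China (60373053); "Hundred Talents Program" of the Chinese Academy of Sciences; joint project of the Chinese Academy of Sciences and the Royal Society, UK (20030389, 20032006); Scientific Research Foundation for Returned Overseas Chinese Scholars ([2003]406)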

Keywords: gene expression data; cluster analysis; density-based clustering; homogeneity test; peak point

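Classification: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]; TP301.6 [Automation and Computer Technology: Computer System Architecture]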

Citation network

References (10)

1. JIANG Daxin, TANG Chun, ZHANG Aidong. Cluster analysis for gene expression data: a survey [J]. IEEE Transactions on Knowledge and Data Engineering, 2004, 16(11): 1370-1386.
2. EISEN M B, SPELLMAN P T, BROWN P O, et al. Cluster analysis and display of genome-wide expression patterns [J]. Proc. of the National Academy of Sciences, 1998, 95(25): 14863-14868.
3. ESTER M, KRIEGEL H, SANDER J, et al. A density-based algorithm for discovering clusters in large spatial databases with noise [C]. Portland: AAAI Press, 1996: 226-231.
4. CHO R J, CAMPBELL M J, WINZELER E A, et al. A genome-wide transcriptional analysis of the mitotic cell cycle [J]. Mol. Cell, 1998, 2(1): 65-73.
5. TAVAZOIE S, HUGHES J D, CAMPBELL M J, et al. Systematic determination of genetic network architecture [J]. Nature Genetics, 1999, 22(3): 281-285.
6. YEUNG K Y, FRALEY C, MURUA A, et al. Model-based clustering and data transformations for gene expression data [J]. Bioinformatics, 2001, 17(10): 977-987.
7. BEN-DOR A, SHAMIR R, YAKHINI Z. Clustering gene expression patterns [J]. Journal of Computational Biology, 1999, 6(3/4): 281-297.
8. MEWES H W, HEUMANN K, KAPS A, et al. MIPS: a database for protein sequences and complete genomes [J]. Nucleic Acids Research, 2002, 30(1): 31-34.
9. HUBERT L, ARABIE P. Comparing partitions [J]. Journal of Classification, 1985, 2(1): 193-218.
10. YEUNG K Y, HAYNOR D R, RUZZO W L. Validating clustering for gene expression data [J]. Bioinformatics, 2001, 17(4): 309-318.

Co-cited literature (7)

1. Niu Kun, Zhang Shubo, Chen Junliang. A cluster-center initialization scheme incorporating grid density [J]. Journal of Beijing University of Posts and Telecommunications, 2007, 30(2): 6-10. Cited by: 16
2. Cao Hui, Xi Bin, Mi Hong. Application of a new clustering algorithm to gene expression data analysis [J]. Computer Engineering and Applications, 2007, 43(18): 234-238. Cited by: 5
3. Balasubramaniyan R, Hullermeier E, Weskamp N, et al. Clustering of gene expression data using a local shape-based similarity measure [J]. Bioinformatics, 2005, 21: 1069-1077.
4. Mitra P, Murthy C A, Pal S K. Unsupervised feature selection using feature similarity [J]. IEEE Trans Pattern Analysis and Machine Intelligence, 2002, 24(3): 301-312.
5. Son Y S, Baek J S. A modified correlation coefficient based similarity measure for clustering time-course gene expression data [J]. Pattern Recognition Letters, 2008, 29: 232-242.
6. Zhao Y H, Wang G, Yin Y, et al. Mining positive and negative co-regulation pattern from microarray data [C]// Sixth IEEE Symposium on BioInformatics and BioEngineering (BIBE'06), 2006: 1-8.
7. Tsai Guei-Feng, Qu A. Testing the significance of cell-cycle patterns in time-course microarray data using nonparametric quadratic inference functions [J]. Computational Statistics & Data Analysis, 2008, 52: 1387-1398.

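Citing literature (1)

1. Jiang Yongsen, Lu Yuan, Yang Huizhong. A gene expression data clustering method based on fuzzy similarity relations [J]. Computer Engineering and Applications, 2011, 47(8): 236-238. Cited by: 2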

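Second-level citing literature (2)

1. Tao Hua, Tang Xuqing. Cluster structure analysis of protein sequences [J]. Chinese Journal of Bioinformatics (生物信息学), 2012, 10(4): 269-273. Cited by: 4
2. Sun Huayan, Li Yeli, Zi Yunfei, Han Xu. Improvement of and research on collaborative filtering recommendation algorithms [J]. Computer Technology and Development, 2018, 28(10): 44-48. Cited by: 9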

