Cluster analysis and simulation of gene sequences based on the vector space model
Abstract: Clustering algorithms are widely used in bioinformatics data analysis and are among the main techniques for analyzing gene sequences and gene expression data. This paper proposes a clustering algorithm for gene sequences based on the vector space model. First, using the structural features of DNA sequences, multiple DNA sequences are assembled into a sequence set. The vector space model is then used to compute the matrix of pairwise similarities between the sequences in this set, and an appropriate threshold is chosen to take the cut set of the similarity matrix, which yields the final clustering. Simulation experiments on DNA sequence data show that the algorithm is practical and effective for gene sequence analysis, and that it is concise, semantically accurate, and keeps the dimension of the sequence vectors controllable.
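The pipeline summarized in the abstract (sequence set, vector space model similarity matrix, threshold cut) can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation. The abstract does not say which structural features define the vectors, so each sequence is assumed here to be represented by its k-mer counts over the alphabet ACGT, giving 4**k dimensions (one possible reading of the "controllable vector dimension"). Similarity is assumed to be the cosine measure commonly used with the vector space model, and the threshold cut is assumed to be taken on the max-min transitive closure of the matrix, computed as connected components of the graph whose edges are pairs with similarity at least `lam`. The helper names `kmer_vector`, `cosine`, `similarity_matrix` and `lambda_cut_clusters` are hypothetical.

```python
from collections import Counter
from itertools import product
import math


def kmer_vector(seq, k=2):
    """Map a DNA sequence to a 4**k-dimensional k-mer count vector.

    Assumption: k-mer counts stand in for the paper's unspecified
    "structural features"; k controls the vector dimension.
    """
    seq = seq.upper()
    vocab = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # k-mers containing symbols other than A, C, G, T are not in vocab and are ignored
    return [counts[w] for w in vocab]


def cosine(u, v):
    """Cosine similarity between two count vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def similarity_matrix(seqs, k=2):
    """Pairwise cosine similarity matrix for a set of DNA sequences."""
    vecs = [kmer_vector(s, k) for s in seqs]
    n = len(vecs)
    return [[cosine(vecs[i], vecs[j]) for j in range(n)] for i in range(n)]


def lambda_cut_clusters(sim, lam):
    """Cluster indices linked by a chain of pairs with similarity >= lam.

    Assumption: this equals the lam-cut of the max-min transitive closure
    of sim, computed here with union-find over thresholded pairs.
    """
    n = len(sim)
    parent = list(range(n))

    def find(x):
        # Path halving keeps the trees shallow
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if sim[i][j] >= lam:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

For example, with `seqs = ["ACGTACGTAC", "ACGTACGTAA", "GGGGCCCCGG", "GGGCCCCGGG"]` and `k=2`, `lambda_cut_clusters(similarity_matrix(seqs), 0.5)` returns `[[0, 1], [2, 3]]`, while lowering the threshold to 0.05 merges all four sequences into one cluster. Cutting the transitive closure guarantees that the groups form a partition; a direct cut of a non-transitive similarity matrix can link i to j and j to k without linking i to k.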
Authors: 张东生 (Zhang Dongsheng), 季超 (Ji Chao)
Source: 《微计算机信息》 (Control & Automation), 2010, no. 16, pp. 155-157 (3 pages)
Keywords: gene sequences; vector space model; cluster analysis
