Abstract
With the widespread use of DNA microarray technology, massive amounts of gene expression data have been produced, and using these data to study regulatory relationships among genes has become an active research topic in bioinformatics. Association rule mining is an important data mining technique. However, applying it directly to gene expression data raises two problems: the time and space complexity is too high, and the resulting rules describe regulatory relationships among genes only qualitatively, giving no information about their strength. This paper presents an analysis method that first uses clustering to reduce the dimensionality of the data, then discretizes gene expression levels into seven states that represent different magnitudes of expression change, and finally mines association rules from the gene expression data within each cluster. Experimental results show that the method is effective.
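The discretization and rule-mining steps described in the abstract could look roughly like the following minimal sketch. It is not the authors' implementation: the cut points in `CUTS`, the function names `discretize` and `mine_pair_rules`, the restriction to pairwise rules, and the support and confidence thresholds are all assumptions made for illustration. The clustering step is taken as already done, so the input is a single cluster of genes.

```python
from itertools import permutations

# Hypothetical cut points on log2 expression ratios; the abstract does not
# give the thresholds actually used, so these are assumptions.
CUTS = (-2.0, -1.0, -0.5, 0.5, 1.0, 2.0)


def discretize(value, cuts=CUTS):
    """Map a log2 ratio to one of seven states, -3 (strongly down) to +3 (strongly up)."""
    # Count the cut points at or below the value (0..6), then centre on 0.
    return sum(value >= c for c in cuts) - 3


def mine_pair_rules(cluster, min_support=0.5, min_confidence=0.8):
    """Find rules (gene_a, state_a) -> (gene_b, state_b) inside one cluster.

    cluster maps a gene name to its list of log2 ratios, one per experiment.
    Returns tuples (antecedent, consequent, support, confidence).
    """
    states = {g: [discretize(v) for v in vals] for g, vals in cluster.items()}
    n = len(next(iter(states.values())))  # number of experiments
    rules = []
    for a, b in permutations(states, 2):
        a_counts = {}     # how often gene a is in each state
        pair_counts = {}  # how often (state of a, state of b) co-occur
        for sa, sb in zip(states[a], states[b]):
            a_counts[sa] = a_counts.get(sa, 0) + 1
            pair_counts[(sa, sb)] = pair_counts.get((sa, sb), 0) + 1
        for (sa, sb), count in pair_counts.items():
            support = count / n
            confidence = count / a_counts[sa]
            if support >= min_support and confidence >= min_confidence:
                rules.append(((a, sa), (b, sb), support, confidence))
    return rules


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    toy_cluster = {
        "g1": [2.5, 2.2, 0.0, 2.1],
        "g2": [-1.5, -1.2, 0.1, -1.9],
    }
    for rule in mine_pair_rules(toy_cluster):
        print(rule)
```

Because each rule carries the discretized state of both genes, it conveys the magnitude of the expression change as well as its direction, which is the point of using seven states rather than a simple up/down encoding.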
Source
Beijing Biomedical Engineering (《北京生物医学工程》)
2008, No. 4, pp. 371-375 (5 pages)
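Funding
Supported by the Natural Science Foundation of Anhui Province (050420204)
Anhui Provincial Fund for Top Discipline Talents in Universities (05025102)
Anhui Provincial Fund for Young University Teachers (2006jql038)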
Keywords
bioinformatics
gene expression data
data mining
clustering
association rule