Abstract
In a gene expression data matrix, a bicluster is a submatrix whose entries show similar expression levels; its rows and columns correspond to a subset of genes and a subset of conditions, respectively. Biclustering algorithms cluster the rows and columns of the matrix simultaneously to find such submatrices, ideally ones in which a single cellular process is the main contributor to the expression of the gene subset over the condition subset. This paper proposes a hybrid optimization algorithm that combines simulated annealing with particle swarm optimization and avoids the drawback of probabilistic jumps in pure simulated annealing. The algorithm follows a bottom-up search strategy. First, it generates a set of high-quality bicluster seeds. Second, it enlarges these seeds by adding genes (rows) and conditions (columns) with the hybrid optimization algorithm to find the best biclusters. Finally, the results are evaluated on the yeast cell gene expression dataset. The experiments indicate that the biclusters found by the algorithm reach high quality on every measure considered, which supports the effectiveness of the method.
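The bottom-up enlargement step described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name its quality score, so mean squared residue (a common coherence measure for biclusters) is assumed here; the function names, the threshold `delta`, and the cooling schedule are hypothetical; and the particle swarm component of the hybrid is omitted, leaving only a simulated-annealing acceptance rule.

```python
import math
import random


def mean_squared_residue(data, rows, cols):
    """Mean squared residue of the submatrix data[rows][cols]; lower means more coherent."""
    n, m = len(rows), len(cols)
    row_mean = {i: sum(data[i][j] for j in cols) / m for i in rows}
    col_mean = {j: sum(data[i][j] for i in rows) / n for j in cols}
    all_mean = sum(row_mean.values()) / n
    total = sum(
        (data[i][j] - row_mean[i] - col_mean[j] + all_mean) ** 2
        for i in rows
        for j in cols
    )
    return total / (n * m)


def grow_seed(data, rows, cols, delta, temp=1.0, cooling=0.9, steps=200, rng=None):
    """Enlarge a seed bicluster by annealed row/column additions (hypothetical helper)."""
    rng = rng or random.Random(0)
    rows, cols = list(rows), list(cols)
    for _ in range(steps):
        # Candidate moves: add one gene (row) or one condition (column) not yet included.
        moves = [("row", i) for i in range(len(data)) if i not in rows]
        moves += [("col", j) for j in range(len(data[0])) if j not in cols]
        if not moves:
            break
        kind, idx = rng.choice(moves)
        new_rows = rows + [idx] if kind == "row" else rows
        new_cols = cols + [idx] if kind == "col" else cols
        old = mean_squared_residue(data, rows, cols)
        new = mean_squared_residue(data, new_rows, new_cols)
        # Metropolis rule: improvements always pass; worse moves pass with a
        # probability that shrinks as the temperature falls.
        accept = new <= old or rng.random() < math.exp((old - new) / temp)
        # Assumed constraint: the enlarged bicluster must stay within delta.
        if accept and new <= delta:
            rows, cols = new_rows, new_cols
        temp = max(temp * cooling, 1e-12)  # floor avoids division by zero
    return sorted(rows), sorted(cols)


# Toy matrix: rows 0-2 follow an additive pattern, row 3 does not.
toy = [
    [1, 2, 3, 4],
    [2, 3, 4, 5],
    [3, 4, 5, 6],
    [9, 0, 7, 1],
]
grown_rows, grown_cols = grow_seed(toy, rows=[0, 1], cols=[0, 1], delta=1e-9)
print(grown_rows, grown_cols)
```

In this sketch the seed only ever grows, which matches the "adding rows and columns" wording of the abstract; a fuller treatment would also need the seed-generation phase and the particle swarm update, neither of which the abstract specifies.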
Source
Computers and Applied Chemistry (《计算机与应用化学》)
CAS
CSCD
PKU Core
2013, Issue 1, pp. 93-96 (4 pages)
Keywords
bicluster
gene expression data
simulated annealing algorithm
particle swarm optimization algorithm