Abstract
Microarray data sets are highly noisy and contain several orders of magnitude more genes (columns) than experimental conditions (rows). To further reduce running time and improve the validity of the co-regulated genes mined from such data, a method is proposed that first groups the original complete data set according to the Pearson correlation coefficients between all pairs of genes, then applies column (gene) enumeration to mine closed frequent patterns within each group, handling activation (positive) and repression (negative) co-regulation separately. Experimental results show that the algorithm mines both kinds of co-regulated genes quickly and effectively.
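The grouping step can be illustrated with a minimal sketch. The abstract does not say how the pairwise Pearson coefficients are turned into groups, so the greedy seed-based rule, the `threshold` value of 0.8, the use of the absolute coefficient, and the names `pearson`, `group_genes`, and `profiles` are all assumptions made here for illustration, not the authors' procedure. The closed frequent pattern mining step is not shown.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # constant profile: correlation undefined, treated as 0 here
    return cov / (sx * sy)

def group_genes(profiles, threshold=0.8):
    """Greedy grouping (an assumed rule): a gene joins the first group whose
    seed gene it correlates with (|r| >= threshold), else it seeds a new group."""
    groups = []  # each group is a list of gene names; element 0 is the seed
    for gene, values in profiles.items():
        for group in groups:
            if abs(pearson(values, profiles[group[0]])) >= threshold:
                group.append(gene)
                break
        else:
            groups.append([gene])
    return groups

# Hypothetical expression profiles: one list of values per gene.
profiles = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.1, 6.0, 8.2],    # rises with g1
    "g3": [4.0, 3.0, 2.0, 1.0],    # falls as g1 rises
    "g4": [1.0, -1.0, 1.0, -1.0],  # weakly related to g1
}
groups = group_genes(profiles)
```

Using the absolute value keeps positively and negatively correlated genes in the same group, so that a later mining step could treat the two relationships separately; whether the original method groups this way is not stated in the abstract.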
Source
《计算机工程与应用》
CSCD
PKU Core (北大核心)
2010, Issue 9, pp. 33-37 (5 pages)
Computer Engineering and Applications
Funding
Natural Science Foundation of Shaanxi Province (No. 2007F27)