Abstract
To address the excessive number of iterations and the uncertainty in threshold selection of the original label propagation algorithm, an improved label propagation algorithm is proposed and applied to the analysis of gene expression profile data. First, the high-dimensional gene expression profile data are represented as a weight matrix, and a label sequence indicating the class membership of the samples is defined, in which a small number of samples are marked as known. Then, the label sequence is updated using an iterative formula derived from the Gauss-Seidel iteration, and the solution for the label sequence is proved to converge. Finally, using positive and negative labels, the data are partitioned into classes according to the signs of the components of the label sequence. Experiments on leukemia and colon cancer data sets show that the proposed method is feasible and effective.
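The abstract does not reproduce the paper's iterative formula, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes a Gaussian-kernel weight matrix (the kernel and its `sigma` are assumptions), updates each unlabeled component to the weighted average of its neighbors in place (the Gauss-Seidel style of sweep), keeps the known labels fixed at +1 or -1, and assigns classes by sign. All function names and the toy data are hypothetical.

```python
import math


def gaussian_weights(samples, sigma=1.0):
    """Build a symmetric weight matrix with a Gaussian kernel (assumed choice)."""
    n = len(samples)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d2 = sum((a - b) ** 2 for a, b in zip(samples[i], samples[j]))
            w[i][j] = w[j][i] = math.exp(-d2 / (2.0 * sigma ** 2))
    return w


def propagate_labels(w, known, tol=1e-8, max_sweeps=1000):
    """Update the label sequence with in-place (Gauss-Seidel style) sweeps.

    known maps a sample index to its label, +1 or -1. Unlabeled entries
    start at 0 and are replaced by the weighted average of the other entries.
    """
    n = len(w)
    f = [float(known.get(i, 0.0)) for i in range(n)]
    for _ in range(max_sweeps):
        max_change = 0.0
        for i in range(n):
            if i in known:
                continue  # known labels stay fixed
            total = sum(w[i][j] for j in range(n) if j != i)
            if total == 0.0:
                continue  # isolated sample: nothing to propagate
            new = sum(w[i][j] * f[j] for j in range(n) if j != i) / total
            max_change = max(max_change, abs(new - f[i]))
            f[i] = new  # reused immediately by later rows in this sweep
        if max_change < tol:
            break
    return f


def classify_by_sign(f):
    """Assign class +1 or -1 from the sign of each label component."""
    return [1 if v >= 0 else -1 for v in f]


# Toy data (hypothetical): two well-separated groups, one known label each.
samples = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
           [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]]
known = {0: 1, 3: -1}
f = propagate_labels(gaussian_weights(samples), known)
labels = classify_by_sign(f)
print(labels)
```

Because each sweep reuses components already updated in the same pass, this style of iteration typically needs fewer sweeps than one that updates all components from the previous pass, and the sign rule avoids choosing a decision threshold.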
Source
《中南大学学报(自然科学版)》
EI
CAS
CSCD
PKU Core
2014, Issue 7, pp. 2237-2243 (7 pages)
Journal of Central South University (Science and Technology)
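Funding
National Natural Science Foundation of China (61172127)
Natural Science Foundation of Anhui Province (1208085MF93, 1208085QF104)
Anhui University "211 Project" Academic Innovation Team Fund (KJTD007A)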
Keywords
semi-supervised learning
weighted matrix
label propagation
gene expression profile data