Abstract
Protein function prediction from a single data source performs poorly, and protein-protein interaction networks are incomplete. To address these problems, a Multi-Source Integration and Random Walk with Doubly Indexed Matrix (MSI-RWDIM) algorithm was proposed. The algorithm uses protein sequence, gene expression and protein-protein interaction data to predict protein function, and builds a weighted interaction network from each data source according to its characteristics. The weighted networks are then fused and combined with a function correlation network to construct a doubly indexed matrix, on which a random walk computes annotation scores used to predict protein function. In five-fold cross-validation on a yeast dataset, MSI-RWDIM achieved higher accuracy and lower coverage, and also reduced the loss rate of function labels. The results show that the overall performance of MSI-RWDIM is better than that of the commonly used k-nearest neighbor, transductive multi-label ensemble classification and fast simultaneous weighting methods.
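The fuse-then-walk idea in the abstract can be illustrated with a minimal sketch. It is not the MSI-RWDIM algorithm itself: the abstract does not specify how the networks are weighted or how the doubly indexed matrix is built, so the sketch assumes a plain weighted average for fusion and an ordinary random walk with restart that scores each function independently, ignoring the function correlation network. The function names, the `restart` parameter, and the toy matrices are all hypothetical.

```python
import numpy as np


def row_normalize(w):
    """Turn a non-negative weight matrix into a row-stochastic transition matrix."""
    sums = w.sum(axis=1, keepdims=True)
    sums[sums == 0] = 1.0  # isolated nodes keep an all-zero row
    return w / sums


def fuse_networks(networks, weights=None):
    """Weighted average of same-shaped adjacency matrices (an assumed fusion rule)."""
    networks = [np.asarray(n, dtype=float) for n in networks]
    if weights is None:
        weights = [1.0 / len(networks)] * len(networks)
    return sum(a * n for a, n in zip(weights, networks))


def random_walk_with_restart(w, labels, restart=0.3, tol=1e-9, max_iter=1000):
    """Iterate F = (1 - restart) * P @ F + restart * Y until F stops changing.

    w:      (n, n) fused protein-protein weight matrix
    labels: (n, m) known annotations, 1 if a protein has a function, else 0
    Returns an (n, m) matrix of annotation scores.
    """
    p = row_normalize(np.asarray(w, dtype=float))
    y = np.asarray(labels, dtype=float)
    scores = y.copy()
    for _ in range(max_iter):
        updated = (1 - restart) * (p @ scores) + restart * y
        if np.abs(updated - scores).max() < tol:
            return updated
        scores = updated
    return scores


# Hypothetical toy data: 4 proteins, 3 data sources, 2 functions.
sequence_net = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]])
expression_net = np.array([[0, 1, 1, 0], [1, 0, 0, 0], [1, 0, 0, 1], [0, 0, 1, 0]])
interaction_net = np.array([[0, 0, 1, 0], [0, 0, 1, 1], [1, 1, 0, 0], [0, 1, 0, 0]])
known_labels = np.array([[1, 0], [1, 0], [0, 1], [0, 0]])  # protein 3 is unannotated

fused = fuse_networks([sequence_net, expression_net, interaction_net])
annotation_scores = random_walk_with_restart(fused, known_labels)
print(annotation_scores.round(3))
```

Each column of `annotation_scores` ranks the proteins for one function, so the unannotated protein receives scores propagated from its neighbors in the fused network. A larger `restart` value keeps the scores closer to the known labels.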
Source
Journal of Computer Applications (《计算机应用》)
CSCD
PKU Core Journal (北大核心)
2015, Issue 6, pp. 1637-1642 (6 pages)
Funding
National Natural Science Foundation of China (61472061)
Keywords
multiple data integration
random walk
doubly indexed matrix
function correlation network
protein function prediction