Abstract
To assist the design of short interfering ribonucleic acids (siRNAs), 573 non-redundant siRNAs were collected from the published literature, and the relationship between siRNA sequence and RNA interference (RNAi) efficacy was analyzed with a support vector machine (SVM) based algorithm that relies on a base-base correlation (BBC) feature. The results show that the proposed algorithm achieves the highest area under the curve (AUC) value (0.73) of the receiver operating characteristic (ROC) curve and the highest Pearson correlation coefficient (r = 0.43). This indicates that the proposed algorithm outperforms previously published algorithms on the collected dataset and that base-base correlation information deserves more attention in future siRNA design.
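To assist siRNA design, experimental data for 573 siRNAs were collected from the published literature. Using the support vector machine (SVM) method, which is based on statistical learning theory, base-base correlation (BBC) features were extracted from the siRNA sequences, and the efficiency with which each siRNA silences its target gene was then predicted under ten-fold cross-validation. The results show that the SVM-based algorithm with a polynomial kernel achieves the highest AUC value (0.73, ROC curve) and the highest r value (0.43, Pearson correlation analysis), outperforming earlier scoring-based algorithms. This suggests that future siRNA design should pay more attention to the correlation information between bases.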
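As a reading aid for the two evaluation metrics reported above, the following is a minimal sketch of how an ROC AUC and a Pearson r can be computed from predicted scores. It is not the authors' code: the function names `roc_auc` and `pearson_r` and the toy labels and scores are hypothetical, chosen only for illustration, and the SVM and BBC feature extraction are not reproduced here.

```python
from math import sqrt


def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs in which the
    positive example receives the higher score; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical toy data: 1 = effective siRNA, 0 = ineffective siRNA.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
```

In this pairwise form, an AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of effective from ineffective siRNAs, which gives a scale for reading the reported value of 0.73.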
Funding
The National Natural Science Foundation of China (Nos. 60671018, 60121101)