Abstract
When RNA secondary structure is predicted from a free energy model, the true structure may not be the minimum free energy structure but one of the suboptimal structures within a given energy increment above the minimum. Grouping the suboptimal structures into a small number of clusters and computing a representative structure for each cluster can improve prediction accuracy. This paper presents ER-DBSCAN, a density-based clustering algorithm with an extensible radius, for RNA secondary structure datasets of variable density. The method first applies a feature selection algorithm based on the consensus matrix to filter the feature set, keeping the subset of features most relevant to clustering and thereby reducing the dimension of the clustering space. The clustering module then takes the object of maximum density as the initial center of a new cluster and updates the cluster radius according to the density distribution within the cluster and a density variation parameter until the cluster is fully expanded. Experiments indicate that ER-DBSCAN can identify and handle clusters of varying density and can cluster RNA secondary structures effectively.
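The expansion step described in the abstract can be illustrated with a toy sketch. This is not the published ER-DBSCAN; the abstract does not give the update rule, so everything below is an assumption made for illustration: the function name `er_density_cluster`, the parameters `eps0` (initial radius), `step` (radius increment), `alpha` (density variation threshold) and `min_pts`, the use of Euclidean distance on plain feature vectors, and the rule that the radius grows only while the density of the next ring stays at or above `alpha` times the density inside the current radius.

```python
import math


def er_density_cluster(points, eps0, step, alpha, min_pts):
    """Toy extensible-radius density clustering (illustrative assumption only).

    Returns one label per point; -1 marks noise.
    """
    if not points:
        return []
    dim = len(points[0])
    labels = [-1] * len(points)
    unassigned = set(range(len(points)))
    cluster_id = 0

    def dist(i, j):
        return math.dist(points[i], points[j])

    while unassigned:
        # Density of a point: unassigned points within eps0, itself included.
        density = {
            i: sum(1 for j in unassigned if dist(i, j) <= eps0)
            for i in unassigned
        }
        # Seed the next cluster at the densest point (lowest index on ties).
        seed = max(sorted(unassigned), key=lambda i: density[i])
        if density[seed] < min_pts:
            break  # everything left is too sparse and stays labelled noise

        radius = eps0
        members = {j for j in unassigned if dist(seed, j) <= radius}
        while True:
            new_radius = radius + step
            ring = {
                j for j in unassigned
                if radius < dist(seed, j) <= new_radius
            }
            # Densities up to a constant factor: volume scales as radius**dim.
            inner_density = len(members) / radius ** dim
            ring_density = len(ring) / (new_radius ** dim - radius ** dim)
            if not ring or ring_density < alpha * inner_density:
                break  # density drops too sharply, stop extending
            members |= ring
            radius = new_radius

        for j in members:
            labels[j] = cluster_id
        unassigned -= members
        cluster_id += 1
    return labels
```

Calling `er_density_cluster(points, eps0=4, step=4, alpha=0.3, min_pts=3)` on a list of equal-length tuples returns a list of integer labels. In this sketch each cluster ends up with its own final radius, which is how clusters of different density are accommodated; a real application to RNA structures would operate on the selected feature subset rather than raw coordinates.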
Source
Journal of Chinese Computer Systems (《小型微型计算机系统》)
CSCD
Peking University Core Journal (北大核心)
2015, No. 9, pp. 1968-1972 (5 pages)
Keywords
RNA secondary structure
suboptimal structure
density-based clustering algorithm
feature selection