期刊文献+

始发保优的序列比对 被引量:1

Sequence Alignment Based on From-scratch Optimum Preservation
下载PDF
导出
摘要 动态时间规整(Dynamic Time Warping,DTW)是序列比对的经典方法,可以计算动态对应的序列间距的最小值.该文从一个新颖的角度重构了DTW的理论框架,提出了DTW的可加保优和始发保优两条特性.并依赖始发保优的算法拓展能力,将DTW由普通距离延拓到平均距离、Pearson相关系数、和Tanimoto相似系数的动态优化的计算模式,建立了一系列序列比对的新方法.动态计算的Pearson相关系数和Tanimoto相似系数较常规Pearson系数和Tanimoto系数分别更能捕捉长度一致的序列之间的真实相似性.对该系列方法以计算效率最高的动态平均距离为代表作内置,进行大量标注数据集的层次聚类比较测试,证实相较于传统DTW算法在基于序列比对的聚类准确率(平均F值)上至少有35个百分点的提高.且使用kNN作序列匹配分类实验也证实了动态平均距离优于传统DTW距离. DTW(Dynamic Time Warping)is a classical method for sequence data alignment,which searches the optimal value of distance among all possible paths.The searching algorithm depends on state transition equations solved by dynamic programming principle.The theory model of DTW is constructed at another novel point of view,and proposing two properties of optimum-preserving:additive optimum-preserving and from-scratch optimum-preserving.The property additive optimum preservation is discovered through accumulated-addition law of square Euclidean distance or Manhattan distance.And it shows that any each segment of the optimal path matched by DTW is one optimal sub-path started from its endpoint to its other endpoint.The property derives to an equivalent fact that any each sub-path originated from the started endpoint of the optimal path and terminated at any point of that path is also optimal.The truth is named as the property from-scratch optimum preservation.The latter property discloses the nature structure of this type of combination optimization problem,and it deduces the traditional searching algorithm which is originally drawn from state transition equations.The other form of distance or similarity is considered,like average distance,Pearson correlation coefficients,Tanimoto similarity coefficients,as replacement of distance from the combination optimization problem in essence related with DTW.Obviously,average distance,Pearson coefficients,and Tanimoto coefficients do not satisfy the accumulated addition law,which makes the global optimal path of the solution to that updated combination optimization problem lose the property additive optimum preservation and the property from-scratch optimum preservation.In consideration of the property from-scratch optimum preservation having good computability,the deformed combination optimization problem is transformed into the form with the restrictive condition that it satisfies from-scratch optimum preservation.Then a series of algorithms are designed based on average distance,Pearson correlation and Tanimoto similarity.A demonstration is given that the path of restrictive solution does not possess additive optimum preservation.And the average distance built-in algorithm is tested by dataset clustering verification with performance superior to traditional DTW method,improving 35 points above in correct classification rate(F-value).And the use of kNN for sequence matching classification experiments also confirms that the dynamic mean distance is superior to the traditional DTW distance.
作者 陆成刚 LU Cheng-gang(Institute of Science,Zhejiang University of Technology,Hangzhou 310023,China)
出处 《小型微型计算机系统》 CSCD 北大核心 2020年第5期932-939,共8页 Journal of Chinese Computer Systems
基金 浙江省基础公益研究计划项目(2019-LCHG)资助.
关键词 DTW 可加保优 始发保优 平均距离 Pearson系数 Tanimoto系数 Dynamic Time Warping additive optimum-preserving from-scratch optimum-preserving average distance Pearson correlation coefficients Tanimoto similarity coefficients
  • 相关文献

同被引文献1

引证文献1

二级引证文献1

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部