
Sequence Similarity Measurement Method Based on Symbol Entropy (cited by: 6)
Abstract: Existing sequence similarity measures consider only the local similarity of subsequences and ignore the global structural information of the sequences those subsequences belong to. To address this, a sequence similarity measure based on the entropy of individual symbols is proposed, in which the entropy of a symbol is computed from the positions and the number of occurrences of that symbol within a sequence. The measure is validated through agglomerative hierarchical clustering. Experimental results on symbol sequence datasets from multiple domains show that, compared with existing methods based on the local similarity of subsequences, the proposed measure effectively improves the quality of the clustering results.
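The abstract states only that a symbol's entropy is derived from the positions and counts of its occurrences in a sequence; it does not give the formula. The sketch below is therefore purely illustrative and is not the paper's definition. It assumes that each symbol's occurrence positions (1-based) are normalized into a probability distribution whose Shannon entropy is taken, and that two sequences are compared by the Euclidean distance between their per-symbol entropy vectors. The names `symbol_entropies` and `entropy_similarity` are hypothetical.

```python
import math
from collections import defaultdict


def symbol_entropies(seq):
    """Assumed definition: Shannon entropy of each symbol's normalized positions."""
    positions = defaultdict(list)
    for index, symbol in enumerate(seq, start=1):
        positions[symbol].append(index)  # record 1-based occurrence positions

    entropies = {}
    for symbol, pos in positions.items():
        total = sum(pos)
        probs = [p / total for p in pos]  # positions normalized to sum to 1
        entropies[symbol] = -sum(p * math.log2(p) for p in probs)
    return entropies


def entropy_similarity(seq_a, seq_b):
    """Assumed comparison: 1 / (1 + Euclidean distance of entropy vectors)."""
    ent_a = symbol_entropies(seq_a)
    ent_b = symbol_entropies(seq_b)
    alphabet = set(ent_a) | set(ent_b)
    # A symbol absent from a sequence contributes entropy 0 in this sketch.
    dist = math.sqrt(
        sum((ent_a.get(s, 0.0) - ent_b.get(s, 0.0)) ** 2 for s in alphabet)
    )
    return 1.0 / (1.0 + dist)
```

Under these assumptions, identical sequences score exactly 1.0, and the score falls as the positional distributions of shared symbols diverge. A pairwise matrix of `1 - entropy_similarity(a, b)` values could then serve as the dissimilarity input to an agglomerative hierarchical clustering routine, which is how the abstract says the measure was evaluated.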
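Source: Computer Engineering (《计算机工程》), indexed in CAS, CSCD, and the Peking University Core Journals list; 2016, No. 5, pp. 201-206, 212 (7 pages)
Funding: National Natural Science Foundation of China, General Program, "Research on Event Sequence Mining Methods for Software Behavior Identification" (61175123); Fujian Normal University Innovation Team Fund (IRTL1207)
Keywords: symbol sequence; similarity; entropy; hierarchical clustering; sequence clustering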

