LZ Complexity Distance Between Symbol Sequences and Its Application
Abstract: Building on the way the LZ complexity of a symbol sequence is computed, this paper introduces the concept of conditional LZ complexity between two sequences. Based on conditional LZ complexity, an LZ complexity distance between non-empty sequences is defined and proved to satisfy the four basic properties of a distance measure. The distance is then applied to computational linguistics and bioinformatics: texts in 20 natural languages and the complete mitochondrial genomes of 29 eutherian (placental) mammals are treated as symbol sequences over different alphabets, and an LZ complexity distance matrix is computed for each of the two sets. From these matrices, a language tree of the 20 languages and a phylogenetic tree of the 29 mammals are reconstructed. Both trees are consistent with the actual evolutionary relationships, which supports the LZ complexity distance as a quantitative measure of the differences between symbol sequences.
Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), indexed in CSCD and the Peking University Core Journals list, 2007, No. 5, pp. 849-854 (6 pages)
Funding: Supported by the National Natural Science Foundation of China (Grant No. 60371046)
Keywords: sequence complexity; conditional LZ complexity; LZ complexity distance; computational linguistics; bioinformatics
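
The abstract does not reproduce the paper's formulas, so the following Python sketch is only an illustration of the general idea, not the authors' definition. It assumes (1) the exhaustive-history parsing of Lempel and Ziv (1976) as the LZ complexity, (2) a conditional complexity obtained by parsing one sequence while the other is already available as history, and (3) a normalization by the larger of the two unconditional complexities. The function names `lz_complexity`, `conditional_lz`, `lz_distance`, and `distance_matrix` are hypothetical, and no claim is made that this variant satisfies the four metric properties proved in the paper.

```python
from itertools import combinations


def lz_complexity(seq, history=""):
    """Count LZ76-style phrases needed to parse `seq` when `history` is already known."""
    if not seq:
        raise ValueError("seq must be non-empty")
    text = history + seq
    n = len(text)
    i = len(history)  # parsing starts right after the known history
    phrases = 0
    while i < n:
        length = 1
        # Grow the phrase while it can still be copied from an occurrence
        # that starts before position i (overlapping copies are allowed).
        while i + length <= n and text[i:i + length] in text[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases


def conditional_lz(seq, given):
    """Assumed conditional complexity: phrases needed for `seq` with `given` as history."""
    return lz_complexity(seq, history=given)


def lz_distance(a, b):
    """Assumed normalized distance; the paper's exact formula may differ."""
    numerator = max(conditional_lz(a, b), conditional_lz(b, a))
    denominator = max(lz_complexity(a), lz_complexity(b))
    return numerator / denominator


def distance_matrix(seqs):
    """Symmetric matrix of pairwise distances; the diagonal is set to 0.0 by convention."""
    size = len(seqs)
    matrix = [[0.0] * size for _ in range(size)]
    for i, j in combinations(range(size), 2):
        matrix[i][j] = matrix[j][i] = lz_distance(seqs[i], seqs[j])
    return matrix


if __name__ == "__main__":
    samples = ["ACGTACGTACGT", "ACGTACGAACGT", "TTGACCATGGCA"]  # toy data, not from the paper
    for row in distance_matrix(samples):
        print(["%.3f" % value for value in row])
```

A matrix produced this way could be passed to any distance-based tree-building method; the abstract does not say which method the authors used. The substring search makes this sketch far too slow for whole mitochondrial genomes, so it is suited only to short toy inputs.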
