
Clustering method for symbolic sequences using probability vectors (符号序列的概率向量聚类方法)
Abstract: To address the difficulty of defining both a representation model and an inter-sequence distance measure for clustering symbolic sequences, this paper proposes a representation model based on probability vectors and a clustering algorithm built on that model. The model represents each symbolic sequence by a probability distribution, on which the paper defines a distance measure based on the dissimilarity between probability distributions, together with an objective function (clustering criterion) for the probability vector space model. It then presents a K-means-type algorithm for clustering symbolic sequences and evaluates it experimentally on real-world sequence sets from various domains. The experimental results show that, compared with existing clustering algorithms that use a subsequence-based representation model, the proposed method effectively improves clustering accuracy while reducing clustering time by more than 50% on real data with a relatively large number of symbols, such as DNA sequences and speech sequences.
Authors: Cheng Lingfang (程铃钫); Chen Lifei (陈黎飞) (Jinshan College of Fujian Agriculture & Forestry University, Fuzhou 350002, China; School of Mathematics & Computer Science, Fujian Normal University, Fuzhou 350117, China)
Source: Application Research of Computers (《计算机应用研究》), CSCD, Peking University Core Journal list, 2018, No. 6, pp. 1676-1680 (5 pages)
Funding: National Natural Science Foundation of China (61672157)
Keywords: data clustering; symbolic sequence; vector space model; probability vector; Markov model
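
The abstract outlines three steps: represent each sequence as a probability vector, measure distance by the difference between probability distributions, and cluster with a K-means-type procedure. The sketch below illustrates that general pipeline only. The abstract does not give the paper's actual definitions, so everything specific here is an assumption: the probability vector is taken to be the relative frequency of each symbol, the distance is taken to be squared Euclidean distance, and the names `probability_vector`, `squared_distance`, and `kmeans_sequences` are hypothetical.

```python
from collections import Counter


def probability_vector(sequence, alphabet):
    """Relative frequency of each alphabet symbol (assumed representation)."""
    counts = Counter(sequence)
    total = sum(counts[s] for s in alphabet)
    if total == 0:
        raise ValueError("sequence contains no symbols from the alphabet")
    return [counts[s] / total for s in alphabet]


def squared_distance(p, q):
    """Squared Euclidean distance between probability vectors (assumed measure)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_sequences(sequences, alphabet, k, max_iter=100):
    """K-means-type clustering of symbolic sequences via probability vectors."""
    if not 1 <= k <= len(sequences):
        raise ValueError("k must be between 1 and the number of sequences")
    vectors = [probability_vector(s, alphabet) for s in sequences]
    # Hypothetical deterministic initialisation: the first k vectors are the centers.
    centers = [list(v) for v in vectors[:k]]
    labels = [-1] * len(vectors)
    for _ in range(max_iter):
        # Assignment step: each sequence joins its nearest center.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(v, centers[j]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments are stable, so the procedure has converged
        labels = new_labels
        # Update step: each center becomes the mean of its members, which is
        # itself a probability vector; an empty cluster keeps its old center.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Toy usage on four short DNA-like strings (illustrative data, not from the paper).
toy = ["AAAAAT", "CCCCCG", "AAAATA", "CCCGCC"]
labels, centers = kmeans_sequences(toy, alphabet="ACGT", k=2)
print(labels)  # [0, 1, 0, 1]
```

Because every sequence maps to a fixed-length vector, each distance evaluation costs time proportional to the alphabet size rather than to the number of subsequences. That is one plausible reading of the reported speed-up over subsequence-based models, though the abstract does not state the cause.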

