Improvement of the Fast HAC Clustering Algorithm and Its Application to Unsupervised Speech Segmentation
Abstract Hierarchical agglomerative clustering (HAC) is a commonly used clustering method. Exploiting the close relationship between phonemes and temporal continuity in speech features, this paper improves the fast HAC algorithm for unsupervised segmentation of speech signals into phoneme-like units. The algorithm rests on the observation that features within the same segment are more similar to one another than features taken across segments. Similarity is measured by the Euclidean distance between adjacent features, which yields a doubly linked list of adjacent distances for the input speech features; each node in the list holds the distance between two adjacent features and pointers to the preceding and following nodes. The algorithm traverses this list, finds the minimum distance, merges the corresponding adjacent features, and iterates until a single cluster remains or a threshold is met. The whole process is fully unsupervised. Compared with the fast HAC algorithm, the proposed method speeds up clustering by more than 65 times, uses less memory, and can be applied to zero-resource speech segmentation.
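The procedure summarized in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `Segment`, `Node`, and `segment_adjacent` are hypothetical, and representing a merged segment by the size-weighted mean of its frames is an assumption, since the abstract does not say how merged features are represented.

```python
import math


class Segment:
    """A run of consecutive frames [start, end) with its mean feature vector."""

    def __init__(self, start, end, mean):
        self.start, self.end, self.mean = start, end, mean

    @property
    def size(self):
        return self.end - self.start


class Node:
    """Doubly linked list node: the distance between two adjacent segments."""

    def __init__(self, left, right):
        self.left, self.right = left, right
        self.dist = math.dist(left.mean, right.mean)  # Euclidean distance
        self.prev = self.next = None


def merge(a, b):
    # Assumption: the merged feature is the size-weighted mean of both segments.
    n = a.size + b.size
    mean = [(x * a.size + y * b.size) / n for x, y in zip(a.mean, b.mean)]
    return Segment(a.start, b.end, mean)


def segment_adjacent(frames, threshold=None):
    """Merge adjacent frames until one segment remains or the smallest
    adjacent distance exceeds `threshold`. Returns (start, end) pairs."""
    if not frames:
        return []
    segs = [Segment(i, i + 1, list(f)) for i, f in enumerate(frames)]
    head = tail = None
    for a, b in zip(segs, segs[1:]):  # build the adjacent-distance list
        node = Node(a, b)
        if tail is None:
            head = node
        else:
            tail.next, node.prev = node, tail
        tail = node
    last = segs[0]
    while head is not None:
        best, node = head, head.next
        while node is not None:  # linear traversal for the minimum distance
            if node.dist < best.dist:
                best = node
            node = node.next
        if threshold is not None and best.dist > threshold:
            break
        last = merged = merge(best.left, best.right)
        # Unlink `best` and refresh the distances of its two neighbours.
        if best.prev is not None:
            best.prev.next = best.next
            best.prev.right = merged
            best.prev.dist = math.dist(best.prev.left.mean, merged.mean)
        else:
            head = best.next
        if best.next is not None:
            best.next.prev = best.prev
            best.next.left = merged
            best.next.dist = math.dist(merged.mean, best.next.right.mean)
    if head is None:
        return [(last.start, last.end)]
    bounds, node = [(head.left.start, head.left.end)], head
    while node is not None:
        bounds.append((node.right.start, node.right.end))
        node = node.next
    return bounds


frames = [[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]]
print(segment_adjacent(frames, threshold=1.0))  # [(0, 3), (3, 6)]
```

Because only neighbouring segments are ever compared, the list holds one node per boundary rather than a full pairwise distance matrix, which is consistent with the memory saving the abstract mentions. The linear scan for the minimum is kept deliberately simple here.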
Authors 韦占江 (Wei Zhanjiang), 梁宇 (Liang Yu)
Source Computer Science and Application (《计算机科学与应用》), 2020, No. 8, pp. 1464-1470 (7 pages)
Keywords Unsupervised; Phoneme; HAC Algorithm; Speech Segmentation; Adjacent