Abstract
A semi-supervised automatic speech segmentation algorithm is proposed for TV-drama speech that comes with large blocks of continuous text transcription but no time stamps. First, the original text transcriptions are used to build a biased language model. This model is then applied to TV-drama speech recognition in a semi-supervised way. Finally, the resulting automatic speech recognition decoding hypotheses are combined with traditional speech segmentation methods, which are based on distance metrics, model classification, and phone recognition, to improve segmentation performance. Experimental results on a data set from the British science-fiction TV drama "Doctor Who" show that the proposed approach clearly outperforms the traditional baseline algorithms. It not only addresses the automatic segmentation of long continuous audio in TV-drama speech recognition, but also splits the corresponding long continuous transcription into segments, so that the time stamps of each speech segment and the text aligned to it remain accurate.
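The abstract does not say how the biased language model is estimated. As a rough illustration only, the sketch below assumes one common option: linearly interpolating word probabilities estimated from the transcript with those of a background model. The function names, the unigram simplification, and the `weight` value are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def unigram_probs(tokens):
    """Maximum-likelihood unigram probabilities from a token list."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def biased_unigram_lm(transcript_tokens, background_probs, weight=0.8):
    """Interpolate transcript probabilities with a background model.

    `weight` (an assumed value) controls how strongly the model is
    biased toward the words that occur in the transcript.
    """
    in_domain = unigram_probs(transcript_tokens)
    vocab = set(in_domain) | set(background_probs)
    return {
        word: weight * in_domain.get(word, 0.0)
        + (1.0 - weight) * background_probs.get(word, 0.0)
        for word in vocab
    }


# Toy usage: "tardis" is absent from the background model but receives
# probability mass because it appears in the transcript.
transcript = "the doctor and the tardis".split()
background = {"the": 0.5, "a": 0.3, "doctor": 0.2}
lm = biased_unigram_lm(transcript, background, weight=0.8)
```

A real system would use higher-order n-grams with smoothing; the unigram case is kept here only to make the biasing step visible.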
Authors
龙艳花
茅红伟
叶宏
Long Yanhua; Mao Hongwei; Ye Hong (The College of Information, Mechanical and Electrical Engineering, Shanghai Normal University, Shanghai 200234, China)
Source
《数据采集与处理》
CSCD
PKU Core Journal (北大核心)
2019, No. 2, pp. 281-287 (7 pages)
Journal of Data Acquisition and Processing
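Funding
Supported by the Shanghai Young Science and Technology Talents Sailing Program (14YF1409300)
Supported by the National Natural Science Foundation of China (61701306)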
Keywords
speech recognition
semi-supervised
speech transcription