Abstract
To support voice conversion, a suitable Mandarin Chinese voice conversion corpus must be built. This paper proposes an automatic corpus selection algorithm based on a semi-syllable model. Because voice conversion training requires only a small amount of speech material, the semi-syllable is chosen as the basic unit of the corpus. On this basis, sentences are selected automatically from the original corpus: because voice conversion is sensitive to speaker characteristics, an evaluation function scores each sentence in the original corpus according to the types of semi-syllables it contains and how often they occur. Experimental results show that when 615 Chinese sentences are selected automatically for the speech database, they cover 97.8% of the tonal semi-syllables, and that coverage efficiency, coverage rate, and sparsity improve considerably over conventional algorithms.
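The abstract does not give the evaluation function itself, so the following is only a minimal sketch of one common way to do score-based greedy selection, not the authors' method. It assumes each sentence has already been converted to a list of tonal semi-syllable labels; the function name `select_sentences`, the label strings, and the inverse-frequency, length-normalized score are all hypothetical choices made for illustration.

```python
from collections import Counter


def select_sentences(corpus, max_sentences):
    """Greedily pick sentences that add the most not-yet-covered units.

    corpus: dict mapping a sentence id to its list of unit labels
            (hypothetical tonal semi-syllable labels).
    Returns (selected_ids, coverage), where coverage is the fraction of
    distinct units in the original corpus covered by the selection.
    """
    # Occurrence count of every unit over the whole original corpus.
    freq = Counter(u for units in corpus.values() for u in units)
    covered = set()
    selected = []
    remaining = dict(corpus)

    def score(units):
        # Assumed scoring rule: rare, still-uncovered units weigh more,
        # and the sum is normalized by sentence length so that long
        # sentences are not favored automatically.
        if not units:
            return 0.0
        new_units = set(units) - covered
        return sum(1.0 / freq[u] for u in new_units) / len(units)

    while remaining and len(selected) < max_sentences:
        best_id = max(remaining, key=lambda sid: score(remaining[sid]))
        if score(remaining[best_id]) == 0.0:
            break  # no remaining sentence adds a new unit
        selected.append(best_id)
        covered.update(remaining.pop(best_id))

    coverage = len(covered) / len(freq) if freq else 0.0
    return selected, coverage
```

Calling `select_sentences(corpus, 615)` on a corpus prepared this way would return the chosen sentence ids together with the coverage rate they achieve. The figures reported in the abstract depend on the paper's own evaluation function and data, which this sketch does not reproduce.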
Source
Computer Engineering (《计算机工程》)
CAS
CSCD
PKU Core (北大核心)
2011, Issue 5, pp. 256-257, 260 (3 pages)
Funding
National ministry and commission fund project
Keywords
Chinese information processing
speech database
voice conversion
coverage rate