Method and implementation of transcribing speech corpora based on human computation
Abstract: This paper proposes a method for transcribing speech corpora based on human computation. The central idea is to use a Web-based language learning system to collect orthographic and phonetic transcriptions entered by a large number of learners (users), and to select the most frequent user input as the correct transcription of the corresponding speech data. To safeguard the quality of transcriptions obtained in this way, several computer-aided mechanisms verify the reliability of the collected input. The main advantage of the method is that it combines corpus transcription with language learning: no dedicated workforce is needed for the tedious transcription work, which reduces the cost of transcribing the corpus. The paper discusses the human-computation-based transcription technique and describes the design of both the language learning system that collects user input and the transcription generation system. Application of the system shows that the method generates orthographic and phonetic transcriptions of speech corpora effectively and at low cost.
Source: CAAI Transactions on Intelligent Systems (智能系统学报), 2009, No. 3, pp. 270-277 (8 pages)
Funding: State Scholarship Fund of China (2006104705); Natural Science Foundation of Fujian Province (2006J0043); Xiamen University "985 Project" Phase II Information Innovation Platform (0000-X07204)
Keywords: speech corpora transcription; human computation; distributed knowledge acquisition; Web-based language learning
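
The selection step described in the abstract, choosing the most frequent user input as the transcription, can be illustrated with a small majority-vote sketch. This is not the authors' implementation: the function names, the text normalization, and the `min_votes` and `min_agreement` thresholds are all assumptions made for illustration, standing in for the computer-aided reliability checks that the abstract mentions without detailing.

```python
from collections import Counter
from typing import Optional, Sequence


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different inputs match.

    Hypothetical normalization; the abstract does not specify one.
    """
    return " ".join(text.lower().split())


def select_transcription(
    user_inputs: Sequence[str],
    min_votes: int = 3,          # assumed threshold, not from the paper
    min_agreement: float = 0.5,  # assumed threshold, not from the paper
) -> Optional[str]:
    """Return the most frequent input, or None if support is too weak."""
    # Count each normalized, non-empty input as one vote.
    votes = Counter(normalize(t) for t in user_inputs if t.strip())
    total = sum(votes.values())
    if total == 0 or total < min_votes:
        return None  # too few inputs to trust any label
    best, count = votes.most_common(1)[0]
    if count / total < min_agreement:
        return None  # users disagree too much
    return best


inputs = ["the cat sat", "The cat sat", "the cat sad", "the  cat sat"]
print(select_transcription(inputs))  # the cat sat
```

Returning `None` instead of a low-confidence label leaves an item open until more learners have entered a transcription for it, which is one plausible way to trade coverage for quality.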
