Abstract
Processing massive volumes of speech data is an important problem in speech recognition applications. Parallel computing can replace traditional single-machine processing, but it raises three issues: if parallel scheduling is poorly controlled, the merged results can end up in the wrong order; unreasonable data segmentation breaks semantic continuity and lowers recognition accuracy; and the time cost of transferring file fragments over the network must also be considered. To address these problems, a speech recognition system based on Hadoop is proposed. It uses the Hadoop Distributed File System (HDFS) and the MapReduce parallel model to handle file fragment transfer and parallel scheduling control, and introduces a silence detection algorithm to split files at reasonable points. Experiments verify the effectiveness of the system.
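The two ideas named in the abstract, cutting audio at silent stretches and merging partial results in their original order, can be illustrated with a minimal sketch. This is not the paper's implementation: it is plain Python rather than Hadoop code, it uses a simple frame-energy threshold as a stand-in for the silence detection algorithm, and every function name and parameter (`frame_len`, `threshold`, `min_silent_frames`) is a hypothetical choice made for illustration.

```python
from typing import List, Sequence, Tuple


def frame_energies(samples: Sequence[float], frame_len: int) -> List[float]:
    """Mean squared amplitude of each full, non-overlapping frame."""
    return [
        sum(s * s for s in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def silence_split_points(samples: Sequence[float], frame_len: int,
                         threshold: float, min_silent_frames: int) -> List[int]:
    """Sample indices in the middle of each sufficiently long silent run.

    Assumption: a frame counts as silent when its energy is below
    `threshold`; the paper's actual criterion is not given in the abstract.
    """
    energies = frame_energies(samples, frame_len)
    points: List[int] = []
    run_start = None
    # The infinite sentinel closes a silent run that reaches the end.
    for i, energy in enumerate(energies + [float("inf")]):
        if energy < threshold:
            if run_start is None:
                run_start = i
            continue
        if run_start is not None and i - run_start >= min_silent_frames:
            point = ((run_start + i) // 2) * frame_len
            if 0 < point < len(samples):  # avoid empty segments
                points.append(point)
        run_start = None
    return points


def split_segments(samples: Sequence[float],
                   points: List[int]) -> List[Tuple[int, Sequence[float]]]:
    """Cut at the split points and tag each piece with its sequence index."""
    bounds = [0] + points + [len(samples)]
    return [(idx, samples[a:b])
            for idx, (a, b) in enumerate(zip(bounds, bounds[1:]))]


def merge_transcripts(results: List[Tuple[int, str]]) -> str:
    """Join per-segment transcripts by sequence index, not arrival order."""
    return " ".join(text for _, text in sorted(results, key=lambda r: r[0]))
```

Tagging each segment with its index before distributing it is what lets the merge step restore the original order even when workers finish out of order; in a MapReduce setting the index would typically serve as the key, though the abstract does not say how the authors encode it.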
Source
《计算机工程与应用》
CSCD
2012, Issue 11, pp. 71-74 (4 pages)
Computer Engineering and Applications