Handling large volumes of voice data is an important problem in speech recognition applications. Parallel computing can replace the traditional standalone process, but it raises three issues: if parallel scheduling is poorly controlled, the final result will be wrong; if the data are segmented unreasonably, they lose semantic consistency and recognition accuracy declines; and the network cost of transmitting file pieces must also be considered. To solve these problems, this paper proposes a speech recognition system based on Hadoop. It uses HDFS and MapReduce to handle the transfer of file pieces and the control of parallel scheduling, and it uses silence detection to split files. Experiments demonstrate the effectiveness of the system.
Computer Engineering and Applications
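
The idea of splitting audio at silences, so that no piece is cut in the middle of an utterance, can be illustrated with a minimal sketch. The abstract does not say which silence detection method the system uses; the code below assumes a simple short-time energy threshold, and the function names, frame length, threshold, and minimum silence length are all hypothetical choices made for illustration only.

```python
from typing import List, Sequence


def frame_energies(samples: Sequence[float], frame_len: int) -> List[float]:
    """Mean squared amplitude of each full, non-overlapping frame."""
    n_frames = len(samples) // frame_len
    return [
        sum(s * s for s in samples[i * frame_len:(i + 1) * frame_len]) / frame_len
        for i in range(n_frames)
    ]


def find_split_points(
    samples: Sequence[float],
    frame_len: int = 160,          # assumed: 10 ms at 16 kHz
    threshold: float = 1e-4,       # assumed energy threshold for "silence"
    min_silence_frames: int = 3,   # assumed minimum pause length
) -> List[int]:
    """Return sample indices in the middle of each interior silent run."""
    energies = frame_energies(samples, frame_len)
    points: List[int] = []
    run_start = None
    # A sentinel with infinite energy closes a silent run at the end of the signal.
    for i, energy in enumerate(energies + [float("inf")]):
        if energy < threshold:
            if run_start is None:
                run_start = i
            continue
        if run_start is not None:
            run_len = i - run_start
            # Skip leading and trailing silence: there is nothing to separate there.
            interior = run_start > 0 and i < len(energies)
            if interior and run_len >= min_silence_frames:
                points.append((run_start + run_len // 2) * frame_len)
            run_start = None
    return points


def split_audio(samples: Sequence[float], points: Sequence[int]) -> List[Sequence[float]]:
    """Cut the signal at the given sample indices."""
    bounds = [0] + list(points) + [len(samples)]
    return [samples[a:b] for a, b in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    # Synthetic signal: loud, silent, loud (800 samples each).
    signal = [0.5] * 800 + [0.0] * 800 + [0.5] * 800
    cuts = find_split_points(signal)
    print(cuts, [len(p) for p in split_audio(signal, cuts)])
```

In a Hadoop setting, pieces produced this way could be stored on HDFS and recognized independently by map tasks; how the described system actually does this is not specified in the abstract.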