In this paper,we summarize recent progresses made in deep learning based acoustic models and the motivation and insights behind the surveyed techniques.We first discuss models such as recurrent neural networks(RNNs) a...In this paper,we summarize recent progresses made in deep learning based acoustic models and the motivation and insights behind the surveyed techniques.We first discuss models such as recurrent neural networks(RNNs) and convolutional neural networks(CNNs) that can effectively exploit variablelength contextual information,and their various combination with other models.We then describe models that are optimized end-to-end and emphasize on feature representations learned jointly with the rest of the system,the connectionist temporal classification(CTC) criterion,and the attention-based sequenceto-sequence translation model.We further illustrate robustness issues in speech recognition systems,and discuss acoustic model adaptation,speech enhancement and separation,and robust training strategies.We also cover modeling techniques that lead to more efficient decoding and discuss possible future directions in acoustic model research.展开更多
The cocktail party problem,i.e.,tracing and recognizing the speech of a specific speaker when multiple speakers talk simultaneously,is one of the critical problems yet to be solved to enable the wide application of au...The cocktail party problem,i.e.,tracing and recognizing the speech of a specific speaker when multiple speakers talk simultaneously,is one of the critical problems yet to be solved to enable the wide application of automatic speech recognition(ASR) systems.In this overview paper,we review the techniques proposed in the last two decades in attacking this problem.We focus our discussions on the speech separation problem given its central role in the cocktail party environment,and describe the conventional single-channel techniques such as computational auditory scene analysis(CASA),non-negative matrix factorization(NMF) and generative models,the conventional multi-channel techniques such as beamforming and multi-channel blind source separation,and the newly developed deep learning-based techniques,such as deep clustering(DPCL),the deep attractor network(DANet),and permutation invariant training(PIT).We also present techniques developed to improve ASR accuracy and speaker identification in the cocktail party environment.We argue effectively exploiting information in the microphone array,the acoustic training set,and the language itself using a more powerful model.Better optimization ob jective and techniques will be the approach to solving the cocktail party problem.展开更多
We introduce two new types of tensors called the strictly semimonotone tensor and the range column sufficient tensor and explore their structure properties. Based on the obtained results, we make a characterization to...We introduce two new types of tensors called the strictly semimonotone tensor and the range column sufficient tensor and explore their structure properties. Based on the obtained results, we make a characterization to the solution of tensor complementarity problem.展开更多
文摘In this paper,we summarize recent progresses made in deep learning based acoustic models and the motivation and insights behind the surveyed techniques.We first discuss models such as recurrent neural networks(RNNs) and convolutional neural networks(CNNs) that can effectively exploit variablelength contextual information,and their various combination with other models.We then describe models that are optimized end-to-end and emphasize on feature representations learned jointly with the rest of the system,the connectionist temporal classification(CTC) criterion,and the attention-based sequenceto-sequence translation model.We further illustrate robustness issues in speech recognition systems,and discuss acoustic model adaptation,speech enhancement and separation,and robust training strategies.We also cover modeling techniques that lead to more efficient decoding and discuss possible future directions in acoustic model research.
基金supported by the Tencent and Shanghai Jiao Tong University Joint Project
文摘The cocktail party problem,i.e.,tracing and recognizing the speech of a specific speaker when multiple speakers talk simultaneously,is one of the critical problems yet to be solved to enable the wide application of automatic speech recognition(ASR) systems.In this overview paper,we review the techniques proposed in the last two decades in attacking this problem.We focus our discussions on the speech separation problem given its central role in the cocktail party environment,and describe the conventional single-channel techniques such as computational auditory scene analysis(CASA),non-negative matrix factorization(NMF) and generative models,the conventional multi-channel techniques such as beamforming and multi-channel blind source separation,and the newly developed deep learning-based techniques,such as deep clustering(DPCL),the deep attractor network(DANet),and permutation invariant training(PIT).We also present techniques developed to improve ASR accuracy and speaker identification in the cocktail party environment.We argue effectively exploiting information in the microphone array,the acoustic training set,and the language itself using a more powerful model.Better optimization ob jective and techniques will be the approach to solving the cocktail party problem.
基金The authors wish to give their sincere thanks to the anonymous referees for their valuable suggestions and helpful comments, which help improve the presentation of the paper. This research was done during Xueyong Wang's postdoctoral period in Qufu Normal University. This work was supported by the National Natural Science Foundation of China (Grants Nos. 11671228, 11601261, 11401058), the Natural Science Foundation of Shandong Province (Grant No. ZR2016AQ12), and the Science & Technology Planning Project of Qufu Normal University (XKJ201623).
文摘We introduce two new types of tensors called the strictly semimonotone tensor and the range column sufficient tensor and explore their structure properties. Based on the obtained results, we make a characterization to the solution of tensor complementarity problem.