Funding: partially supported by the National Natural Science Foundation of China (11590770-4, U1536117), the National Key Research and Development Program of China (2016YFB0801203, 2016YFB0801200), the Key Science and Technology Project of the Xinjiang Uygur Autonomous Region (2016A03007-1), and the Pre-research Project for Equipment of General Information System (JZX2017-0994/Y306).
Abstract: Automatic speech recognition (ASR) is a resource-consuming task: training a state-of-the-art deep neural network acoustic model requires a large amount of data. For low-resource languages, where scripted speech is difficult to obtain, data sparsity is the main problem limiting the performance of speech recognition systems. In this paper, several knowledge transfer methods are investigated to overcome the data sparsity problem with the help of high-resource languages. The first is a pre-training and fine-tuning (PT/FT) method, in which the parameters of the hidden layers are initialized from a well-trained neural network. Secondly, progressive neural networks (Prognets) are investigated. With the help of lateral connections in the network architecture, Prognets are immune to the forgetting effect and superior in knowledge transfer. Finally, bottleneck features (BNF) are extracted using cross-lingual deep neural networks and serve as enhanced features to improve the performance of the ASR system. Experiments are conducted on a low-resource Vietnamese dataset. The results show that all three methods yield significant gains over the baseline system, and that the Prognets acoustic model performs best. Further improvements can be obtained by combining the Prognets model with bottleneck features.
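The PT/FT idea described above (hidden layers initialized from a well-trained network) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the layer sizes, the ReLU activation, the helper names (`init_mlp`, `transfer_hidden_layers`, `forward`) and the randomly initialized "source" model are all assumptions made for illustration, and the fine-tuning step on target-language data is omitted.

```python
import numpy as np

def init_mlp(layer_sizes, rng):
    """Return a list of (W, b) pairs for a feed-forward network."""
    return [
        (rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
    ]

def transfer_hidden_layers(source_params, n_target_outputs, rng):
    """Copy every hidden layer from the source model and
    re-initialize only the output layer for the target language."""
    hidden = [(W.copy(), b.copy()) for W, b in source_params[:-1]]
    last_hidden_dim = source_params[-1][0].shape[0]
    new_output = (
        rng.normal(0.0, 0.1, size=(last_hidden_dim, n_target_outputs)),
        np.zeros(n_target_outputs),
    )
    return hidden + [new_output]

def forward(params, x):
    """ReLU hidden layers followed by a softmax output layer."""
    for W, b in params[:-1]:
        x = np.maximum(x @ W + b, 0.0)
    W, b = params[-1]
    logits = x @ W + b
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)

# Hypothetical source model: 40-dim features, two hidden layers, 500 output states.
# In practice this would be a network already trained on a high-resource language.
source = init_mlp([40, 64, 64, 500], rng)

# Target model: same hidden layers, new 120-way output layer (size is an assumption).
target = transfer_hidden_layers(source, n_target_outputs=120, rng=rng)

# Posteriors for a batch of 8 feature frames, before any fine-tuning.
posteriors = forward(target, rng.normal(size=(8, 40)))
```

After this initialization, the whole target network would be fine-tuned on the low-resource data; only the output layer starts from scratch because the output targets differ between languages.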
Funding: supported by the IOA Frontier Exploration Project (No. ZYTS202001) and the Youth Innovation Promotion Association CAS.
Abstract: In underwater acoustic applications, the conventional cyclic direction-of-arrival algorithm faces challenges, including a low signal-to-noise ratio and a bandwidth that is high compared with the modulation frequencies. In response to these issues, this paper introduces a novel, robust, broadband cyclic beamforming algorithm. The proposed method replaces the conventional cyclic covariance matrix with the variance of the cyclic covariance matrix as its primary feature. Assuming that the same frequency band shares a common steering vector, the new algorithm achieves superior detection performance for targets with specific modulation frequencies while suppressing interference signals and background noise. Experimental results demonstrate that the directivity index improves by 81% and 181% compared with the traditional Capon beamforming algorithm and the traditional extended wideband spectral cyclic MUSIC (EWSCM) algorithm, respectively. Moreover, by employing frequency-band statistical averaging and covariance matrix variance, the proposed algorithm reduces computational complexity to 1/40 of that of the EWSCM algorithm.
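For orientation, the conventional cyclic covariance matrix mentioned above can be sketched as the standard zero-lag estimate, the time average of x(n) x(n)^H weighted by exp(-j 2π α n / fs). The example below is only an illustration under stated assumptions: a simulated 4-sensor half-wavelength line array, an amplitude-modulated noise source with a 50 Hz modulation frequency, and the helper name `cyclic_covariance` are all hypothetical. The variance-based feature of the proposed algorithm is not reproduced, since the abstract does not define it.

```python
import numpy as np

def cyclic_covariance(x, alpha, fs):
    """Zero-lag cyclic covariance estimate at cycle frequency alpha (Hz).

    x has shape (n_sensors, n_samples); the result is (n_sensors, n_sensors).
    """
    n = np.arange(x.shape[1])
    phase = np.exp(-2j * np.pi * alpha * n / fs)
    return (x * phase) @ x.conj().T / x.shape[1]

rng = np.random.default_rng(0)
fs, n_samples, n_sensors = 1000.0, 8192, 4   # assumed simulation settings
f_mod = 50.0                                 # assumed modulation frequency (Hz)
t = np.arange(n_samples) / fs

# Amplitude-modulated white noise: cyclostationary at cycle frequency f_mod.
source = (1.0 + np.cos(2 * np.pi * f_mod * t)) * rng.normal(size=n_samples)

# Steering vector for a half-wavelength uniform line array, source at 20 degrees.
theta = np.deg2rad(20.0)
steering = np.exp(-1j * np.pi * np.arange(n_sensors) * np.sin(theta))

# Stationary complex background noise, independent across sensors.
noise = (rng.normal(size=(n_sensors, n_samples))
         + 1j * rng.normal(size=(n_sensors, n_samples))) / np.sqrt(2)
x = steering[:, None] * source[None, :] + noise

R_on = cyclic_covariance(x, alpha=f_mod, fs=fs)   # at the modulation frequency
R_off = cyclic_covariance(x, alpha=80.0, fs=fs)   # at an unrelated frequency
print(np.linalg.norm(R_on), np.linalg.norm(R_off))
```

Because the background noise is stationary, its contribution averages out at any nonzero cycle frequency, so the matrix estimated at the modulation frequency is dominated by the modulated source. This selectivity is what makes cyclic methods attractive for targets with a specific modulation frequency.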