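Abstract: Recently, self-attention-based Transformer architectures have shown very good performance on a range of tasks across different domains. This work explores the Transformer-LAS speech recognition model, built from a Transformer encoder and a LAS (listen, attend and spell) decoder. To address the Transformer's weakness at capturing local information, the Transformer is replaced with a Conformer, giving the Conformer-LAS model. Because the overly flexible alignment of attention causes performance to degrade sharply in noisy environments, connectionist temporal classification (CTC) is used as an auxiliary training objective to speed up convergence, and a phoneme-level intermediate CTC loss is added for joint optimization, yielding the better-performing Conformer-LAS-CTC speech recognition model. The proposed models are validated on the open-source Mandarin Chinese Aishell-1 dataset. Experimental results show that, compared with the BLSTM-LAS and Transformer-LAS baselines, Conformer-LAS-CTC lowers the character error rate on the test set by a relative 22.58% and 48.76%, respectively, and reaches a final character error rate of 4.54%.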
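The relative reductions quoted above follow the usual definition (baseline minus new, divided by baseline). A minimal sketch of that arithmetic is given here; the function names are hypothetical, and the baseline error rates it back-calculates are implied values under that definition, not figures reported in the abstract.

```python
def relative_reduction_pct(baseline_cer: float, new_cer: float) -> float:
    """Relative error-rate reduction in percent: (baseline - new) / baseline."""
    if baseline_cer <= 0:
        raise ValueError("baseline_cer must be positive")
    return (baseline_cer - new_cer) / baseline_cer * 100.0


def implied_baseline_cer(new_cer: float, reduction_pct: float) -> float:
    """Invert the definition above to recover the baseline a reduction implies."""
    if not 0 <= reduction_pct < 100:
        raise ValueError("reduction_pct must be in [0, 100)")
    return new_cer / (1.0 - reduction_pct / 100.0)


if __name__ == "__main__":
    final_cer = 4.54  # final character error rate (%) quoted in the abstract
    for name, reduction in [("BLSTM-LAS", 22.58), ("Transformer-LAS", 48.76)]:
        # Back-calculated, assuming the quoted figure is a relative reduction.
        baseline = implied_baseline_cer(final_cer, reduction)
        print(f"{name}: implied baseline CER = {baseline:.2f}%")
```

Inverting the definition is only a consistency check; the baselines the authors measured should be taken from the full paper.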
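Abstract: To address the insufficient computing power and scarce resources of devices when Conformer-based speech recognition algorithms are deployed in practice, a model compression method that combines interval pruning of the Conformer model with parameter quantization is proposed. Experiments show that after compression with this method the model's real time factor (RTF) reaches 0.107614, inference is 16.2% faster than the baseline model, recognition accuracy drops by only 1.79%, and model size falls from 207.91 MB to 72.69 MB. The method substantially improves the model's applicability with only a small loss in accuracy.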
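For readers unfamiliar with the metric, the real time factor is conventionally the processing time divided by the duration of the audio processed, so values below 1 mean faster than real time. The sketch below uses hypothetical function names and assumes that conventional definition; it also works out the size reduction implied by the two model sizes quoted in the abstract.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent decoding / duration of the decoded audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def size_reduction_pct(original_mb: float, compressed_mb: float) -> float:
    """Percentage by which the model file shrank."""
    if original_mb <= 0:
        raise ValueError("original_mb must be positive")
    return (original_mb - compressed_mb) / original_mb * 100.0


if __name__ == "__main__":
    # Model sizes quoted in the abstract.
    print(f"size reduction: {size_reduction_pct(207.91, 72.69):.2f}%")
    # Illustrative timing only: 10 s of audio decoded in 1.2 s.
    print(f"example RTF: {real_time_factor(1.2, 10.0):.3f}")
```

The timing values in the usage lines are illustrative and are not measurements from the paper.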
Funding: This study was co-supported by the National Key R&D Program of China (No. 2021YFF0603904), the National Natural Science Foundation of China (U1733203), and the Safety Capacity Building Project of the Civil Aviation Administration of China (TM2019-16-1/3).
Abstract: This study aims to address the deviation in downstream tasks caused by inaccurate recognition results when applying Automatic Speech Recognition (ASR) technology in the Air Traffic Control (ATC) field. This paper presents a novel cascaded model architecture, namely Conformer-CTC/Attention-T5 (CCAT), to build a highly accurate and robust ATC speech recognition model. To tackle the challenges posed by noise and fast speech rate in ATC, the Conformer model is employed to extract robust and discriminative speech representations from raw waveforms. On the decoding side, the Attention mechanism is integrated to facilitate precise alignment between input features and output characters. The Text-To-Text Transfer Transformer (T5) language model is also introduced to handle particular pronunciations and code-mixing issues, providing more accurate and concise textual output for downstream tasks. To enhance the model's robustness, transfer learning and data augmentation techniques are utilized in the training strategy. The model's performance is optimized by hyperparameter tuning, such as adjusting the number of attention heads, the number of encoder layers, and the weights of the loss function. The experimental results demonstrate the significant contributions of data augmentation, hyperparameter tuning, and error correction models to the overall model performance. On the Our ATC Corpus dataset, the proposed model achieves a Character Error Rate (CER) of 3.44%, representing a 3.64% improvement compared to the baseline model. Moreover, the effectiveness of the proposed model is validated on two publicly available datasets. On the AISHELL-1 dataset, the CCAT model achieves a CER of 3.42%, showcasing a 1.23% improvement over the baseline model. Similarly, on the LibriSpeech dataset, the CCAT model achieves a Word Error Rate (WER) of 5.27%, demonstrating a performance improvement of 7.67% compared to the baseline model. Additionally, this paper proposes an evaluation criterion for assessing the robustness of ATC speech recognition systems. In robustness evaluation experiments based on this criterion, the proposed model demonstrates a performance improvement of 22% compared to the baseline model.
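CER and WER are both normally computed as the Levenshtein (edit) distance between the reference and the hypothesis, divided by the reference length, over characters or words respectively. The sketch below shows that standard definition; the function names and the example sentence are hypothetical, and it is not the evaluation code used in the paper.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum substitutions, insertions and deletions."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning i reference tokens into an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def error_rate(ref_tokens: Sequence, hyp_tokens: Sequence) -> float:
    """Edit distance normalised by the reference length."""
    if len(ref_tokens) == 0:
        raise ValueError("reference must not be empty")
    return edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


def cer(ref: str, hyp: str) -> float:
    return error_rate(list(ref), list(hyp))


def wer(ref: str, hyp: str) -> float:
    return error_rate(ref.split(), hyp.split())


if __name__ == "__main__":
    print(wer("climb to flight level", "climb flight level"))
```

Because insertions count as errors, either rate can exceed 1.0 when the hypothesis is much longer than the reference.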
Funding: This work is supported by the National Natural Science Foundation of China (No. 21303212 and No. 21573209) and the Ministry of Science and Technology of China (No. 2013CB834602).
Abstract: Shape resonances of the electron-molecule system formed in low-energy electron attachment to four low-lying conformers of serine (serine 1, serine 2, serine 3, and serine 4) in the gas phase are investigated using the quantum scattering method with non-empirical model potentials in a single-center expansion. In the attachment energy range of 0-10 eV, three shape resonances are predicted for each of serine 1, serine 2, and serine 4, and four for serine 3. The one-dimensional potential energy curves of the temporary negative ions of electron-serine are calculated to explore the correlations between the shape resonances and bond cleavage. The bond-cleavage selectivity of the different resonant states of a given conformer is demonstrated, and recent experimental results on dissociative electron attachment to serine are interpreted on the basis of the present calculations.
Abstract: Combining Raman spectroscopy with density functional theory, the populations of the trans- and gauche-ethanol conformers are investigated in carbon tetrachloride (CCl4) and carbon disulfide (CS2). The spectral contributions of the two ethanol conformers are identified in the OH stretching region. The energy difference between the two conformers is estimated with the aid of the calculated Raman cross sections. It is seen that trans-ethanol is more stable in CCl4 and CS2 solutions. The spectra are also obtained at different temperatures, and it is found that the van't Hoff analysis is invalid in these solutions. By taking into account the Boltzmann distribution and the theoretical Raman cross sections, the energy difference is found to increase with temperature, which shows that the weak intermolecular interactions can enhance the population of trans-ethanol.
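The link between a conformer energy difference and the conformer populations is the Boltzmann distribution. The sketch below illustrates that relation only; the function names are hypothetical, the default degeneracy of 2 for gauche (two mirror-image forms) is an assumption exposed as a parameter, and the energy value in the usage line is illustrative rather than taken from the paper.

```python
import math

R = 8.314462618  # molar gas constant in J/(mol*K)


def gauche_to_trans_ratio(delta_e_j_per_mol: float, temperature_k: float,
                          gauche_degeneracy: int = 2) -> float:
    """Population ratio N_gauche / N_trans for delta_e = E_gauche - E_trans."""
    if temperature_k <= 0:
        raise ValueError("temperature_k must be positive")
    return gauche_degeneracy * math.exp(-delta_e_j_per_mol / (R * temperature_k))


def trans_fraction(delta_e_j_per_mol: float, temperature_k: float,
                   gauche_degeneracy: int = 2) -> float:
    """Fraction of molecules in the trans conformer."""
    ratio = gauche_to_trans_ratio(delta_e_j_per_mol, temperature_k, gauche_degeneracy)
    return 1.0 / (1.0 + ratio)


def intensity_ratio(population_ratio: float, cross_section_ratio: float) -> float:
    """Observed Raman intensity ratio I_g / I_t = (sigma_g / sigma_t) * (N_g / N_t)."""
    return cross_section_ratio * population_ratio


if __name__ == "__main__":
    # Illustrative energy difference of 500 J/mol at room temperature.
    print(f"trans fraction: {trans_fraction(500.0, 298.15):.3f}")
```

Measured band intensities must be divided by the cross-section ratio before this relation is inverted for the energy difference, which is why the abstract relies on calculated Raman cross sections.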
Abstract: The outer-valence binding energy spectra of ethanol in the energy range of 9-21 eV are measured by a high-resolution electron momentum spectrometer at an impact energy of 2.5 keV plus the binding energy. The electron momentum distributions for the ionization peaks corresponding to the outer-valence orbitals are obtained by deconvoluting a series of azimuthal angular correlated binding energy spectra. Comparison is made with the theoretical calculations for two conformers, trans and gauche, coexisting in the gas phase of ethanol at the level of B3LYP density functional theory with aug-cc-pVTZ basis sets. It is found that the measured electron momentum distributions for the peaks at 14.5 and 15.2 eV are in good agreement with the theoretical electron momentum distributions for the molecular orbitals of individual conformers (i.e., 8a' of trans and 9a of gauche), but not in accordance with the thermally averaged ones. This demonstrates that the high-resolution electron momentum spectrometer, by inspecting the molecular electronic structure, is a promising technique for identifying different conformers in a mixed sample.
Abstract: The geometries and energies of the conformers of PnHn+2 (n = 2-9) have been studied with the PM3 method. It is concluded that the gauche interaction between adjacent lone electron pairs and the gauche interaction between a P-H bond and an adjacent P-P bond are important for predicting the stable conformer of open-chain phosphanes.
Funding: Financial support from the Research Foundation of Tongji Medical College, Huazhong University of Science and Technology (No. 25514107) and Project Chenguang of Wuhan City (No. 200750731267).
Abstract: The cone conformer 5 of 25,27-dipropoxy-p-tert-butylcalix[4]crown-9 was readily synthesized in good yield via an intramolecular cyclization strategy. The structures of all the new compounds involved were confirmed by NMR, ESI-MS and elemental analyses. All of them were shown to be in the cone conformation.