3 articles found
Audio-Text Multimodal Speech Recognition via Dual-Tower Architecture for Mandarin Air Traffic Control Communications
1
Authors: Shuting Ge, Jin Ren, Yihua Shi, Yujun Zhang, Shunzhi Yang, Jinfeng Yang. Computers, Materials & Continua (SCIE, EI), 2024, Issue 3, pp. 3215-3245 (31 pages)
In air traffic control communications (ATCC), misunderstandings between pilots and controllers could result in fatal aviation accidents. Fortunately, advanced automatic speech recognition technology has emerged as a promising means of preventing miscommunications and enhancing aviation safety. However, most existing speech recognition methods merely incorporate external language models on the decoder side, leading to insufficient semantic alignment between the speech and text modalities during the encoding phase. Furthermore, because speech sequences are longer than text, it is challenging to model acoustic context dependencies over long distances, especially for the extended ATCC data. To address these issues, we propose a speech-text multimodal dual-tower architecture for speech recognition. It employs cross-modal interactions to achieve close semantic alignment during the encoding stage and to strengthen its ability to model long-distance acoustic context dependencies. In addition, a two-stage training strategy is devised to derive semantics-aware acoustic representations effectively. The first stage pre-trains the speech-text multimodal encoding module to enhance inter-modal semantic alignment and long-distance acoustic context modeling. The second stage fine-tunes the entire network to bridge the gap in input modality between the training and inference phases and to boost generalization performance. Extensive experiments demonstrate the effectiveness of the proposed speech-text multimodal speech recognition method on the ATCC and AISHELL-1 datasets. It reduces the character error rate to 6.54% and 8.73%, respectively, and exhibits substantial performance gains of 28.76% and 23.82% compared with the best baseline model. The case studies indicate that the obtained semantics-aware acoustic representations aid in accurately recognizing terms with similar pronunciations but distinct semantics. The research provides a novel modeling paradigm for semantics-aware speech recognition in air traffic control communications, which could contribute to the advancement of intelligent and efficient aviation safety management.
Keywords: speech-text multimodal; automatic speech recognition; semantic alignment; air traffic control communications; dual-tower architecture
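The abstract above reports results as a character error rate (CER) and as gains over a baseline. As a minimal sketch of how such figures are commonly computed, the following uses the standard Levenshtein edit distance; it is not taken from the paper. The function names and the sample inputs are hypothetical, and reading the reported gains as relative CER reductions is an assumption.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two character sequences."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference characters
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference transcript must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def relative_reduction(baseline: float, new: float) -> float:
    """Relative error reduction of `new` with respect to `baseline` (assumed metric)."""
    return (baseline - new) / baseline
```

For example, `cer("abcd", "abed")` counts one substitution over four reference characters, and `relative_reduction(10.0, 7.5)` expresses a drop from 10.0 to 7.5 as a fraction of the baseline. These inputs are illustrative and are not values from the paper.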
A GAN based method for multiple prohibited items synthesis of X-ray security image (cited by: 1)
2
Authors: 李大双, 胡小兵, 张海刚, 杨金锋. Optoelectronics Letters (EI), 2021, Issue 2, pp. 112-117 (6 pages)
Detecting prohibited items with convolutional neural networks (CNNs) is of great significance for ensuring public safety. However, because the natural occurrence of such prohibited items is a small-probability event, collecting enough data to support CNN training is a big challenge. In this paper, we propose a new method, based on Generative Adversarial Networks (GANs), for synthesizing X-ray security images with multiple prohibited items from semantic label images. In theory, it can be used to synthesize as many X-ray images as needed. A new generator architecture with Res2Net is presented, which is more effective in learning multi-scale features of images of different prohibited items. The method is extended by establishing a semantic label library containing 14,000 images, from which we synthesize a total of 14,000 X-ray security images. The experimental results show strong performance, with a Fréchet Inception Distance (FID) score of 30.55. We also achieve a mean average precision (mAP) of 0.825 with the Single Shot MultiBox Detector (SSD) for object detection, demonstrating the effectiveness of our approach.
Keywords: image; method; synthesis
A prohibited items identification approach based on semantic segmentation
3
Authors: 姚少卿, 苏志刚, 杨金锋, 张海刚. Optoelectronics Letters (EI), 2021, Issue 4, pp. 247-251 (5 pages)
Deep learning (DL) based semantic segmentation methods can extract object information including category, location and shape. In this paper, the identification of prohibited items is treated as a semantic segmentation task, and a universal model for the automatic identification of prohibited items is proposed. The model makes two improvements to a general semantic segmentation network. First, an N-type encoding structure is applied to enlarge the receptive field of the network, with the aim of reducing misclassification. Second, considering the lack of surface texture in X-ray security images and inspired by feature reuse in DenseNet, shallow semantic information is reused to improve segmentation accuracy. With input images of size 512 × 512, the model achieves a mean intersection over union (mIoU) of 0.783 on a seven-class object recognition problem.
Keywords: semantic; union; segmentation
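The abstract above reports a mean intersection over union (mIoU) score. As a minimal sketch of how that metric is usually defined for label maps, the following computes per-class IoU and averages it. It is not the authors' evaluation code. The function name, the nested-list label format, and the choice to skip classes absent from both maps are assumptions.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over classes for two equally sized 2-D label maps (lists of rows)."""
    inter = [0] * num_classes  # pixels where prediction and target agree on class c
    union = [0] * num_classes  # pixels where either map is labelled class c
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p == t:
                inter[p] += 1
                union[p] += 1
            else:
                union[p] += 1
                union[t] += 1
    # Assumption: classes that appear in neither map are left out of the mean.
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    if not ious:
        raise ValueError("no labelled pixels found")
    return sum(ious) / len(ious)


# Hypothetical 2x2 example with two classes.
pred = [[0, 1],
        [1, 1]]
target = [[0, 1],
          [0, 1]]
score = mean_iou(pred, target, num_classes=2)
```

In this toy case class 0 has IoU 1/2 and class 1 has IoU 2/3, so `score` is their average, 7/12.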