In air traffic control communications (ATCC), misunderstandings between pilots and controllers could result in fatal aviation accidents. Fortunately, advanced automatic speech recognition technology has emerged as a p...In air traffic control communications (ATCC), misunderstandings between pilots and controllers could result in fatal aviation accidents. Fortunately, advanced automatic speech recognition technology has emerged as a promising means of preventing miscommunications and enhancing aviation safety. However, most existing speech recognition methods merely incorporate external language models on the decoder side, leading to insufficient semantic alignment between speech and text modalities during the encoding phase. Furthermore, it is challenging to model acoustic context dependencies over long distances due to the longer speech sequences than text, especially for the extended ATCC data. To address these issues, we propose a speech-text multimodal dual-tower architecture for speech recognition. It employs cross-modal interactions to achieve close semantic alignment during the encoding stage and strengthen its capabilities in modeling auditory long-distance context dependencies. In addition, a two-stage training strategy is elaborately devised to derive semantics-aware acoustic representations effectively. The first stage focuses on pre-training the speech-text multimodal encoding module to enhance inter-modal semantic alignment and aural long-distance context dependencies. The second stage fine-tunes the entire network to bridge the input modality variation gap between the training and inference phases and boost generalization performance. Extensive experiments demonstrate the effectiveness of the proposed speech-text multimodal speech recognition method on the ATCC and AISHELL-1 datasets. It reduces the character error rate to 6.54% and 8.73%, respectively, and exhibits substantial performance gains of 28.76% and 23.82% compared with the best baseline model. The case studies indicate that the obtained semantics-aware acoustic representations aid in accurately recognizing terms with similar pronunciations but distinctive semantics. The research provides a novel modeling paradigm for semantics-aware speech recognition in air traffic control communications, which could contribute to the advancement of intelligent and efficient aviation safety management.展开更多
Visible light communication between light emitting diode(LED) traffic lights and vehicles with a receiving photodiode front-end is developed for intelligent transportation systems. In this letter, the communication da...Visible light communication between light emitting diode(LED) traffic lights and vehicles with a receiving photodiode front-end is developed for intelligent transportation systems. In this letter, the communication data rates for different ranges are evaluated. The data rates are based on real scenarios of the background noise and path losses and are experimentally obtained with a testing system built upon commercial offthe-shelf components. Comparisons of range-rate performance for different average noise levels are also conducted with the use of red/yellow/green LED lights. Results show that achieving the data rates of kilobits per second at a communication range of hundred meters is possible under the ordinary noise scenario, a finding that is highly significant for practical applications.展开更多
Tor is pervasively used to conceal target websites that users are visiting. A de-anonymization technique against Tor, referred to as website fingerprinting attack, aims to infer the websites accessed by Tor clients by...Tor is pervasively used to conceal target websites that users are visiting. A de-anonymization technique against Tor, referred to as website fingerprinting attack, aims to infer the websites accessed by Tor clients by passively analyzing the patterns of encrypted traffic at the Tor client side. However, HTTP pipeline and Tor circuit multiplexing techniques can affect the accuracy of the attack by mixing the traffic that carries web objects in a single TCP connection. In this paper, we propose a novel active website fingerprinting attack by identifying and delaying the HTTP requests at the first hop Tor node. Then, we can separate the traffic that carries distinct web objects to derive a more distinguishable traffic pattern. To fulfill this goal, two algorithms based on statistical analysis and objective function optimization are proposed to construct a general packet delay scheme. We evaluate our active attack against Tor in empirical experiments and obtain the highest accuracy of 98.64%, compared with 85.95% of passive attack. We also perform experiments in the open-world scenario. When the parameter k of k-NN classifier is set to 5, then we can obtain a true positive rate of 90.96% with a false positive rate of 3.9%.展开更多
基金This research was funded by Shenzhen Science and Technology Program(Grant No.RCBS20221008093121051)the General Higher Education Project of Guangdong Provincial Education Department(Grant No.2020ZDZX3085)+1 种基金China Postdoctoral Science Foundation(Grant No.2021M703371)the Post-Doctoral Foundation Project of Shenzhen Polytechnic(Grant No.6021330002K).
文摘In air traffic control communications (ATCC), misunderstandings between pilots and controllers could result in fatal aviation accidents. Fortunately, advanced automatic speech recognition technology has emerged as a promising means of preventing miscommunications and enhancing aviation safety. However, most existing speech recognition methods merely incorporate external language models on the decoder side, leading to insufficient semantic alignment between speech and text modalities during the encoding phase. Furthermore, it is challenging to model acoustic context dependencies over long distances due to the longer speech sequences than text, especially for the extended ATCC data. To address these issues, we propose a speech-text multimodal dual-tower architecture for speech recognition. It employs cross-modal interactions to achieve close semantic alignment during the encoding stage and strengthen its capabilities in modeling auditory long-distance context dependencies. In addition, a two-stage training strategy is elaborately devised to derive semantics-aware acoustic representations effectively. The first stage focuses on pre-training the speech-text multimodal encoding module to enhance inter-modal semantic alignment and aural long-distance context dependencies. The second stage fine-tunes the entire network to bridge the input modality variation gap between the training and inference phases and boost generalization performance. Extensive experiments demonstrate the effectiveness of the proposed speech-text multimodal speech recognition method on the ATCC and AISHELL-1 datasets. It reduces the character error rate to 6.54% and 8.73%, respectively, and exhibits substantial performance gains of 28.76% and 23.82% compared with the best baseline model. The case studies indicate that the obtained semantics-aware acoustic representations aid in accurately recognizing terms with similar pronunciations but distinctive semantics. The research provides a novel modeling paradigm for semantics-aware speech recognition in air traffic control communications, which could contribute to the advancement of intelligent and efficient aviation safety management.
基金a part of the programs,"Technology and Applications of Visible Light Communication with LEDs" and "Application Study for LED Informationization Technology,"which were both supported by the Scientific Development and Innovation Collection Fund of Shenzhen
文摘Visible light communication between light emitting diode(LED) traffic lights and vehicles with a receiving photodiode front-end is developed for intelligent transportation systems. In this letter, the communication data rates for different ranges are evaluated. The data rates are based on real scenarios of the background noise and path losses and are experimentally obtained with a testing system built upon commercial offthe-shelf components. Comparisons of range-rate performance for different average noise levels are also conducted with the use of red/yellow/green LED lights. Results show that achieving the data rates of kilobits per second at a communication range of hundred meters is possible under the ordinary noise scenario, a finding that is highly significant for practical applications.
基金partially supported by the National Key R&D Program of China(No.2017YFB1003000)the National Natural Science Foundation of China(Nos.61572130,61320106007,61632008,61502100,61532013,and 61402104)+3 种基金the Jiangsu Provincial Natural Science Foundation(No.BK20150637)the Jiangsu Provincial Key Technology R&D Program(No.BE2014603)the Qing Lan Project of Jiangsu Province,Jiangsu Provincial Key Laboratory of Network and Information Security(No.BM2003201)the Key Laboratory of Computer Network and Information Integration of the Ministry of Education of China(No.93K-9)
文摘Tor is pervasively used to conceal target websites that users are visiting. A de-anonymization technique against Tor, referred to as website fingerprinting attack, aims to infer the websites accessed by Tor clients by passively analyzing the patterns of encrypted traffic at the Tor client side. However, HTTP pipeline and Tor circuit multiplexing techniques can affect the accuracy of the attack by mixing the traffic that carries web objects in a single TCP connection. In this paper, we propose a novel active website fingerprinting attack by identifying and delaying the HTTP requests at the first hop Tor node. Then, we can separate the traffic that carries distinct web objects to derive a more distinguishable traffic pattern. To fulfill this goal, two algorithms based on statistical analysis and objective function optimization are proposed to construct a general packet delay scheme. We evaluate our active attack against Tor in empirical experiments and obtain the highest accuracy of 98.64%, compared with 85.95% of passive attack. We also perform experiments in the open-world scenario. When the parameter k of k-NN classifier is set to 5, then we can obtain a true positive rate of 90.96% with a false positive rate of 3.9%.