Exploiting the source-to-relay channel phase information at the relays can increase the rate upper bound of distributed orthogonal space-time block codes (STBC) from 2/K to 1/2, where K is the number of relays. This technique is known as distributed orthogonal space-time block codes with channel phase information (DOSTBC-CPI). However, the decoding delay of existing DOSTBC-CPIs is not optimal. Therefore, based on the rate-1/2 balanced complex orthogonal design (COD), an algorithm is provided to construct a maximal-rate DOSTBC-CPI with only half the decoding delay of existing DOSTBC-CPIs. Simulation results show that the proposed method exhibits a lower symbol error rate than the existing DOSTBC-CPIs.
This paper utilizes uniquely decodable codes (UDCs) in an M-to-1 free-space optical (FSO) system. Benefiting from the nonorthogonal nature of UDCs, the sum throughput is improved. We first prove that the uniquely decodable property still holds, even in optical fading channels. It is further discovered that the receiver can extract each source's data from superimposed symbols with only one processing unit. According to theoretical analysis and simulation results, the throughput gain is up to the normalized UDC sum rate in high signal-to-noise ratio cases. An equivalent desktop experiment is also implemented to show the feasibility of the UDC-FSO structure.
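As a rough illustration of what "uniquely decodable" means for superimposed symbols, the following minimal sketch checks that every combination of user codewords produces a distinct element-wise sum. The two-user code pair and the function name are illustrative assumptions, not the codes from the paper, and the check ignores noise and fading.

```python
from itertools import product

def is_uniquely_decodable(codebooks):
    """Return True if every combination of codewords (one per user)
    yields a distinct element-wise sum at the receiver."""
    seen = set()
    for combo in product(*codebooks):
        total = tuple(sum(symbols) for symbols in zip(*combo))
        if total in seen:
            return False  # two different transmissions look identical
        seen.add(total)
    return True

# Hypothetical two-user code pair for a noiseless adder channel.
user_a = [(0, 0), (1, 1)]
user_b = [(0, 0), (0, 1), (1, 0)]

print(is_uniquely_decodable([user_a, user_b]))  # all six sums differ
print(is_uniquely_decodable([user_a, user_a]))  # (0,0)+(1,1) collides with (1,1)+(0,0)
```

Because the sums are distinct, a single receiver can map each superimposed symbol back to one codeword per user, which is the property the abstract relies on.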
Constituted by BCH component codes and their ordered statistics decoding (OSD), the successive cancellation list (SCL) decoding of U-UV structural codes can provide competent error-correction performance in the short-to-medium length regime. However, the list decoding complexity becomes formidable as the decoding output list size increases, which is primarily incurred by the OSD. Addressing this challenge, this paper proposes low-complexity SCL decoding by reducing the complexity of component code decoding and pruning the redundant SCL decoding paths. For the former, an efficient skipping rule is introduced for the OSD so that higher-order decoding can be skipped when it cannot provide a more likely codeword candidate. The rule is further extended to an OSD variant, the box-and-match algorithm (BMA), to facilitate the component code decoding. Moreover, by estimating the correlation distance lower bounds (CDLBs) of the component code decoding outputs, a path pruning (PP)-SCL decoding is proposed to further facilitate the decoding of U-UV codes. In particular, its integration with the improved OSD and BMA is discussed. Simulation results show that significant complexity reduction can be achieved. Consequently, the U-UV codes can outperform the cyclic redundancy check (CRC)-polar codes with a similar decoding complexity.
This paper presents a software turbo decoder on graphics processing units (GPU). Unlike previous works, the proposed decoding architecture for turbo codes mainly focuses on the Consultative Committee for Space Data Systems (CCSDS) standard. However, the information frame lengths of the CCSDS turbo codes are not suitable for flexible sub-frame parallelism design. To mitigate this issue, we propose a padding method that inserts several bits before the information frame header. To obtain low-latency performance and high resource utilization, two-level intra-frame parallelisms and an efficient data structure are considered. The presented Max-Log-MAP decoder can be adopted to decode the Long Term Evolution (LTE) turbo codes with only small modifications. The proposed CCSDS turbo decoder at 10 iterations on an NVIDIA RTX 3070 achieves throughputs of about 150 Mbps and 50 Mbps for the code rates 1/6 and 1/2, respectively.
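For readers unfamiliar with the Max-Log-MAP simplification mentioned above, the sketch below contrasts the exact Jacobian logarithm used in Log-MAP with the max-only approximation. It is a generic textbook illustration with hypothetical function names, not the GPU kernel from the paper.

```python
import math

def max_star(a, b):
    """Exact Jacobian logarithm: log(exp(a) + exp(b))."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))

def max_log(a, b):
    """Max-Log-MAP approximation: keep the maximum, drop the correction term."""
    return max(a, b)

# The correction term is largest when the two metrics are equal (log 2)
# and vanishes as they move apart.
for a, b in [(1.0, 1.0), (2.0, -1.0), (10.0, -10.0)]:
    print(a, b, max_star(a, b) - max_log(a, b))
```

Dropping the correction term removes the exponential and logarithm from the inner loop, which is one reason the approximation suits massively parallel hardware.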
This letter proposes a sliced-gated-convolutional neural network with belief propagation (SGCNN-BP) architecture for decoding long codes under correlated noise. The basic idea of SGCNN-BP is to use neural networks (NN) to transform the correlated noise into white noise, setting up the optimal condition for a standard BP decoder that takes the output from the NN. A gate-controlled neuron is used to regulate information flow, and an optional slicing operation is adopted to reduce parameters and lower training complexity. Simulation results show that SGCNN-BP performs much better than a single BP decoder (with the largest gap being a 5 dB improvement) and achieves a nearly 1 dB improvement compared to fully convolutional networks (FCN).
In this paper, we associate the mutual information with the frame error rate (FER) performance and propose novel quantized decoders for polar codes. Based on the optimal quantizer of binary-input discrete memoryless channels (BDMCs), the proposed decoders quantize the virtual subchannels of polar codes to maximize mutual information (MMI) between source bits and quantized symbols. The nested structure of polar codes ensures that the MMI quantization can be implemented stage by stage. Simulation results show that the proposed MMI decoders with 4 quantization bits outperform the existing nonuniform quantized decoders that minimize mean-squared error (MMSE) with 4 quantization bits, and yield even better performance than uniform MMI quantized decoders with 5 quantization bits. Furthermore, the proposed 5-bit quantized MMI decoders approach the floating-point decoders with negligible performance loss.
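To make the quantization objective concrete, the sketch below computes the mutual information of a small binary-input channel with uniform inputs and shows how two different ways of merging output symbols preserve different amounts of it. The toy transition probabilities and helper names are assumptions for illustration; the paper's quantizer design for polar subchannels is not reproduced here.

```python
import math

def mutual_information(channel):
    """I(X;Y) in bits for a binary-input channel with uniform input.
    channel[x][y] holds P(y | x)."""
    mi = 0.0
    for y in range(len(channel[0])):
        p_y = 0.5 * (channel[0][y] + channel[1][y])
        for x in (0, 1):
            p = channel[x][y]
            if p > 0:
                mi += 0.5 * p * math.log2(p / p_y)
    return mi

def merge_outputs(channel, groups):
    """Quantize the output alphabet by merging the symbols in each group."""
    return [[sum(row[y] for y in group) for group in groups] for row in channel]

# Made-up 4-output channel, symmetric between the two inputs.
channel = [
    [0.60, 0.25, 0.10, 0.05],
    [0.05, 0.10, 0.25, 0.60],
]
adjacent = merge_outputs(channel, [[0, 1], [2, 3]])  # merges similar outputs
mixed = merge_outputs(channel, [[0, 3], [1, 2]])     # merges opposite outputs

print(mutual_information(channel))
print(mutual_information(adjacent))
print(mutual_information(mixed))
```

Merging can never increase mutual information, so a quantizer in this spirit searches for the grouping that loses the least; here the "mixed" grouping makes both inputs look identical and loses everything.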
Normally, in the downlink Network-Coded Multiple Access (NCMA) system, the same power is allocated to different users. However, equal power allocation is unsuitable for some scenarios, such as when user devices have different Quality of Service (QoS) requirements. Hence, we study the power allocation in the downlink NCMA system in this paper, and propose a downlink Network-Coded Multiple Access with Diverse Power (NCMA-DP), wherein different amounts of power are allocated to different users. In terms of the Bit Error Rate (BER) of the multi-user decoder and the number of packets required to correctly decode the message, the performance of the user with more allocated power is greatly improved compared to the Conventional NCMA (NCMA-C). Meanwhile, the performance of the user with less allocated power is still much better than NCMA-C. Furthermore, the overall throughput of NCMA-DP is greatly improved compared to that of NCMA-C. The simulation results demonstrate the remarkable performance of the proposed NCMA-DP.
The process of generating descriptive captions for images has witnessed significant advancements in recent years, owing to the progress in deep learning techniques. Despite these advancements, the task of thoroughly grasping image content and producing coherent, contextually relevant captions continues to pose a substantial challenge. In this paper, we introduce a novel multimodal method for image captioning by integrating three powerful deep learning architectures: YOLOv8 (You Only Look Once) for robust object detection, EfficientNetB7 for efficient feature extraction, and Transformers for effective sequence modeling. Our proposed model combines the strengths of YOLOv8 in detecting objects, the superior feature representation capabilities of EfficientNetB7, and the contextual understanding and sequential generation abilities of Transformers. We conduct extensive experiments on standard benchmark datasets to evaluate the effectiveness of our approach, demonstrating its ability to generate informative and semantically rich captions for diverse images. The experimental results showcase the synergistic benefits of integrating YOLOv8, EfficientNetB7, and Transformers in advancing the state of the art in image captioning tasks. The proposed multimodal approach has yielded impressive outcomes, generating informative and semantically rich captions for a diverse range of images. By combining the strengths of YOLOv8, EfficientNetB7, and Transformers, the model has achieved state-of-the-art results in image captioning tasks. The significance of this approach lies in its ability to address the challenging task of generating coherent and contextually relevant captions while achieving a comprehensive understanding of image content. The integration of three powerful deep learning architectures demonstrates the synergistic benefits of multimodal fusion in advancing the state of the art in image captioning. Furthermore, this approach has a profound impact on the field, opening up new avenues for research in multimodal deep learning and paving the way for more sophisticated and context-aware image captioning systems. These systems have the potential to make significant contributions to various fields, encompassing human-computer interaction, computer vision, and natural language processing.
In video captioning methods based on an encoder-decoder, limited visual features are extracted by an encoder, and a natural sentence describing the video content is generated using a decoder. However, this kind of method depends on a single video input source and few visual labels, and it suffers from poor semantic alignment between video contents and generated natural sentences, which makes it unsuitable for accurately comprehending and describing the video contents. To address this issue, this paper proposes a video captioning method based on semantic topic-guided generation. First, a 3D convolutional neural network is utilized to extract the spatiotemporal features of videos during the encoding. Then, the semantic topics of video data are extracted using the visual labels retrieved from similar video data. In the decoding, a decoder is constructed by combining a novel Enhance-TopK sampling algorithm with a Generative Pre-trained Transformer-2 deep neural network, which decreases the influence of "deviation" in the semantic mapping process between videos and texts by jointly decoding a baseline and semantic topics of video contents. During this process, the designed Enhance-TopK sampling algorithm can alleviate a long-tail problem by dynamically adjusting the probability distribution of the predicted words. Finally, experiments are conducted on two public datasets, Microsoft Research Video Description and Microsoft Research-Video to Text. The experimental results demonstrate that the proposed method outperforms several state-of-the-art approaches. Specifically, the performance indicators Bilingual Evaluation Understudy, Metric for Evaluation of Translation with Explicit Ordering, Recall Oriented Understudy for Gisting Evaluation-longest common subsequence, and Consensus-based Image Description Evaluation of the proposed method are improved by 1.2%, 0.1%, 0.3%, and 2.4% on the Microsoft Research Video Description dataset, and by 0.1%, 1.0%, 0.1%, and 2.8% on the Microsoft Research-Video to Text dataset, respectively, compared with the existing video captioning methods. As a result, the proposed method can generate video captions that are more closely aligned with human natural language expression habits.
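The abstract does not spell out how Enhance-TopK adjusts the word distribution, so the sketch below shows only the plain top-k sampling baseline that such a method would build on. The function name, the toy logits, and the use of a softmax over the retained scores are assumptions for illustration.

```python
import math
import random

def top_k_sample(logits, k, rng=random):
    """Sample a token index from the k highest-scoring logits (plain top-k)."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    peak = max(logits[i] for i in top)
    # Softmax weights over the retained candidates only (shifted for stability).
    weights = [math.exp(logits[i] - peak) for i in top]
    return rng.choices(top, weights=weights, k=1)[0]

rng = random.Random(0)  # fixed seed so repeated runs behave the same
toy_logits = [2.0, 1.0, 0.5, -3.0]
print([top_k_sample(toy_logits, 2, rng) for _ in range(10)])
```

With k = 2 only the two best-scoring words can ever be emitted, which is the truncation that a long-tail-aware variant would then reweight.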
Belief propagation list (BPL) decoding for polar codes has attracted increasing attention due to its inherently parallel nature. However, a large performance gap still exists relative to CRC-aided SCL (CA-SCL) decoding. In this work, an improved segmented belief propagation list decoding based on bit flipping (SBPL-BF) is proposed. On the one hand, the proposed algorithm makes use of the cooperative characteristic of BPL decoding, in which the codeword is decoded in different BP decoders. Based on this characteristic, the unreliable bits for flipping can be split into multiple subblocks and flipped in different decoders simultaneously. On the other hand, a more flexible and effective processing strategy for the a priori information of the unfrozen bits that do not need to be flipped is designed to improve the decoding convergence. In addition, this is the first proposal in BPL decoding that jointly optimizes the bit flipping of the information bits and the code bits. In particular, for bit flipping of the code bits, an H-matrix-aided bit-flipping algorithm is designed to enhance the accuracy in identifying erroneous code bits. The simulation results show that the proposed algorithm significantly improves the error-correction performance of BPL decoding for medium and long codes. It is more than 0.25 dB better than the state-of-the-art BPL decoding at a block error rate (BLER) of 10^(-5), and outperforms CA-SCL decoding in the low signal-to-noise ratio (SNR) region for (1024, 0.5) polar codes.
Increasing research has focused on semantic communication, the goal of which is to accurately convey the meaning instead of transmitting symbols from the sender to the receiver. In this paper, we design a novel encoding and decoding semantic communication framework, which adopts the semantic information and the contextual correlations between items to optimize the performance of a communication system over various channels. On the sender side, the average semantic loss caused by wrong detection is defined, and a semantic source encoding strategy is developed to minimize the average semantic loss. To further improve communication reliability, a decoding strategy that utilizes the semantic and the context information to recover messages is proposed at the receiver. Extensive simulation results validate the superior performance of our strategies over state-of-the-art semantic coding and decoding policies on different communication channels.
The demand for high-data-rate underwater acoustic communications (UACs) in marine development is increasing; however, severe multipaths make demodulation a challenge. The decision feedback equalizer (DFE) is one of the most popular equalizers in UAC; however, it is not the optimal algorithm. Although maximum likelihood sequence estimation (MLSE) is the optimal algorithm, its complexity increases exponentially with the number of channel taps, making it challenging to apply to UAC. Therefore, this paper proposes a complexity-reduced MLSE to improve the bit error rate (BER) performance in multipath channels. In the proposed algorithm, the original channel is first shortened using a channel-shortening method, and several dominant channel taps are selected for MLSE. Subsequently, sphere decoding (SD) is performed in the following MLSE. Iterations are applied to eliminate inter-symbol interference caused by weak channel taps. The simulation and sea experiment demonstrate the superiority of the proposed algorithm. The simulation results show that channel shortening combined with SD can drastically reduce computational complexity, and iterative SD performs better than DFE based on recursive least squares (RLS-DFE), DFE based on improved proportionate normalized least mean squares (IPNLMS-DFE), and channel estimation-based DFE (CE-DFE). Moreover, the sea experimental results at Zhairuoshan Island in Zhoushan show that the proposed receiver scheme has improved BER performance over RLS-DFE, IPNLMS-DFE, and CE-DFE. Compared with the RLS-DFE, the BER, after five iterations, is reduced from 0.0076 to 0.0037 in the 8–12 kHz band and from 0.1516 to 0.1145 in the 13–17 kHz band at a distance of 2000 m. Thus, the proposed algorithm makes it possible to apply MLSE in UAC in practical scenarios.
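The exponential growth mentioned above can be made concrete with the standard trellis count for Viterbi-based MLSE: a channel with L taps and an M-point constellation has M^(L-1) states and M^L branches per symbol. The sketch below evaluates that count; the function name and the example tap counts are illustrative assumptions, not values from the paper.

```python
def mlse_trellis_size(constellation_size, num_taps):
    """Return (states, branches per symbol) of a full Viterbi MLSE trellis."""
    states = constellation_size ** (num_taps - 1)
    branches = states * constellation_size
    return states, branches

# Hypothetical comparison: a long multipath channel versus a shortened one.
print(mlse_trellis_size(4, 20))  # QPSK, 20 taps
print(mlse_trellis_size(4, 4))   # QPSK, 4 dominant taps after shortening
```

Keeping only a few dominant taps shrinks the trellis by many orders of magnitude, which is why channel shortening is a natural first step before any sequence search.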
The "Decoding Zhonghua" International Conference on Dialogue among Civilisations, hosted by the China International Public Relations Association, China Ethnic News, and the Academy of Contemporary China and World Studies, was held in Beijing on January 17th under the theme "Pursuing Harmonious Coexistence of Civilisations through Dialogue".
In this paper, both the high-complexity near-ML list decoding and the low-complexity belief propagation decoding are tested for some well-known regular and irregular LDPC codes. The complexity and performance trade-off is shown clearly and demonstrated with the paradigm of hybrid decoding. For regular LDPC codes, the SNR-threshold performance and error-floor performance can be improved to the optimal level of ML decoding if the decoding complexity is progressively increased, usually corresponding to near-ML decoding with a progressively increased list size. For irregular LDPC codes, the SNR-threshold performance and error-floor performance can only be improved up to a bottleneck, even with unlimited decoding complexity. However, with the technique of CRC-aided hybrid decoding, the ML performance can be greatly improved and approached with reasonable complexity, thanks to the improved code-weight distribution from the concatenation of CRC and irregular LDPC code. Finally, the CRC-aided 5G NR LDPC code is evaluated and its capacity-approaching capability is shown.
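The selection step behind CRC-aided list decoding can be sketched as follows: given candidates ordered from most to least likely, return the first one whose checksum verifies. The sketch uses CRC-32 from Python's `zlib` as a stand-in for whatever CRC the paper concatenates with the LDPC code, and all function names are hypothetical.

```python
import zlib

def attach_crc(payload: bytes) -> bytes:
    """Append a 4-byte CRC-32 to the payload (stand-in for the outer CRC code)."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")

def crc_ok(frame: bytes) -> bool:
    """Check whether the trailing 4 bytes match the CRC-32 of the rest."""
    payload, tail = frame[:-4], frame[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") == tail

def pick_candidate(candidates):
    """Return the first CRC-consistent candidate; fall back to the most likely one."""
    for frame in candidates:
        if crc_ok(frame):
            return frame
    return candidates[0]

sent = attach_crc(b"hello")
corrupted = bytes([sent[0] ^ 0x01]) + sent[1:]  # single bit error
print(pick_candidate([corrupted, sent]) == sent)
```

The CRC acts as a filter on the list, so a correct codeword that is not ranked first by the inner decoder can still be recovered.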
A global optimization algorithm (GOA) for the parallel Chien search circuit in a Reed-Solomon (RS) (255,239) decoder is presented. By finding the common modulo-2 additions within groups of Galois field (GF) multipliers and pre-computing the common items, the GOA can efficiently reduce the number of XOR gates and thus reduce the circuit area. Unlike other, local optimization algorithms, the GOA is a global one. When there is more than one maximum match at a time, the GOA chooses the pair with the smallest relational value instead of choosing a pair randomly, so that the choice has the least impact on the final result. The results show that the area of parallel Chien search circuits can be reduced by 51% compared to the direct implementation when the group-based GOA is used for GF multipliers, and by 26% if the GOA is applied to GF multipliers separately. This optimization scheme can be widely used in general parallel architectures in which many GF multipliers are involved.
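For context, a Chien search simply evaluates the error-locator polynomial at every nonzero element of GF(2^8) and records the roots. The serial software sketch below shows that operation; it assumes the primitive polynomial 0x11D, which the abstract does not specify, and it does not model the paper's XOR-sharing hardware optimization.

```python
PRIM_POLY = 0x11D  # assumed primitive polynomial x^8 + x^4 + x^3 + x^2 + 1

# Exponent/log tables for GF(2^8); EXP is doubled to avoid a modulo in gf_mul.
EXP = [0] * 510
LOG = [0] * 256
value = 1
for i in range(255):
    EXP[i] = value
    LOG[value] = i
    value <<= 1
    if value & 0x100:
        value ^= PRIM_POLY
for i in range(255, 510):
    EXP[i] = EXP[i - 255]

def gf_mul(a, b):
    """Multiply two field elements using the log tables."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]

def poly_mul(p, q):
    """Multiply polynomials over GF(2^8); index 0 is the constant term."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] ^= gf_mul(a, b)
    return out

def poly_eval(coeffs, x):
    """Horner evaluation; addition in GF(2^8) is XOR."""
    result = 0
    for c in reversed(coeffs):
        result = gf_mul(result, x) ^ c
    return result

def chien_search(locator):
    """Return every exponent i with locator(alpha^i) == 0."""
    return [i for i in range(255) if poly_eval(locator, EXP[i]) == 0]

# Locator (1 + alpha^3 x)(1 + alpha^10 x) has roots alpha^-3 and alpha^-10.
locator = poly_mul([1, EXP[3]], [1, EXP[10]])
print(chien_search(locator))  # [245, 252]
```

A parallel hardware version evaluates many of these field multiplications at once, which is where sharing common XOR terms across multipliers pays off.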
To improve the performance of the short-interleaver serial concatenated convolutional code (SCCC) with few decoding iterations, the structure of the Log-MAP algorithm is introduced into the conventional SOVA decoder to improve its performance at short interleaving delay. The combination of Log-MAP and SOVA avoids updating the matrices of the maximum path and also helps meet the short-delay requirement. The simulation results of several SCCCs show that the improved decoder can obtain satisfactory performance with a short frame interleaver and that it is suitable for high-bit-rate, low-delay communication systems.
The first domestic total-dose-hardened 2 μm partially depleted silicon-on-insulator (PDSOI) CMOS 3-line to 8-line decoder fabricated in SIMOX is demonstrated. The radiation performance is characterized by transistor threshold voltage shifts, circuit static leakage currents, and I-V curves as a function of total dose up to 3 × 10^5 rad(Si). The worst-case threshold voltage shifts of the front channels are less than 20 mV for the nMOS transistors at 3 × 10^5 rad(Si) and follow-up irradiation, and less than 70 mV for the pMOS transistors. Furthermore, no significant radiation-induced leakage currents or functional degradation are observed.
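The logic function of a 3-line to 8-line decoder is small enough to state directly: three address bits select exactly one of eight outputs. The behavioural sketch below assumes active-high outputs and a single enable input, neither of which is stated in the abstract, and it says nothing about the circuit's radiation behaviour.

```python
def decode_3_to_8(a2, a1, a0, enable=True):
    """Return the eight outputs (active-high, one-hot) for address bits a2 a1 a0."""
    index = (a2 << 2) | (a1 << 1) | a0
    return [int(enable and i == index) for i in range(8)]

print(decode_3_to_8(1, 0, 1))            # [0, 0, 0, 0, 0, 1, 0, 0]
print(decode_3_to_8(0, 0, 0, enable=False))  # all outputs inactive
```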
Funding (DOSTBC-CPI paper): Supported in part by the National Natural Science Foundation of China (Nos. 61271230, 61472190) and the National Mobile Communications Research Laboratory, Southeast University (No. 2013D02).
Funding (UDC-FSO paper): Supported in part by the National Natural Science Foundation of China (No. 62101527) and in part by the Funding Program of Innovation Labs by CIOMP.
Funding (U-UV codes paper): Supported by the National Natural Science Foundation of China (NSFC) with project ID 62071498 and the Guangdong National Science Foundation (GDNSF) with project ID 2024A1515010213.
Funding (GPU turbo decoder paper): Supported by the Fundamental Research Funds for the Central Universities (FRF-TP20-062A1) and the Guangdong Basic and Applied Basic Research Foundation (2021A1515110070).
Funding (SGCNN-BP letter): Supported by the Beijing Natural Science Foundation (L202003).
Funding (MMI quantized polar decoder paper): Financially supported in part by the National Key R&D Program of China (No. 2018YFB1801402) and in part by Huawei Technologies Co., Ltd.
Funding (image captioning paper): Funded by Researchers Supporting Project number (RSPD2024R698), King Saud University, Riyadh, Saudi Arabia.
Funding (video captioning paper): Supported in part by the National Natural Science Foundation of China under Grant 61873277, in part by the Natural Science Basic Research Plan in Shaanxi Province of China under Grant 2020JQ-758, and in part by the Chinese Postdoctoral Science Foundation under Grant 2020M673446.
Abstract: In video captioning methods based on an encoder-decoder, limited visual features are extracted by an encoder, and a natural sentence describing the video content is generated by a decoder. However, this kind of method depends on a single video input source and few visual labels, and it suffers from poor semantic alignment between the video content and the generated sentences, which makes it unsuitable for accurately comprehending and describing the video content. To address this issue, this paper proposes a video captioning method based on semantic topic-guided generation. First, a 3D convolutional neural network is utilized to extract the spatiotemporal features of videos during encoding. Then, the semantic topics of the video data are extracted using the visual labels retrieved from similar video data. In decoding, a decoder is constructed by combining a novel Enhance-TopK sampling algorithm with a Generative Pre-trained Transformer-2 deep neural network, which decreases the influence of "deviation" in the semantic mapping process between videos and texts by jointly decoding a baseline and the semantic topics of the video content. During this process, the designed Enhance-TopK sampling algorithm can alleviate the long-tail problem by dynamically adjusting the probability distribution of the predicted words. Finally, experiments are conducted on two public datasets, Microsoft Research Video Description and Microsoft Research-Video to Text. The experimental results demonstrate that the proposed method outperforms several state-of-the-art approaches. Specifically, the performance indicators Bilingual Evaluation Understudy, Metric for Evaluation of Translation with Explicit Ordering, Recall Oriented Understudy for Gisting Evaluation-longest common subsequence, and Consensus-based Image Description Evaluation of the proposed method are improved by 1.2%, 0.1%, 0.3%, and 2.4% on the Microsoft Research Video Description dataset, and by 0.1%, 1.0%, 0.1%, and 2.8% on the Microsoft Research-Video to Text dataset, respectively, compared with existing video captioning methods. As a result, the proposed method can generate video captions that are more closely aligned with human natural language expression habits.
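The abstract above does not specify how Enhance-TopK adjusts the predicted word distribution, so the following is only a minimal sketch of the standard top-k sampling step that such a method would presumably build on. The function name `top_k_sample` and the `temperature` parameter are illustrative assumptions, not part of the cited work.

```python
import math
import random


def top_k_sample(logits, k, temperature=1.0, rng=random):
    """Sample one token index from the k highest-scoring logits.

    Plain top-k sampling (an assumption for illustration); it is NOT the
    Enhance-TopK algorithm, whose details the abstract does not give.
    """
    if not 1 <= k <= len(logits):
        raise ValueError("k must be between 1 and len(logits)")
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Keep the indices of the k largest logits; everything else gets probability 0.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax over the kept logits, shifted by the maximum for numerical stability.
    scaled = [logits[i] / temperature for i in top]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    # random.choices normalizes the weights internally.
    return rng.choices(top, weights=weights, k=1)[0]
```

Truncating to the k most likely words removes the low-probability tail entirely, while a temperature above 1 flattens the distribution over the words that remain.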
Funding: Funded by the Key Project of NSFC-Guangdong Province Joint Program (Grant No. U2001204), the National Natural Science Foundation of China (Grant Nos. 61873290 and 61972431), the Science and Technology Program of Guangzhou, China (Grant No. 202002030470), and the Funding Project of Featured Major of Guangzhou Xinhua University (2021TZ002).
Abstract: Belief propagation list (BPL) decoding for polar codes has attracted increasing attention due to its inherently parallel nature. However, a large performance gap still exists relative to CRC-aided SCL (CA-SCL) decoding. In this work, an improved segmented belief propagation list decoding based on bit flipping (SBPL-BF) is proposed. On the one hand, the proposed algorithm makes use of the cooperative characteristic of BPL decoding, in which the codeword is decoded in different BP decoders. Based on this characteristic, the unreliable bits for flipping can be split into multiple subblocks and flipped in different decoders simultaneously. On the other hand, a more flexible and effective processing strategy for the a priori information of the unfrozen bits that do not need to be flipped is designed to improve decoding convergence. In addition, this is the first proposal in BPL decoding that jointly optimizes the bit flipping of the information bits and the code bits. In particular, for bit flipping of the code bits, an H-matrix-aided bit-flipping algorithm is designed to enhance the accuracy of identifying erroneous code bits. The simulation results show that the proposed algorithm significantly improves the error-correction performance of BPL decoding for medium and long codes. It is more than 0.25 dB better than the state-of-the-art BPL decoding at a block error rate (BLER) of 10^(-5), and it outperforms CA-SCL decoding in the low signal-to-noise ratio (SNR) region for (1024, 0.5) polar codes.
Funding: Supported in part by the National Natural Science Foundation of China under Grant Nos. 61931020, U19B2024, 62171449, and 62001483, and in part by the Science and Technology Innovation Program of Hunan Province under Grant No. 2021JJ40690.
Abstract: Increasing research has focused on semantic communication, the goal of which is to accurately convey meaning rather than merely transmit symbols from the sender to the receiver. In this paper, we design a novel encoding and decoding semantic communication framework, which uses semantic information and the contextual correlations between items to optimize the performance of a communication system over various channels. On the sender side, the average semantic loss caused by wrong detection is defined, and a semantic source encoding strategy is developed to minimize this loss. To further improve communication reliability, a decoding strategy that uses semantic and context information to recover messages is proposed at the receiver. Extensive simulation results validate the superior performance of our strategies over state-of-the-art semantic coding and decoding policies on different communication channels.
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 62101489, 62171405, and 62225114.
Abstract: The demand for high-data-rate underwater acoustic communications (UACs) in marine development is increasing; however, severe multipath propagation makes demodulation a challenge. The decision feedback equalizer (DFE) is one of the most popular equalizers in UAC, but it is not the optimal algorithm. Although maximum likelihood sequence estimation (MLSE) is the optimal algorithm, its complexity increases exponentially with the number of channel taps, making it challenging to apply to UAC. Therefore, this paper proposes a complexity-reduced MLSE to improve the bit error rate (BER) performance in multipath channels. In the proposed algorithm, the original channel is first shortened using a channel-shortening method, and several dominant channel taps are selected for MLSE. Subsequently, sphere decoding (SD) is performed in the following MLSE. Iterations are applied to eliminate inter-symbol interference caused by weak channel taps. The simulation and sea experiment demonstrate the superiority of the proposed algorithm. The simulation results show that channel shortening combined with SD can drastically reduce computational complexity, and that iterative SD performs better than DFE based on recursive least squares (RLS-DFE), DFE based on improved proportionate normalized least mean squares (IPNLMS-DFE), and channel estimation-based DFE (CE-DFE). Moreover, the sea experimental results at Zhairuoshan Island in Zhoushan show that the proposed receiver scheme has improved BER performance over RLS-DFE, IPNLMS-DFE, and CE-DFE. Compared with RLS-DFE, the BER after five iterations is reduced from 0.0076 to 0.0037 in the 8–12 kHz band and from 0.1516 to 0.1145 in the 13–17 kHz band at a distance of 2000 m. Thus, the proposed algorithm makes it possible to apply MLSE in UAC in practical scenarios.
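To make the complexity claim in the abstract above concrete, here is a minimal sketch of exhaustive MLSE for a known inter-symbol-interference channel. It is not the channel-shortening or sphere-decoding receiver of the cited paper; the BPSK alphabet, the function names, and the assumption of white Gaussian noise (under which ML detection reduces to a minimum Euclidean distance search) are all illustrative choices.

```python
from itertools import product


def channel_output(symbols, taps):
    """Noise-free ISI channel: y[n] = sum over k of taps[k] * symbols[n - k]."""
    return [
        sum(taps[k] * symbols[n - k] for k in range(len(taps)) if n - k >= 0)
        for n in range(len(symbols))
    ]


def mlse_brute_force(received, taps, alphabet=(-1.0, 1.0)):
    """Return the symbol sequence whose channel output is closest to `received`.

    Tries every one of len(alphabet) ** len(received) sequences, so it is
    only usable for very short blocks.
    """
    best_seq, best_cost = None, float("inf")
    for candidate in product(alphabet, repeat=len(received)):
        predicted = channel_output(candidate, taps)
        cost = sum((r - p) ** 2 for r, p in zip(received, predicted))
        if cost < best_cost:
            best_seq, best_cost = list(candidate), cost
    return best_seq


def viterbi_state_count(alphabet_size, num_taps):
    """Trellis states a Viterbi-based MLSE needs: M ** (L - 1)."""
    return alphabet_size ** (num_taps - 1)
```

For example, `viterbi_state_count(4, 5)` gives 256 states for a 4-ary alphabet over a 5-tap channel, and every additional tap multiplies that count by the alphabet size, which is why reducing the number of taps considered by the MLSE stage pays off so quickly.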
Abstract: The "Decoding Zhonghua" International Conference on Dialogue among Civilisations, hosted by the China International Public Relations Association, China Ethnic News, and the Academy of Contemporary China and World Studies, was held in Beijing on January 17th with the theme "Pursuing Harmonious Coexistence of Civilisations through Dialogue".
Abstract: In this paper, both high-complexity near-ML list decoding and low-complexity belief propagation decoding are tested for some well-known regular and irregular LDPC codes. The trade-off between complexity and performance is shown clearly and demonstrated with the paradigm of hybrid decoding. For regular LDPC codes, the SNR-threshold performance and error-floor performance can be improved to the optimal level of ML decoding if the decoding complexity is progressively increased, usually corresponding to near-ML decoding with a progressively increased list size. For irregular LDPC codes, the SNR-threshold performance and error-floor performance can only be improved up to a bottleneck, even with unlimited decoding complexity. However, with the technique of CRC-aided hybrid decoding, the ML performance itself is greatly improved, and can be approached with reasonable complexity, thanks to the improved code-weight distribution resulting from the concatenation of a CRC and an irregular LDPC code. Finally, a CRC-aided 5G NR LDPC code is evaluated and its capacity-approaching capability is shown.
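The CRC-aided step mentioned in the abstract above can be sketched as follows: among the candidates produced by a list decoder, keep the most likely one whose CRC checks. This is a generic illustration, not the hybrid decoder of the cited paper; the CRC-8 polynomial, the `(metric, bits)` candidate format, and all function names are assumptions made for the example.

```python
# x^8 + x^2 + x + 1, MSB first. An illustrative choice, not a 5G NR CRC polynomial.
CRC8_POLY = [1, 0, 0, 0, 0, 0, 1, 1, 1]


def crc_remainder(bits, poly=CRC8_POLY):
    """Remainder of bits(x) * x^r divided by poly(x) over GF(2), r = deg(poly)."""
    r = len(poly) - 1
    reg = list(bits) + [0] * r
    for i in range(len(bits)):
        if reg[i]:  # leading bit set: subtract (XOR) the aligned polynomial
            for j, p in enumerate(poly):
                reg[i + j] ^= p
    return reg[len(bits):]


def append_crc(payload, poly=CRC8_POLY):
    """Systematic CRC encoding: payload followed by its CRC bits."""
    return list(payload) + crc_remainder(payload, poly)


def passes_crc(word, poly=CRC8_POLY):
    """True if the trailing CRC bits match the CRC of the leading payload bits."""
    r = len(poly) - 1
    return crc_remainder(word[:-r], poly) == list(word[-r:])


def select_candidate(candidates, poly=CRC8_POLY):
    """Pick from a list of (metric, bits) pairs, lower metric meaning more likely.

    Returns the most likely candidate that passes the CRC, or the most
    likely candidate overall if none passes.
    """
    ranked = sorted(candidates, key=lambda c: c[0])
    for _, bits in ranked:
        if passes_crc(bits, poly):
            return bits
    return ranked[0][1]
```

The CRC acts as an outer code that rules out wrong candidates even when they have a better metric than the transmitted word, which is consistent with the abstract's remark that the concatenation improves the code-weight distribution.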
Abstract: A global optimization algorithm (GOA) for the parallel Chien search circuit in a Reed-Solomon (RS) (255,239) decoder is presented. By finding the common modulo-2 additions within groups of Galois field (GF) multipliers and pre-computing the common items, the GOA can efficiently reduce the number of XOR gates and thus the circuit area. Unlike other, local optimization algorithms, the GOA is a global one. When there is more than one maximum match at a time, the GOA chooses the pair with the smallest relational value instead of choosing a pair randomly, so that the chosen match has the least impact on the final result. The results show that the area of parallel Chien search circuits can be reduced by 51% compared to the direct implementation when the group-based GOA is used for the GF multipliers, and by 26% when the GOA is applied to the GF multipliers separately. This optimization scheme can be widely used in general parallel architectures in which many GF multipliers are involved.
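For readers unfamiliar with the circuit being optimized, the following is a minimal software sketch of a serial Chien search over GF(2^8). It does not implement the GOA or a parallel architecture. The primitive polynomial 0x11D and the helper names are assumptions for illustration; the abstract does not state which field polynomial the decoder uses.

```python
PRIM_POLY = 0x11D  # x^8 + x^4 + x^3 + x^2 + 1 (assumed field polynomial)

# Build antilog/log tables for GF(2^8) with primitive element alpha = 2.
EXP = [0] * 510
LOG = [0] * 256
_x = 1
for _i in range(255):
    EXP[_i] = _x
    LOG[_x] = _i
    _x <<= 1
    if _x & 0x100:
        _x ^= PRIM_POLY
for _i in range(255, 510):
    EXP[_i] = EXP[_i - 255]  # duplicate so LOG[a] + LOG[b] never overflows


def gf_mul(a, b):
    """Multiply two GF(2^8) elements using the log/antilog tables."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def locator_from_positions(positions):
    """Error locator: product of (1 + alpha^p * x) over the error positions p."""
    poly = [1]  # poly[k] is the coefficient of x^k
    for p in positions:
        xp = EXP[p % 255]
        nxt = poly + [0]
        for k, c in enumerate(poly):
            nxt[k + 1] ^= gf_mul(c, xp)
        poly = nxt
    return poly


def chien_search(locator):
    """Return every exponent i in 0..254 for which locator(alpha^i) == 0."""
    terms = list(locator)  # terms[k] holds locator[k] * alpha^(k*i), starting at i = 0
    roots = []
    for i in range(255):
        acc = 0
        for t in terms:
            acc ^= t  # addition in GF(2^m) is bitwise XOR
        if acc == 0:
            roots.append(i)
        # Step to alpha^(i+1): multiply the k-th term by the constant alpha^k.
        terms = [gf_mul(t, EXP[k]) for k, t in enumerate(terms)]
    return roots
```

A root at exponent `i` corresponds to an error at position `(255 - i) % 255`. The per-step multiplications by the fixed constants alpha^k are the constant GF multipliers that, in hardware, become XOR networks, which is where shared modulo-2 additions can be factored out.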
Abstract: To improve the performance of short-interleaver serial concatenated convolutional codes (SCCCs) with few decoding iterations, the structure of the Log-MAP algorithm is introduced into the conventional SOVA decoder to improve its performance at short interleaving delay. The combination of Log-MAP and SOVA avoids updating the matrices of the maximum path, and also contributes to meeting the short-delay requirement. The simulation results for several SCCCs show that the improved decoder can obtain satisfactory performance with a short frame interleaver and that it is suitable for high-bit-rate, low-delay communication systems.
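The abstract above does not describe how the Log-MAP structure is merged into the SOVA decoder. As background only, the sketch below shows the one operation that separates Log-MAP from purely max-based processing: the Jacobian logarithm, often written max*. The function names are illustrative.

```python
import math


def max_star(a, b):
    """Jacobian logarithm: log(exp(a) + exp(b)), computed in a stable form."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))


def max_log(a, b):
    """Max-Log approximation: the same operation without the correction term."""
    return max(a, b)
```

The correction term `log1p(exp(-|a - b|))` is at most log 2 and vanishes as the two metrics move apart, so a decoder that keeps only the maximum loses the most accuracy when competing paths have similar metrics.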
Abstract: The first domestic total-dose-hardened 2 μm partially depleted silicon-on-insulator (PDSOI) CMOS 3-line to 8-line decoder fabricated in SIMOX is demonstrated. The radiation performance is characterized by transistor threshold voltage shifts, circuit static leakage currents, and I-V curves as a function of total dose up to 3 × 10^5 rad(Si). The worst-case threshold voltage shifts of the front channels are less than 20 mV for the nMOS transistors at 3 × 10^5 rad(Si) and follow-up irradiation, and less than 70 mV for the pMOS transistors. Furthermore, no significant radiation-induced leakage currents or functional degradation are observed.
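As a reminder of the logic function the fabricated circuit implements, here is a minimal behavioral sketch of a 3-line to 8-line decoder. The active-high outputs and the single `enable` input are assumptions for illustration; the abstract does not give the pinout or output polarity of the actual device.

```python
def decode_3_to_8(a2, a1, a0, enable=True):
    """Return the 8 output lines for address bits a2 (MSB), a1, a0 (each 0 or 1).

    Exactly one output is 1 when enabled; all outputs are 0 when disabled.
    """
    if not enable:
        return [0] * 8
    index = (a2 << 2) | (a1 << 1) | a0
    return [1 if i == index else 0 for i in range(8)]
```

For instance, the address bits `1, 0, 1` select output line 5.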