Semantic Communication(SC)has emerged as a novel communication paradigm that provides a receiver with meaningful information extracted from the source to maximize information transmission throughput in wireless networ...Semantic Communication(SC)has emerged as a novel communication paradigm that provides a receiver with meaningful information extracted from the source to maximize information transmission throughput in wireless networks,beyond the theoretical capacity limit.Despite the extensive research on SC,there is a lack of comprehensive survey on technologies,solutions,applications,and challenges for SC.In this article,the development of SC is first reviewed and its characteristics,architecture,and advantages are summarized.Next,key technologies such as semantic extraction,semantic encoding,and semantic segmentation are discussed and their corresponding solutions in terms of efficiency,robustness,adaptability,and reliability are summarized.Applications of SC to UAV communication,remote image sensing and fusion,intelligent transportation,and healthcare are also presented and their strategies are summarized.Finally,some challenges and future research directions are presented to provide guidance for further research of SC.展开更多
With the rapid development of artificial intelligence and the widespread use of the Internet of Things, semantic communication, as an emerging communication paradigm, has been attracting great interest. Taking image t...With the rapid development of artificial intelligence and the widespread use of the Internet of Things, semantic communication, as an emerging communication paradigm, has been attracting great interest. Taking image transmission as an example, from the semantic communication's view, not all pixels in the images are equally important for certain receivers. The existing semantic communication systems directly perform semantic encoding and decoding on the whole image, in which the region of interest cannot be identified. In this paper, we propose a novel semantic communication system for image transmission that can distinguish between Regions Of Interest (ROI) and Regions Of Non-Interest (RONI) based on semantic segmentation, where a semantic segmentation algorithm is used to classify each pixel of the image and distinguish ROI and RONI. The system also enables high-quality transmission of ROI with lower communication overheads by transmissions through different semantic communication networks with different bandwidth requirements. An improved metric θPSNR is proposed to evaluate the transmission accuracy of the novel semantic transmission network. Experimental results show that our proposed system achieves a significant performance improvement compared with existing approaches, namely, existing semantic communication approaches and the conventional approach without semantics.展开更多
Degraded broadcast channels(DBC) are a typical multiuser communication scenario, Semantic communications over DBC still lack in-depth research. In this paper, we design a semantic communications approach based on mult...Degraded broadcast channels(DBC) are a typical multiuser communication scenario, Semantic communications over DBC still lack in-depth research. In this paper, we design a semantic communications approach based on multi-user semantic fusion for wireless image transmission over DBC. The transmitter extracts semantic features for two users separately and then effectively fuses them for broadcasting by leveraging semantic similarity. Unlike traditional allocation of time, power, or bandwidth, the semantic fusion scheme can dynamically control the weight of the semantic features of the two users to balance their performance. Considering the different channel state information(CSI) of both users over DBC,a DBC-Aware method is developed that embeds the CSI of both users into the joint source-channel coding encoder and fusion module to adapt to the channel.Experimental results show that the proposed system outperforms the traditional broadcasting schemes.展开更多
As conventional communication systems based on classic information theory have closely approached Shannon capacity,semantic communication is emerging as a key enabling technology for the further improvement of communi...As conventional communication systems based on classic information theory have closely approached Shannon capacity,semantic communication is emerging as a key enabling technology for the further improvement of communication performance.However,it is still unsettled on how to represent semantic information and characterise the theoretical limits of semantic-oriented compression and transmission.In this paper,we consider a semantic source which is characterised by a set of correlated random variables whose joint probabilistic distribution can be described by a Bayesian network.We give the information-theoretic limit on the lossless compression of the semantic source and introduce a low complexity encoding method by exploiting the conditional independence.We further characterise the limits on lossy compression of the semantic source and the upper and lower bounds of the rate-distortion function.We also investigate the lossy compression of the semantic source with two-sided information at the encoder and decoder,and obtain the corresponding rate distortion function.We prove that the optimal code of the semantic source is the combination of the optimal codes of each conditional independent set given the side information.展开更多
Increasing research has focused on semantic communication,the goal of which is to convey accurately the meaning instead of transmitting symbols from the sender to the receiver.In this paper,we design a novel encoding ...Increasing research has focused on semantic communication,the goal of which is to convey accurately the meaning instead of transmitting symbols from the sender to the receiver.In this paper,we design a novel encoding and decoding semantic communication framework,which adopts the semantic information and the contextual correlations between items to optimize the performance of a communication system over various channels.On the sender side,the average semantic loss caused by the wrong detection is defined,and a semantic source encoding strategy is developed to minimize the average semantic loss.To further improve communication reliability,a decoding strategy that utilizes the semantic and the context information to recover messages is proposed in the receiver.Extensive simulation results validate the superior performance of our strategies over state-of-the-art semantic coding and decoding policies on different communication channels.展开更多
In Chinese,the“好不X”(haobu X)structure expresses three types of meanings(negation,affirmation,and both affirmation-negation),where X exhibits differences in semantic symmetry.So far,no systematic explanatory theory...In Chinese,the“好不X”(haobu X)structure expresses three types of meanings(negation,affirmation,and both affirmation-negation),where X exhibits differences in semantic symmetry.So far,no systematic explanatory theory has been proposed to account for these differences.Therefore,this paper presents and argues for an explanatory hypothesis that progresses through four stages:“很不X”(henbu X)=>“好不X”(haobu X)(negation)=>[ironic use]“好不X”(haobu X)(affirmation)=>expansion and obstruction of affirmative“好不”(haobu).Specifically,(1)the basis of the negation“好不X”(haobu X)is attributed to“很不X”(henbu X),and the semantic asymmetry of X(excluding negative words)is explained using politeness principles,irony,and the semantic valence of negation results;(2)the ironic use of the negation“好不X”(haobu X)gives rise to the affirmation“好不X”(haobu X);(3)the grammaticalization and expansion of the positive meaning of“好不”(haobu)extend to X,which cannot appear in the negative meaning“好不__”structure(including words with opposite meanings and high polarity positive words).This explains the semantic symmetry of X in the positive meaning“好不X”(haobu X)structure;(4)when the affirmation“好不”(haobu)expands to the negation“好不X”(haobu X),it encounters both obstacles(X includes neutral and some positive words)and compatibility(X is some other positive words),thus explaining the semantic asymmetry of X in the affirmation-negation“好不X”(haobu X)(i.e.,X is positive words).展开更多
As a novel paradigm,semantic communication provides an effective solution for breaking through the future development dilemma of classical communication systems.However,it remains an unsolved problem of how to measure...As a novel paradigm,semantic communication provides an effective solution for breaking through the future development dilemma of classical communication systems.However,it remains an unsolved problem of how to measure the information transmission capability for a given semantic communication method and subsequently compare it with the classical communication method.In this paper,we first present a review of the semantic communication system,including its system model and the two typical coding and transmission methods for its implementations.To address the unsolved issue of the information transmission capability measure for semantic communication methods,we propose a new universal performance measure called Information Conductivity.We provide the definition and the physical significance to state its effectiveness in representing the information transmission capabilities of the semantic communication systems and present elaborations including its measure methods,degrees of freedom,and progressive analysis.Experimental results in image transmission scenarios validate its practical applicability.展开更多
To facilitate emerging applications and demands of edge intelligence(EI)-empowered 6G networks,model-driven semantic communications have been proposed to reduce transmission volume by deploying artificial intelligence...To facilitate emerging applications and demands of edge intelligence(EI)-empowered 6G networks,model-driven semantic communications have been proposed to reduce transmission volume by deploying artificial intelligence(AI)models that provide abilities of semantic extraction and recovery.Nevertheless,it is not feasible to preload all AI models on resource-constrained terminals.Thus,in-time model transmission becomes a crucial problem.This paper proposes an intellicise model transmission architecture to guarantee the reliable transmission of models for semantic communication.The mathematical relationship between model size and performance is formulated by employing a recognition error function supported with experimental data.We consider the characteristics of wireless channels and derive the closed-form expression of model transmission outage probability(MTOP)over the Rayleigh channel.Besides,we define the effective model accuracy(EMA)to evaluate the model transmission performance of both communication and intelligence.Then we propose a joint model selection and resource allocation(JMSRA)algorithm to maximize the average EMA of all users.Simulation results demonstrate that the average EMA of the JMSRA algorithm outperforms baseline algorithms by about 22%.展开更多
We consider an image semantic communication system in a time-varying fading Gaussian MIMO channel,with a finite number of channel states.A deep learning-aided broadcast approach scheme is proposed to benefit the adapt...We consider an image semantic communication system in a time-varying fading Gaussian MIMO channel,with a finite number of channel states.A deep learning-aided broadcast approach scheme is proposed to benefit the adaptive semantic transmission in terms of different channel states.We combine the classic broadcast approach with the image transformer to implement this adaptive joint source and channel coding(JSCC)scheme.Specifically,we utilize the neural network(NN)to jointly optimize the hierarchical image compression and superposition code mapping within this scheme.The learned transformers and codebooks allow recovering of the image with an adaptive quality and low error rate at the receiver side,in each channel state.The simulation results exhibit our proposed scheme can dynamically adapt the coding to the current channel state and outperform some existing intelligent schemes with the fixed coding block.展开更多
Video transmission requires considerable bandwidth,and current widely employed schemes prove inadequate when confronted with scenes featuring prominently.Motivated by the strides in talkinghead generative technology,t...Video transmission requires considerable bandwidth,and current widely employed schemes prove inadequate when confronted with scenes featuring prominently.Motivated by the strides in talkinghead generative technology,the paper introduces a semantic transmission system tailored for talking-head videos.The system captures semantic information from talking-head video and faithfully reconstructs source video at the receiver,only one-shot reference frame and compact semantic features are required for the entire transmission.Specifically,we analyze video semantics in the pixel domain frame-by-frame and jointly process multi-frame semantic information to seamlessly incorporate spatial and temporal information.Variational modeling is utilized to evaluate the diversity of importance among group semantics,thereby guiding bandwidth resource allocation for semantics to enhance system efficiency.The whole endto-end system is modeled as an optimization problem and equivalent to acquiring optimal rate-distortion performance.We evaluate our system on both reference frame and video transmission,experimental results demonstrate that our system can improve the efficiency and robustness of communications.Compared to the classical approaches,our system can save over 90%of bandwidth when user perception is close.展开更多
The concept of semantic communication provides a novel approach for applications in scenarios with limited communication resources.In this paper,we propose an end-to-end(E2E)semantic molecular communication system,aim...The concept of semantic communication provides a novel approach for applications in scenarios with limited communication resources.In this paper,we propose an end-to-end(E2E)semantic molecular communication system,aiming to enhance the efficiency of molecular communication systems by reducing the transmitted information.Specifically,following the joint source channel coding paradigm,the network is designed to encode the task-relevant information into the concentration of the information molecules,which is robust to the degradation of the molecular communication channel.Furthermore,we propose a channel network to enable the E2E learning over the non-differentiable molecular channel.Experimental results demonstrate the superior performance of the semantic molecular communication system over the conventional methods in classification tasks.展开更多
In the future development direction of the sixth generation(6G)mobile communication,several communication models are proposed to face the growing challenges of the task.The rapid development of artificial intelligence...In the future development direction of the sixth generation(6G)mobile communication,several communication models are proposed to face the growing challenges of the task.The rapid development of artificial intelligence(AI)foundation models provides significant support for efficient and intelligent communication interactions.In this paper,we propose an innovative semantic communication paradigm called task-oriented semantic communication system with foundation models.First,we segment the image by using task prompts based on the segment anything model(SAM)and contrastive language-image pretraining(CLIP).Meanwhile,we adopt Bezier curve to enhance the mask to improve the segmentation accuracy.Second,we have differentiated semantic compression and transmission approaches for segmented content.Third,we fuse different semantic information based on the conditional diffusion model to generate high-quality images that satisfy the users'specific task requirements.Finally,the experimental results show that the proposed system compresses the semantic information effectively and improves the robustness of semantic communication.展开更多
Currently,there is a growing trend among users to store their data in the cloud.However,the cloud is vulnerable to persistent data corruption risks arising from equipment failures and hacker attacks.Additionally,when ...Currently,there is a growing trend among users to store their data in the cloud.However,the cloud is vulnerable to persistent data corruption risks arising from equipment failures and hacker attacks.Additionally,when users perform file operations,the semantic integrity of the data can be compromised.Ensuring both data integrity and semantic correctness has become a critical issue that requires attention.We introduce a pioneering solution called Sec-Auditor,the first of its kind with the ability to verify data integrity and semantic correctness simultaneously,while maintaining a constant communication cost independent of the audited data volume.Sec-Auditor also supports public auditing,enabling anyone with access to public information to conduct data audits.This feature makes Sec-Auditor highly adaptable to open data environments,such as the cloud.In Sec-Auditor,users are assigned specific rules that are utilized to verify the accuracy of data semantic.Furthermore,users are given the flexibility to update their own rules as needed.We conduct in-depth analyses of the correctness and security of Sec-Auditor.We also compare several important security attributes with existing schemes,demonstrating the superior properties of Sec-Auditor.Evaluation results demonstrate that even for time-consuming file upload operations,our solution is more efficient than the comparison one.展开更多
In the video captioning methods based on an encoder-decoder,limited visual features are extracted by an encoder,and a natural sentence of the video content is generated using a decoder.However,this kind ofmethod is de...In the video captioning methods based on an encoder-decoder,limited visual features are extracted by an encoder,and a natural sentence of the video content is generated using a decoder.However,this kind ofmethod is dependent on a single video input source and few visual labels,and there is a problem with semantic alignment between video contents and generated natural sentences,which are not suitable for accurately comprehending and describing the video contents.To address this issue,this paper proposes a video captioning method by semantic topic-guided generation.First,a 3D convolutional neural network is utilized to extract the spatiotemporal features of videos during the encoding.Then,the semantic topics of video data are extracted using the visual labels retrieved from similar video data.In the decoding,a decoder is constructed by combining a novel Enhance-TopK sampling algorithm with a Generative Pre-trained Transformer-2 deep neural network,which decreases the influence of“deviation”in the semantic mapping process between videos and texts by jointly decoding a baseline and semantic topics of video contents.During this process,the designed Enhance-TopK sampling algorithm can alleviate a long-tail problem by dynamically adjusting the probability distribution of the predicted words.Finally,the experiments are conducted on two publicly used Microsoft Research Video Description andMicrosoft Research-Video to Text datasets.The experimental results demonstrate that the proposed method outperforms several state-of-art approaches.Specifically,the performance indicators Bilingual Evaluation Understudy,Metric for Evaluation of Translation with Explicit Ordering,Recall Oriented Understudy for Gisting Evaluation-longest common subsequence,and Consensus-based Image Description Evaluation of the proposed method are improved by 1.2%,0.1%,0.3%,and 2.4% on the Microsoft Research Video Description dataset,and 0.1%,1.0%,0.1%,and 2.8% on the Microsoft Research-Video to Text dataset,respectively,compared with the existing video captioning methods.As a result,the proposed method can generate video captioning that is more closely aligned with human natural language expression habits.展开更多
Semantic segmentation of driving scene images is crucial for autonomous driving.While deep learning technology has significantly improved daytime image semantic segmentation,nighttime images pose challenges due to fac...Semantic segmentation of driving scene images is crucial for autonomous driving.While deep learning technology has significantly improved daytime image semantic segmentation,nighttime images pose challenges due to factors like poor lighting and overexposure,making it difficult to recognize small objects.To address this,we propose an Image Adaptive Enhancement(IAEN)module comprising a parameter predictor(Edip),multiple image processing filters(Mdif),and a Detail Processing Module(DPM).Edip combines image processing filters to predict parameters like exposure and hue,optimizing image quality.We adopt a novel image encoder to enhance parameter prediction accuracy by enabling Edip to handle features at different scales.DPM strengthens overlooked image details,extending the IAEN module’s functionality.After the segmentation network,we integrate a Depth Guided Filter(DGF)to refine segmentation outputs.The entire network is trained end-to-end,with segmentation results guiding parameter prediction optimization,promoting self-learning and network improvement.This lightweight and efficient network architecture is particularly suitable for addressing challenges in nighttime image segmentation.Extensive experiments validate significant performance improvements of our approach on the ACDC-night and Nightcity datasets.展开更多
This paper focuses on the task of few-shot 3D point cloud semantic segmentation.Despite some progress,this task still encounters many issues due to the insufficient samples given,e.g.,incomplete object segmentation an...This paper focuses on the task of few-shot 3D point cloud semantic segmentation.Despite some progress,this task still encounters many issues due to the insufficient samples given,e.g.,incomplete object segmentation and inaccurate semantic discrimination.To tackle these issues,we first leverage part-whole relationships into the task of 3D point cloud semantic segmentation to capture semantic integrity,which is empowered by the dynamic capsule routing with the module of 3D Capsule Networks(CapsNets)in the embedding network.Concretely,the dynamic routing amalgamates geometric information of the 3D point cloud data to construct higher-level feature representations,which capture the relationships between object parts and their wholes.Secondly,we designed a multi-prototype enhancement module to enhance the prototype discriminability.Specifically,the single-prototype enhancement mechanism is expanded to the multi-prototype enhancement version for capturing rich semantics.Besides,the shot-correlation within the category is calculated via the interaction of different samples to enhance the intra-category similarity.Ablation studies prove that the involved part-whole relations and proposed multi-prototype enhancement module help to achieve complete object segmentation and improve semantic discrimination.Moreover,under the integration of these two modules,quantitative and qualitative experiments on two public benchmarks,including S3DIS and ScanNet,indicate the superior performance of the proposed framework on the task of 3D point cloud semantic segmentation,compared to some state-of-the-art methods.展开更多
High-resolution remote sensing image segmentation is a challenging task. In urban remote sensing, the presenceof occlusions and shadows often results in blurred or invisible object boundaries, thereby increasing the d...High-resolution remote sensing image segmentation is a challenging task. In urban remote sensing, the presenceof occlusions and shadows often results in blurred or invisible object boundaries, thereby increasing the difficultyof segmentation. In this paper, an improved network with a cross-region self-attention mechanism for multi-scalefeatures based onDeepLabv3+is designed to address the difficulties of small object segmentation and blurred targetedge segmentation. First,we use CrossFormer as the backbone feature extraction network to achieve the interactionbetween large- and small-scale features, and establish self-attention associations between features at both large andsmall scales to capture global contextual feature information. Next, an improved atrous spatial pyramid poolingmodule is introduced to establish multi-scale feature maps with large- and small-scale feature associations, andattention vectors are added in the channel direction to enable adaptive adjustment of multi-scale channel features.The proposed networkmodel is validated using the PotsdamandVaihingen datasets. The experimental results showthat, compared with existing techniques, the network model designed in this paper can extract and fuse multiscaleinformation, more clearly extract edge information and small-scale information, and segment boundariesmore smoothly. Experimental results on public datasets demonstrate the superiority of ourmethod compared withseveral state-of-the-art networks.展开更多
With the rapid growth of information transmission via the Internet,efforts have been made to reduce network load to promote efficiency.One such application is semantic computing,which can extract and process semantic ...With the rapid growth of information transmission via the Internet,efforts have been made to reduce network load to promote efficiency.One such application is semantic computing,which can extract and process semantic communication.Social media has enabled users to share their current emotions,opinions,and life events through their mobile devices.Notably,people suffering from mental health problems are more willing to share their feelings on social networks.Therefore,it is necessary to extract semantic information from social media(vlog data)to identify abnormal emotional states to facilitate early identification and intervention.Most studies do not consider spatio-temporal information when fusing multimodal information to identify abnormal emotional states such as depression.To solve this problem,this paper proposes a spatio-temporal squeeze transformer method for the extraction of semantic features of depression.First,a module with spatio-temporal data is embedded into the transformer encoder,which is utilized to obtain a representation of spatio-temporal features.Second,a classifier with a voting mechanism is designed to encourage the model to classify depression and non-depression effec-tively.Experiments are conducted on the D-Vlog dataset.The results show that the method is effective,and the accuracy rate can reach 70.70%.This work provides scaffolding for future work in the detection of affect recognition in semantic communication based on social media vlog data.展开更多
In the process of constructing domain-specific knowledge graphs,the task of relational triple extraction plays a critical role in transforming unstructured text into structured information.Existing relational triple e...In the process of constructing domain-specific knowledge graphs,the task of relational triple extraction plays a critical role in transforming unstructured text into structured information.Existing relational triple extraction models facemultiple challenges when processing domain-specific data,including insufficient utilization of semantic interaction information between entities and relations,difficulties in handling challenging samples,and the scarcity of domain-specific datasets.To address these issues,our study introduces three innovative components:Relation semantic enhancement,data augmentation,and a voting strategy,all designed to significantly improve the model’s performance in tackling domain-specific relational triple extraction tasks.We first propose an innovative attention interaction module.This method significantly enhances the semantic interaction capabilities between entities and relations by integrating semantic information fromrelation labels.Second,we propose a voting strategy that effectively combines the strengths of large languagemodels(LLMs)and fine-tuned small pre-trained language models(SLMs)to reevaluate challenging samples,thereby improving the model’s adaptability in specific domains.Additionally,we explore the use of LLMs for data augmentation,aiming to generate domain-specific datasets to alleviate the scarcity of domain data.Experiments conducted on three domain-specific datasets demonstrate that our model outperforms existing comparative models in several aspects,with F1 scores exceeding the State of the Art models by 2%,1.6%,and 0.6%,respectively,validating the effectiveness and generalizability of our approach.展开更多
In recent years,semantic segmentation on 3D point cloud data has attracted much attention.Unlike 2D images where pixels distribute regularly in the image domain,3D point clouds in non-Euclidean space are irregular and...In recent years,semantic segmentation on 3D point cloud data has attracted much attention.Unlike 2D images where pixels distribute regularly in the image domain,3D point clouds in non-Euclidean space are irregular and inherently sparse.Therefore,it is very difficult to extract long-range contexts and effectively aggregate local features for semantic segmentation in 3D point cloud space.Most current methods either focus on local feature aggregation or long-range context dependency,but fail to directly establish a global-local feature extractor to complete the point cloud semantic segmentation tasks.In this paper,we propose a Transformer-based stratified graph convolutional network(SGT-Net),which enlarges the effective receptive field and builds direct long-range dependency.Specifically,we first propose a novel dense-sparse sampling strategy that provides dense local vertices and sparse long-distance vertices for subsequent graph convolutional network(GCN).Secondly,we propose a multi-key self-attention mechanism based on the Transformer to further weight augmentation for crucial neighboring relationships and enlarge the effective receptive field.In addition,to further improve the efficiency of the network,we propose a similarity measurement module to determine whether the neighborhood near the center point is effective.We demonstrate the validity and superiority of our method on the S3DIS and ShapeNet datasets.Through ablation experiments and segmentation visualization,we verify that the SGT model can improve the performance of the point cloud semantic segmentation.展开更多
基金supported by the Natural Science Foundation of China under Grants 61971084,62025105,62001073,62272075the National Natural Science Foundation of Chongqing under Grants cstc2021ycjh-bgzxm0039,cstc2021jcyj-msxmX0031+1 种基金the Science and Technology Research Program for Chongqing Municipal Education Commission KJZD-M202200601the Support Program for Overseas Students to Return to China for Entrepreneurship and Innovation under Grants cx2021003,cx2021053.
文摘Semantic Communication(SC)has emerged as a novel communication paradigm that provides a receiver with meaningful information extracted from the source to maximize information transmission throughput in wireless networks,beyond the theoretical capacity limit.Despite the extensive research on SC,there is a lack of comprehensive survey on technologies,solutions,applications,and challenges for SC.In this article,the development of SC is first reviewed and its characteristics,architecture,and advantages are summarized.Next,key technologies such as semantic extraction,semantic encoding,and semantic segmentation are discussed and their corresponding solutions in terms of efficiency,robustness,adaptability,and reliability are summarized.Applications of SC to UAV communication,remote image sensing and fusion,intelligent transportation,and healthcare are also presented and their strategies are summarized.Finally,some challenges and future research directions are presented to provide guidance for further research of SC.
基金supported in part by collaborative research with Toyota Motor Corporation,in part by ROIS NII Open Collaborative Research under Grant 21S0601,in part by JSPS KAKENHI under Grants 20H00592,21H03424.
文摘With the rapid development of artificial intelligence and the widespread use of the Internet of Things, semantic communication, as an emerging communication paradigm, has been attracting great interest. Taking image transmission as an example, from the semantic communication's view, not all pixels in the images are equally important for certain receivers. The existing semantic communication systems directly perform semantic encoding and decoding on the whole image, in which the region of interest cannot be identified. In this paper, we propose a novel semantic communication system for image transmission that can distinguish between Regions Of Interest (ROI) and Regions Of Non-Interest (RONI) based on semantic segmentation, where a semantic segmentation algorithm is used to classify each pixel of the image and distinguish ROI and RONI. The system also enables high-quality transmission of ROI with lower communication overheads by transmissions through different semantic communication networks with different bandwidth requirements. An improved metric θPSNR is proposed to evaluate the transmission accuracy of the novel semantic transmission network. Experimental results show that our proposed system achieves a significant performance improvement compared with existing approaches, namely, existing semantic communication approaches and the conventional approach without semantics.
基金supported in part by National Key R&D Project of China (2023YFB2906201)the National Natural Science Foundation of China (62222111, 62125108 and 62431015)the Fundamental Research Funds for the Central Universities。
文摘Degraded broadcast channels(DBC) are a typical multiuser communication scenario, Semantic communications over DBC still lack in-depth research. In this paper, we design a semantic communications approach based on multi-user semantic fusion for wireless image transmission over DBC. The transmitter extracts semantic features for two users separately and then effectively fuses them for broadcasting by leveraging semantic similarity. Unlike traditional allocation of time, power, or bandwidth, the semantic fusion scheme can dynamically control the weight of the semantic features of the two users to balance their performance. Considering the different channel state information(CSI) of both users over DBC,a DBC-Aware method is developed that embeds the CSI of both users into the joint source-channel coding encoder and fusion module to adapt to the channel.Experimental results show that the proposed system outperforms the traditional broadcasting schemes.
基金partly supported by NSFC under grant No.62293481,No.62201505partly by the SUTDZJU IDEA Grant(SUTD-ZJU(VP)202102)。
文摘As conventional communication systems based on classic information theory have closely approached Shannon capacity,semantic communication is emerging as a key enabling technology for the further improvement of communication performance.However,it is still unsettled on how to represent semantic information and characterise the theoretical limits of semantic-oriented compression and transmission.In this paper,we consider a semantic source which is characterised by a set of correlated random variables whose joint probabilistic distribution can be described by a Bayesian network.We give the information-theoretic limit on the lossless compression of the semantic source and introduce a low complexity encoding method by exploiting the conditional independence.We further characterise the limits on lossy compression of the semantic source and the upper and lower bounds of the rate-distortion function.We also investigate the lossy compression of the semantic source with two-sided information at the encoder and decoder,and obtain the corresponding rate distortion function.We prove that the optimal code of the semantic source is the combination of the optimal codes of each conditional independent set given the side information.
基金supported in part by the National Natural Science Foundation of China under Grant No.61931020,U19B2024,62171449,62001483in part by the science and technology innovation Program of Hunan Province under Grant No.2021JJ40690。
文摘Increasing research has focused on semantic communication,the goal of which is to convey accurately the meaning instead of transmitting symbols from the sender to the receiver.In this paper,we design a novel encoding and decoding semantic communication framework,which adopts the semantic information and the contextual correlations between items to optimize the performance of a communication system over various channels.On the sender side,the average semantic loss caused by the wrong detection is defined,and a semantic source encoding strategy is developed to minimize the average semantic loss.To further improve communication reliability,a decoding strategy that utilizes the semantic and the context information to recover messages is proposed in the receiver.Extensive simulation results validate the superior performance of our strategies over state-of-the-art semantic coding and decoding policies on different communication channels.
基金2024 Key Scientific Research Project of Ordinary Colleges and Universities in Anhui Province“Research on the External Communication and Translation Modes of Hui Culture under One Belt,One Road Initiative”(2024AH053129)2022 General Project of Teaching Research on Quality Engineering in Anhui Province“Research on the Dilemmas and Countermeasures of Chinese Language Teaching for Foreign Students in Vocational Colleges under the Background of One Belt,One Road Initiative”(2022jyxm270)。
文摘In Chinese,the“好不X”(haobu X)structure expresses three types of meanings(negation,affirmation,and both affirmation-negation),where X exhibits differences in semantic symmetry.So far,no systematic explanatory theory has been proposed to account for these differences.Therefore,this paper presents and argues for an explanatory hypothesis that progresses through four stages:“很不X”(henbu X)=>“好不X”(haobu X)(negation)=>[ironic use]“好不X”(haobu X)(affirmation)=>expansion and obstruction of affirmative“好不”(haobu).Specifically,(1)the basis of the negation“好不X”(haobu X)is attributed to“很不X”(henbu X),and the semantic asymmetry of X(excluding negative words)is explained using politeness principles,irony,and the semantic valence of negation results;(2)the ironic use of the negation“好不X”(haobu X)gives rise to the affirmation“好不X”(haobu X);(3)the grammaticalization and expansion of the positive meaning of“好不”(haobu)extend to X,which cannot appear in the negative meaning“好不__”structure(including words with opposite meanings and high polarity positive words).This explains the semantic symmetry of X in the positive meaning“好不X”(haobu X)structure;(4)when the affirmation“好不”(haobu)expands to the negation“好不X”(haobu X),it encounters both obstacles(X includes neutral and some positive words)and compatibility(X is some other positive words),thus explaining the semantic asymmetry of X in the affirmation-negation“好不X”(haobu X)(i.e.,X is positive words).
基金supported by the National Natural Science Foundation of China(No.62293481,No.62071058)。
文摘As a novel paradigm,semantic communication provides an effective solution for breaking through the future development dilemma of classical communication systems.However,it remains an unsolved problem of how to measure the information transmission capability for a given semantic communication method and subsequently compare it with the classical communication method.In this paper,we first present a review of the semantic communication system,including its system model and the two typical coding and transmission methods for its implementations.To address the unsolved issue of the information transmission capability measure for semantic communication methods,we propose a new universal performance measure called Information Conductivity.We provide the definition and the physical significance to state its effectiveness in representing the information transmission capabilities of the semantic communication systems and present elaborations including its measure methods,degrees of freedom,and progressive analysis.Experimental results in image transmission scenarios validate its practical applicability.
基金supported in part by the National Key R&D Program of China No.2020YFB1806905the National Natural Science Foundation of China No.62201079+1 种基金the Beijing Natural Science Foundation No.L232051the Major Key Project of Peng Cheng Laboratory(PCL)Department of Broadband Communication。
文摘To facilitate emerging applications and demands of edge intelligence(EI)-empowered 6G networks,model-driven semantic communications have been proposed to reduce transmission volume by deploying artificial intelligence(AI)models that provide abilities of semantic extraction and recovery.Nevertheless,it is not feasible to preload all AI models on resource-constrained terminals.Thus,in-time model transmission becomes a crucial problem.This paper proposes an intellicise model transmission architecture to guarantee the reliable transmission of models for semantic communication.The mathematical relationship between model size and performance is formulated by employing a recognition error function supported with experimental data.We consider the characteristics of wireless channels and derive the closed-form expression of model transmission outage probability(MTOP)over the Rayleigh channel.Besides,we define the effective model accuracy(EMA)to evaluate the model transmission performance of both communication and intelligence.Then we propose a joint model selection and resource allocation(JMSRA)algorithm to maximize the average EMA of all users.Simulation results demonstrate that the average EMA of the JMSRA algorithm outperforms baseline algorithms by about 22%.
基金supported in part by the National Key R&D Project of China under Grant 2020YFA0712300National Natural Science Foundation of China under Grant NSFC-62231022,12031011supported in part by the NSF of China under Grant 62125108。
文摘We consider an image semantic communication system in a time-varying fading Gaussian MIMO channel,with a finite number of channel states.A deep learning-aided broadcast approach scheme is proposed to benefit the adaptive semantic transmission in terms of different channel states.We combine the classic broadcast approach with the image transformer to implement this adaptive joint source and channel coding(JSCC)scheme.Specifically,we utilize the neural network(NN)to jointly optimize the hierarchical image compression and superposition code mapping within this scheme.The learned transformers and codebooks allow recovering of the image with an adaptive quality and low error rate at the receiver side,in each channel state.The simulation results exhibit our proposed scheme can dynamically adapt the coding to the current channel state and outperform some existing intelligent schemes with the fixed coding block.
基金supported by the National Natural Science Foundation of China(No.61971062)BUPT Excellent Ph.D.Students Foundation(CX2022153)。
文摘Video transmission requires considerable bandwidth,and current widely employed schemes prove inadequate when confronted with scenes featuring prominently.Motivated by the strides in talkinghead generative technology,the paper introduces a semantic transmission system tailored for talking-head videos.The system captures semantic information from talking-head video and faithfully reconstructs source video at the receiver,only one-shot reference frame and compact semantic features are required for the entire transmission.Specifically,we analyze video semantics in the pixel domain frame-by-frame and jointly process multi-frame semantic information to seamlessly incorporate spatial and temporal information.Variational modeling is utilized to evaluate the diversity of importance among group semantics,thereby guiding bandwidth resource allocation for semantics to enhance system efficiency.The whole endto-end system is modeled as an optimization problem and equivalent to acquiring optimal rate-distortion performance.We evaluate our system on both reference frame and video transmission,experimental results demonstrate that our system can improve the efficiency and robustness of communications.Compared to the classical approaches,our system can save over 90%of bandwidth when user perception is close.
基金supported by the Beijing Natural Science Foundation(L211012)the Natural Science Foundation of China(62122012,62221001)the Fundamental Research Funds for the Central Universities(2022JBQY004)。
文摘The concept of semantic communication provides a novel approach for applications in scenarios with limited communication resources.In this paper,we propose an end-to-end(E2E)semantic molecular communication system,aiming to enhance the efficiency of molecular communication systems by reducing the transmitted information.Specifically,following the joint source channel coding paradigm,the network is designed to encode the task-relevant information into the concentration of the information molecules,which is robust to the degradation of the molecular communication channel.Furthermore,we propose a channel network to enable the E2E learning over the non-differentiable molecular channel.Experimental results demonstrate the superior performance of the semantic molecular communication system over the conventional methods in classification tasks.
基金supported in part by the National Natural Science Foundation of China under Grant(62001246,62231017,62201277,62071255)the Natural Science Foundation of Jiangsu Province under Grant BK20220390+3 种基金Key R and D Program of Jiangsu Province Key project and topics under Grant(BE2021095,BE2023035)the Natural Science Research Startup Foundation of Recruiting Talents of Nanjing University of Posts and Telecommunications(Grant No.NY221011)National Science Foundation of Xiamen,China(No.3502Z202372013)Open Project of the Key Laboratory of Underwater Acoustic Communication and Marine Information Technology(Xiamen University)of the Ministry of Education,China(No.UAC202304)。
文摘In the future development direction of the sixth generation(6G)mobile communication,several communication models are proposed to face the growing challenges of the task.The rapid development of artificial intelligence(AI)foundation models provides significant support for efficient and intelligent communication interactions.In this paper,we propose an innovative semantic communication paradigm called task-oriented semantic communication system with foundation models.First,we segment the image by using task prompts based on the segment anything model(SAM)and contrastive language-image pretraining(CLIP).Meanwhile,we adopt Bezier curve to enhance the mask to improve the segmentation accuracy.Second,we have differentiated semantic compression and transmission approaches for segmented content.Third,we fuse different semantic information based on the conditional diffusion model to generate high-quality images that satisfy the users'specific task requirements.Finally,the experimental results show that the proposed system compresses the semantic information effectively and improves the robustness of semantic communication.
基金This research was supported by the Qinghai Provincial High-End Innovative and Entrepreneurial Talents Project.
文摘Currently,there is a growing trend among users to store their data in the cloud.However,the cloud is vulnerable to persistent data corruption risks arising from equipment failures and hacker attacks.Additionally,when users perform file operations,the semantic integrity of the data can be compromised.Ensuring both data integrity and semantic correctness has become a critical issue that requires attention.We introduce a pioneering solution called Sec-Auditor,the first of its kind with the ability to verify data integrity and semantic correctness simultaneously,while maintaining a constant communication cost independent of the audited data volume.Sec-Auditor also supports public auditing,enabling anyone with access to public information to conduct data audits.This feature makes Sec-Auditor highly adaptable to open data environments,such as the cloud.In Sec-Auditor,users are assigned specific rules that are utilized to verify the accuracy of data semantic.Furthermore,users are given the flexibility to update their own rules as needed.We conduct in-depth analyses of the correctness and security of Sec-Auditor.We also compare several important security attributes with existing schemes,demonstrating the superior properties of Sec-Auditor.Evaluation results demonstrate that even for time-consuming file upload operations,our solution is more efficient than the comparison one.
基金supported in part by the National Natural Science Foundation of China under Grant 61873277in part by the Natural Science Basic Research Plan in Shaanxi Province of China underGrant 2020JQ-758in part by the Chinese Postdoctoral Science Foundation under Grant 2020M673446.
文摘In the video captioning methods based on an encoder-decoder,limited visual features are extracted by an encoder,and a natural sentence of the video content is generated using a decoder.However,this kind ofmethod is dependent on a single video input source and few visual labels,and there is a problem with semantic alignment between video contents and generated natural sentences,which are not suitable for accurately comprehending and describing the video contents.To address this issue,this paper proposes a video captioning method by semantic topic-guided generation.First,a 3D convolutional neural network is utilized to extract the spatiotemporal features of videos during the encoding.Then,the semantic topics of video data are extracted using the visual labels retrieved from similar video data.In the decoding,a decoder is constructed by combining a novel Enhance-TopK sampling algorithm with a Generative Pre-trained Transformer-2 deep neural network,which decreases the influence of“deviation”in the semantic mapping process between videos and texts by jointly decoding a baseline and semantic topics of video contents.During this process,the designed Enhance-TopK sampling algorithm can alleviate a long-tail problem by dynamically adjusting the probability distribution of the predicted words.Finally,the experiments are conducted on two publicly used Microsoft Research Video Description andMicrosoft Research-Video to Text datasets.The experimental results demonstrate that the proposed method outperforms several state-of-art approaches.Specifically,the performance indicators Bilingual Evaluation Understudy,Metric for Evaluation of Translation with Explicit Ordering,Recall Oriented Understudy for Gisting Evaluation-longest common subsequence,and Consensus-based Image Description Evaluation of the proposed method are improved by 1.2%,0.1%,0.3%,and 2.4% on the Microsoft Research Video Description dataset,and 0.1%,1.0%,0.1%,and 2.8% on the Microsoft Research-Video to Text dataset,respectively,compared with the existing video captioning methods.As a result,the proposed method can generate video captioning that is more closely aligned with human natural language expression habits.
基金This work is supported in part by The National Natural Science Foundation of China(Grant Number 61971078),which provided domain expertise and computational power that greatly assisted the activityThis work was financially supported by Chongqing Municipal Education Commission Grants for-Major Science and Technology Project(Grant Number gzlcx20243175).
文摘Semantic segmentation of driving scene images is crucial for autonomous driving.While deep learning technology has significantly improved daytime image semantic segmentation,nighttime images pose challenges due to factors like poor lighting and overexposure,making it difficult to recognize small objects.To address this,we propose an Image Adaptive Enhancement(IAEN)module comprising a parameter predictor(Edip),multiple image processing filters(Mdif),and a Detail Processing Module(DPM).Edip combines image processing filters to predict parameters like exposure and hue,optimizing image quality.We adopt a novel image encoder to enhance parameter prediction accuracy by enabling Edip to handle features at different scales.DPM strengthens overlooked image details,extending the IAEN module’s functionality.After the segmentation network,we integrate a Depth Guided Filter(DGF)to refine segmentation outputs.The entire network is trained end-to-end,with segmentation results guiding parameter prediction optimization,promoting self-learning and network improvement.This lightweight and efficient network architecture is particularly suitable for addressing challenges in nighttime image segmentation.Extensive experiments validate significant performance improvements of our approach on the ACDC-night and Nightcity datasets.
基金This work is supported by the National Natural Science Foundation of China under Grant No.62001341the National Natural Science Foundation of Jiangsu Province under Grant No.BK20221379the Jiangsu Engineering Research Center of Digital Twinning Technology for Key Equipment in Petrochemical Process under Grant No.DTEC202104.
文摘This paper focuses on the task of few-shot 3D point cloud semantic segmentation.Despite some progress,this task still encounters many issues due to the insufficient samples given,e.g.,incomplete object segmentation and inaccurate semantic discrimination.To tackle these issues,we first leverage part-whole relationships into the task of 3D point cloud semantic segmentation to capture semantic integrity,which is empowered by the dynamic capsule routing with the module of 3D Capsule Networks(CapsNets)in the embedding network.Concretely,the dynamic routing amalgamates geometric information of the 3D point cloud data to construct higher-level feature representations,which capture the relationships between object parts and their wholes.Secondly,we designed a multi-prototype enhancement module to enhance the prototype discriminability.Specifically,the single-prototype enhancement mechanism is expanded to the multi-prototype enhancement version for capturing rich semantics.Besides,the shot-correlation within the category is calculated via the interaction of different samples to enhance the intra-category similarity.Ablation studies prove that the involved part-whole relations and proposed multi-prototype enhancement module help to achieve complete object segmentation and improve semantic discrimination.Moreover,under the integration of these two modules,quantitative and qualitative experiments on two public benchmarks,including S3DIS and ScanNet,indicate the superior performance of the proposed framework on the task of 3D point cloud semantic segmentation,compared to some state-of-the-art methods.
基金the National Natural Science Foundation of China(Grant Number 62066013)Hainan Provincial Natural Science Foundation of China(Grant Numbers 622RC674 and 2019RC182).
文摘High-resolution remote sensing image segmentation is a challenging task. In urban remote sensing, the presenceof occlusions and shadows often results in blurred or invisible object boundaries, thereby increasing the difficultyof segmentation. In this paper, an improved network with a cross-region self-attention mechanism for multi-scalefeatures based onDeepLabv3+is designed to address the difficulties of small object segmentation and blurred targetedge segmentation. First,we use CrossFormer as the backbone feature extraction network to achieve the interactionbetween large- and small-scale features, and establish self-attention associations between features at both large andsmall scales to capture global contextual feature information. Next, an improved atrous spatial pyramid poolingmodule is introduced to establish multi-scale feature maps with large- and small-scale feature associations, andattention vectors are added in the channel direction to enable adaptive adjustment of multi-scale channel features.The proposed networkmodel is validated using the PotsdamandVaihingen datasets. The experimental results showthat, compared with existing techniques, the network model designed in this paper can extract and fuse multiscaleinformation, more clearly extract edge information and small-scale information, and segment boundariesmore smoothly. Experimental results on public datasets demonstrate the superiority of ourmethod compared withseveral state-of-the-art networks.
基金supported in part by the STI 2030-Major Projects(2021ZD0202002)in part by the National Natural Science Foundation of China(Grant No.62227807)+2 种基金in part by the Natural Science Foundation of Gansu Province,China(Grant No.22JR5RA488)in part by the Fundamental Research Funds for the Central Universities(Grant No.lzujbky-2023-16)Supported by Supercomputing Center of Lanzhou University.
文摘With the rapid growth of information transmission via the Internet,efforts have been made to reduce network load to promote efficiency.One such application is semantic computing,which can extract and process semantic communication.Social media has enabled users to share their current emotions,opinions,and life events through their mobile devices.Notably,people suffering from mental health problems are more willing to share their feelings on social networks.Therefore,it is necessary to extract semantic information from social media(vlog data)to identify abnormal emotional states to facilitate early identification and intervention.Most studies do not consider spatio-temporal information when fusing multimodal information to identify abnormal emotional states such as depression.To solve this problem,this paper proposes a spatio-temporal squeeze transformer method for the extraction of semantic features of depression.First,a module with spatio-temporal data is embedded into the transformer encoder,which is utilized to obtain a representation of spatio-temporal features.Second,a classifier with a voting mechanism is designed to encourage the model to classify depression and non-depression effec-tively.Experiments are conducted on the D-Vlog dataset.The results show that the method is effective,and the accuracy rate can reach 70.70%.This work provides scaffolding for future work in the detection of affect recognition in semantic communication based on social media vlog data.
基金Science and Technology Innovation 2030-Major Project of“New Generation Artificial Intelligence”granted by Ministry of Science and Technology,Grant Number 2020AAA0109300.
文摘In the process of constructing domain-specific knowledge graphs,the task of relational triple extraction plays a critical role in transforming unstructured text into structured information.Existing relational triple extraction models facemultiple challenges when processing domain-specific data,including insufficient utilization of semantic interaction information between entities and relations,difficulties in handling challenging samples,and the scarcity of domain-specific datasets.To address these issues,our study introduces three innovative components:Relation semantic enhancement,data augmentation,and a voting strategy,all designed to significantly improve the model’s performance in tackling domain-specific relational triple extraction tasks.We first propose an innovative attention interaction module.This method significantly enhances the semantic interaction capabilities between entities and relations by integrating semantic information fromrelation labels.Second,we propose a voting strategy that effectively combines the strengths of large languagemodels(LLMs)and fine-tuned small pre-trained language models(SLMs)to reevaluate challenging samples,thereby improving the model’s adaptability in specific domains.Additionally,we explore the use of LLMs for data augmentation,aiming to generate domain-specific datasets to alleviate the scarcity of domain data.Experiments conducted on three domain-specific datasets demonstrate that our model outperforms existing comparative models in several aspects,with F1 scores exceeding the State of the Art models by 2%,1.6%,and 0.6%,respectively,validating the effectiveness and generalizability of our approach.
基金supported in part by the National Natural Science Foundation of China under Grant Nos.U20A20197,62306187the Foundation of Ministry of Industry and Information Technology TC220H05X-04.
文摘In recent years,semantic segmentation on 3D point cloud data has attracted much attention.Unlike 2D images where pixels distribute regularly in the image domain,3D point clouds in non-Euclidean space are irregular and inherently sparse.Therefore,it is very difficult to extract long-range contexts and effectively aggregate local features for semantic segmentation in 3D point cloud space.Most current methods either focus on local feature aggregation or long-range context dependency,but fail to directly establish a global-local feature extractor to complete the point cloud semantic segmentation tasks.In this paper,we propose a Transformer-based stratified graph convolutional network(SGT-Net),which enlarges the effective receptive field and builds direct long-range dependency.Specifically,we first propose a novel dense-sparse sampling strategy that provides dense local vertices and sparse long-distance vertices for subsequent graph convolutional network(GCN).Secondly,we propose a multi-key self-attention mechanism based on the Transformer to further weight augmentation for crucial neighboring relationships and enlarge the effective receptive field.In addition,to further improve the efficiency of the network,we propose a similarity measurement module to determine whether the neighborhood near the center point is effective.We demonstrate the validity and superiority of our method on the S3DIS and ShapeNet datasets.Through ablation experiments and segmentation visualization,we verify that the SGT model can improve the performance of the point cloud semantic segmentation.