Abstract
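Since classical information theory was founded in 1948, modern communication technology developed under its guidance has approached the theoretical performance limits, such as the information entropy $H(U)$, the channel capacity $C=\max_{p(x)} I(X;Y)$, and the rate-distortion function $R(D)=\min_{p(\hat{x}|x):\,\mathbb{E}d(x,\hat{x})\le D} I(X;\hat{X})$. Because classical information theory studies only syntactic information, it has long limited the further development of communication science. In recent years, communication techniques that process and transmit semantic information have attracted wide attention in academia. Semantic communication opens a new direction for future communication technology, but it still lacks a general guiding mathematical theory. To address this problem, a theoretical framework of semantic information theory is constructed, and the measures of semantic information and the theoretical limits of semantic communication are described systematically. First, by analyzing the data characteristics of various sources and the requirements of various downstream tasks, a universal attribute of semantic information is identified: synonymity. It follows that semantic information is a superordinate concept of syntactic information. It is an abstract feature of many equivalent or similar pieces of syntactic information and represents the meaning or content hidden behind data or messages. The relationship between semantic information and syntactic information is named the synonymous mapping, a one-to-many mapping in which one semantic symbol can be represented by many different syntactic symbols. Based on the core concept of the synonymous mapping $f$, the semantic entropy $H_s(U)$ is introduced as the basic measure of semantic information, expressed as a functional of the source probability distribution and the synonymous mapping. On this basis, the up/down semantic mutual information $I^s(X;Y)$ ($I_s(X;Y)$), the semantic channel capacity $C_s=\max_{f_{xy}}\max_{p(x)} I^s(X;Y)$, and the semantic rate-distortion function $R_s(D)=\min_{\{f_x,f_{\hat{x}}\}}\min_{p(\hat{x}|x):\,\mathbb{E}d_s(x,\hat{x})\le D} I_s(X;\hat{X})$ are introduced, forming a complete system of semantic information measures. These measures are natural extensions of the classical information measures and are all constrained by the synonymous mapping; with a one-to-one mapping they degenerate into the traditional measures. The system of semantic information measures therefore contains the syntactic measures and is compatible with them. Second, three important semantic coding theorems are proved to reveal the performance advantages of semantic communication. Based on the synonymous mapping, a new mathematical tool, the semantic asymptotic equipartition property (AEP), is introduced, and the mathematical properties of synonymous typical sequences are examined in detail. Using random coding and synonymous typical sequence decoding/encoding, the semantic lossless source coding theorem, the semantic channel coding theorem, and the semantic lossy (rate-distortion) source coding theorem are proved. As in classical information theory, these fundamental coding theorems are existence theorems, but they establish the performance limits of semantic communication systems and play a key role in semantic information theory. From the synonymous mapping and these coding theorems it can be inferred that semantic communication systems outperform classical ones: the semantic entropy does not exceed the information entropy, $H_s(U)\le H(U)$; the semantic channel capacity is no less than the classical channel capacity, $C_s\ge C$; and the semantic rate-distortion function does not exceed the classical rate-distortion function, $R_s(D)\le R(D)$. Finally, semantic information measures in the continuous case are discussed. Here the synonymous mapping becomes a partition of the distribution interval of a continuous random variable. The resulting subintervals are called synonymous intervals, and their average length is defined as the synonymous length $S$. In particular, for the band-limited Gaussian channel, a new channel capacity formula is obtained, $C_s=B\log\left[S^4\left(1+\frac{P}{N_0 B}\right)\right]$, where the average synonymous length $S$ characterizes the ability to discriminate information. This formula is an important extension of the classical channel capacity; when $S=1$ it reduces to the well-known Shannon capacity formula. In summary, semantic information theory builds on the synonymous mapping, the essential feature of semantic information, to construct a system of semantic information measures, introduce new mathematical tools, prove the fundamental semantic coding theorems, establish the performance limits of semantic communication systems, and reveal the large performance potential of future semantic communication.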
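The abstract describes semantic entropy as a functional of the source distribution and the synonymous mapping but does not state its formula. The Python sketch below assumes one plausible form: the probabilities of syntactic symbols that map to the same semantic symbol are summed, and Shannon entropy is taken over the merged distribution. The function names, the toy vocabulary, and the base-2 logarithm are illustrative assumptions, not the paper's notation.

```python
import math
from collections import defaultdict


def entropy(probs):
    """Shannon entropy in bits of an iterable of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def semantic_entropy(probs, synonym_of):
    """Entropy after merging syntactic symbols that share a semantic symbol.

    ASSUMPTION: the semantic probability of a synonymous set is the sum of
    the probabilities of its syntactic symbols. The abstract only says the
    measure is a functional of the distribution and the synonymous mapping.
    """
    merged = defaultdict(float)
    for symbol, p in probs.items():
        merged[synonym_of[symbol]] += p
    return entropy(merged.values())


# Hypothetical toy source: four equiprobable words, two meanings.
probs = {"happy": 0.25, "glad": 0.25, "sad": 0.25, "unhappy": 0.25}
synonym_of = {
    "happy": "POSITIVE",
    "glad": "POSITIVE",
    "sad": "NEGATIVE",
    "unhappy": "NEGATIVE",
}
identity = {symbol: symbol for symbol in probs}  # one-to-one mapping

h = entropy(probs.values())
h_s = semantic_entropy(probs, synonym_of)
h_identity = semantic_entropy(probs, identity)
```

Under these assumptions the toy source has a syntactic entropy of 2 bits and a semantic entropy of 1 bit, consistent with $H_s(U)\le H(U)$, and the one-to-one mapping reproduces the classical value.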
Classic information theory (CIT) was born in 1948. Guided by CIT, modern communication techniques have approached the theoretical limits, such as the entropy function $H(U)$, the channel capacity $C=\max_{p(x)} I(X;Y)$, and the rate-distortion function $R(D)=\min_{p(\hat{x}|x):\,\mathbb{E}d(x,\hat{x})\le D} I(X;\hat{X})$. Semantic communication opens a new direction for future communication techniques, but a guiding theory is still missing. In this paper, we try to establish a systematic framework of semantic information theory (SIT). We investigate the behavior of semantic communication and find that synonymity is its basic feature, so we define the synonymous mapping between semantic information and syntactic information. Stemming from this core concept, the synonymous mapping $f$, we introduce the measures of semantic information, such as the semantic entropy $H_s(U)$, the up/down semantic mutual information $I^s(X;Y)$ ($I_s(X;Y)$), the semantic channel capacity $C_s=\max_{f_{xy}}\max_{p(x)} I^s(X;Y)$, and the semantic rate-distortion function $R_s(D)=\min_{\{f_x,f_{\hat{x}}\}}\min_{p(\hat{x}|x):\,\mathbb{E}d_s(x,\hat{x})\le D} I_s(X;\hat{X})$. Furthermore, we prove three coding theorems of SIT by using random coding and (jointly) typical decoding/encoding, namely the semantic source coding theorem, the semantic channel coding theorem, and the semantic rate-distortion coding theorem. We find that the limits of SIT are extended by the synonymous mapping, that is, $H_s(U)\le H(U)$, $C_s\ge C$, and $R_s(D)\le R(D)$. Together, these results constitute the basis of semantic information theory. In addition, we discuss the semantic information measures in the continuous case. In particular, for the band-limited Gaussian channel, we obtain a new channel capacity formula, $C_s=B\log\left[S^4\left(1+\frac{P}{N_0 B}\right)\right]$, where the average synonymous length $S$ indicates the identification ability of information. In summary, the theoretical framework of SIT proposed in this paper is a natural extension of CIT and may reveal great performance potential for future communication.
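The band-limited Gaussian capacity formula quoted in the abstract can be evaluated numerically. The following is a minimal sketch, assuming a base-2 logarithm (so results are in bit/s) and hypothetical parameter names; the abstract does not specify the logarithm base or the units.

```python
import math


def shannon_capacity(bandwidth, signal_power, noise_psd):
    """Classical capacity B * log2(1 + P / (N0 * B))."""
    snr = signal_power / (noise_psd * bandwidth)
    return bandwidth * math.log2(1.0 + snr)


def semantic_capacity(bandwidth, signal_power, noise_psd, syn_length=1.0):
    """Capacity B * log2(S^4 * (1 + P / (N0 * B))) as quoted in the abstract.

    ASSUMPTION: base-2 logarithm; syn_length is the average synonymous
    length S, with S = 1 giving the classical formula.
    """
    snr = signal_power / (noise_psd * bandwidth)
    return bandwidth * math.log2(syn_length ** 4 * (1.0 + snr))


# Hypothetical normalized values: B = 1, P = 3, N0 = 1, so P / (N0 * B) = 3.
c_classic = shannon_capacity(1.0, 3.0, 1.0)
c_s1 = semantic_capacity(1.0, 3.0, 1.0, syn_length=1.0)
c_s2 = semantic_capacity(1.0, 3.0, 1.0, syn_length=2.0)
```

Since $B\log[S^4(1+P/(N_0B))] = 4B\log S + B\log(1+P/(N_0B))$, the formula adds $4B\log S$ to the Shannon value for $S>1$ and coincides with it for $S=1$.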
Authors
牛凯
张平
Kai Niu; Ping Zhang (School of Artificial Intelligence, BUPT; School of Information and Communication Engineering, Beijing University of Posts and Telecommunications)
Source
《通信学报》
EI
CSCD
Peking University Core Journals list (北大核心)
2024, No. 6, pp. 7-59 (53 pages)
Journal on Communications
Funding
Supported by the National Natural Science Foundation of China (No. 62293481, No. 62071058).
Keywords
synonymous mapping
semantic entropy
semantic relative entropy
up/down semantic mutual information
semantic channel capacity
semantic distortion
semantic rate-distortion function
semantically typical sequence
synonymous typical sequence
semantically typical set
synonymous typical set
semantically jointly typical set
jointly typical decoding
jointly typical encoding
synonymous length
maximum likelihood group decoding
semantic source channel coding