Cognitive semantics is a component of cognitive linguistics. It is generally held that the rise and development of cognitive linguistics in China spans only forty to fifty years, making it one of the younger theories in Chinese linguistics; it can also be regarded as a subdiscipline of Chinese linguistics. Construction studies, cognitive grammar, and cognitive semantics, which linguists now frequently discuss, are all components of cognitive linguistics. Cognitive semantics breaks down the barrier between linguistic knowledge and the real world, studies meaning from the standpoint of human cognitive processes, and opens a new perspective on semantic research. The Chinese translation 《认知语义学:概念构建的类型和过程》 (original English title: Toward a Cognitive Semantics) is published by Peking University Press.
Word embeddings, grounded in distributional semantics theory, encode rich semantic information and to some extent mark the entry of natural language processing and computational linguistics into the era of large language models (LLMs). Because word embeddings are computable, a variety of semantic computing tasks based on them have emerged, among which semantic relation discrimination is an important one. This study uses two sets of embeddings, the fastText Chinese word embeddings and the Tencent Chinese word embeddings, to compute cosine similarity values that represent the strength of semantic association between words, and reaches the following conclusions. First, the fastText and Tencent Chinese embeddings differ to some degree on the task of distinguishing four types of semantic relation in Chinese: synonymy, antonymy, hyponymy, and meronymy. Second, a comparison of Spearman correlation coefficients shows that on the experimental data the fastText embeddings have acquired stronger semantic similarity features, while the Tencent embeddings have acquired stronger semantic relatedness features. Third, on the antonym discrimination task, both the fastText and the Tencent Chinese embeddings assign very high cosine similarity values to highly conventionalized antonym pairs.
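As a rough illustration of the method this abstract describes, the sketch below computes cosine similarity between Chinese word vectors and then rank-correlates the scores with human judgments via the Spearman coefficient. It is a minimal sketch under stated assumptions, not the study's actual pipeline: the model file name, word pairs, and human ratings are all hypothetical placeholders, using the open-source fastText Python bindings and SciPy.

```python
import numpy as np
import fasttext                      # official fastText Python bindings
from scipy.stats import spearmanr    # rank correlation for the evaluation step

def cosine_similarity(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine of the angle between two word vectors, used here as a
    proxy for semantic association strength between the two words."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical model path: any pretrained Chinese fastText .bin file.
model = fasttext.load_model("cc.zh.300.bin")

# Illustrative word pairs for the four relation types (not the study's data).
pairs = {
    "synonymy": ("高兴", "快乐"),
    "antonymy": ("大", "小"),
    "hyponymy": ("水果", "苹果"),
    "meronymy": ("手指", "手"),
}

sims = {}
for relation, (w1, w2) in pairs.items():
    sims[relation] = cosine_similarity(model.get_word_vector(w1),
                                       model.get_word_vector(w2))
    print(f"{relation}: {w1} ~ {w2} -> {sims[relation]:.3f}")

# Evaluation sketch: rank-correlate the model's similarities with human
# judgments (made-up gold scores here) using the Spearman coefficient.
human_ratings = [0.9, 0.2, 0.6, 0.5]          # hypothetical gold standard
model_scores = [sims[r] for r in pairs]        # same insertion order as above
rho, p_value = spearmanr(human_ratings, model_scores)
print(f"Spearman rho = {rho:.3f} (p = {p_value:.3f})")
```

The same cosine computation would apply unchanged to the Tencent Chinese embeddings; since those ship as a plain-text vector file rather than a fastText model, only the loading and lookup step would differ.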