Funding: This research is supported by the National Science Foundation of China (grant number: 61772278, author: Qu, W.; grant number: 61472191, author: Zhou, J.; http://www.nsfc.gov.cn/), the National Social Science Foundation of China (grant number: 18BYY127, author: Li, B.; http://www.cssn.cn), the Philosophy and Social Science Foundation of Jiangsu Higher Institution (grant number: 2019SJA0220, author: Wei, T.; https://jyt.jiangsu.gov.cn), and Jiangsu Higher Institutions' Excellent Innovative Team for Philosophy and Social Science (grant number: 2017STD006, author: Gu, W.; https://jyt.jiangsu.gov.cn).
Abstract: The meaning of a word includes a conceptual meaning and a distributive meaning. Word embedding based on distribution suffers from insufficient conceptual semantic representation caused by data sparsity, especially for low-frequency words. In knowledge bases, manually annotated semantic knowledge is stable, and the essential attributes of words are accurately denoted. In this paper, we propose a Conceptual Semantics Enhanced Word Representation (CEWR) model, which computes the synset embedding and hypernym embedding of Chinese words based on the Tongyici Cilin thesaurus and aggregates them with the distributed word representation, so that both distributed information and conceptual meaning are encoded in the representation of words. We evaluate the CEWR model on two tasks: word similarity computation and short text classification. The Spearman correlations between model results and human judgement are improved to 64.71%, 81.84%, and 85.16% on Wordsim297, MC30, and RG65, respectively. Moreover, CEWR improves the F1 score by 3% in the short text classification task. The experimental results show that CEWR represents words more informatively than distributed word embedding alone. This proves that conceptual semantics, especially hypernymous information, is a good complement to distributed word representation.
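The word similarity evaluation mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the synset, hypernym, and distributed vectors are aggregated, so the `enhance` function below simply concatenates them as a labeled assumption, and all function names are hypothetical. The Spearman computation itself is the standard rank correlation with average ranks for ties.

```python
import math


def enhance(distributed, synset, hypernym):
    # ASSUMPTION: aggregation by concatenation. The abstract only says the
    # vectors are "aggregated"; the actual CEWR operator is not specified.
    return list(distributed) + list(synset) + list(hypernym)


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def average_ranks(values):
    """1-based ranks in ascending order; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def spearman(model_scores, human_scores):
    """Spearman correlation = Pearson correlation of the rank vectors."""
    return pearson(average_ranks(model_scores), average_ranks(human_scores))
```

In use, each word pair of a benchmark would be scored with `cosine` over the enhanced vectors of the two words, and the resulting list of scores would be passed to `spearman` together with the human ratings.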
Funding: This research is supported by the National Science Foundation of China (grant number: 61772278, author: Qu, W.; grant number: 61472191, author: Zhou, J.; http://www.nsfc.gov.cn/), the National Social Science Foundation of China (grant number: 18BYY127, author: Li, B.; http://www.cssn.cn), the Philosophy and Social Science Foundation of Jiangsu Higher Institution (grant number: 2019SJA0220, author: Wei, T.; https://jyt.jiangsu.gov.cn), and Jiangsu Higher Institutions' Excellent Innovative Team for Philosophy and Social Science (grant number: 2017STD006, author: Qu, W.; https://jyt.jiangsu.gov.cn).
Abstract: In Mandarin Chinese, when the noun head appears in the context, a quantity noun phrase can be reduced to a quantity phrase with the noun head omitted. This phrase structure is called an elliptical quantity noun phrase. The automatic recovery of elliptical quantity noun phrases is crucial in syntactic parsing, semantic representation, and other downstream tasks. In this paper, we propose a hybrid neural network model that identifies the semantic category of elliptical quantity noun phrases and recovers the omitted semantics by supplementing concept categories. Firstly, we use BERT to generate character-level vectors. Secondly, Bi-LSTM is applied to capture the context information of each character and compress the input into the context memory history. Then CNN is utilized to capture the local semantics of n-grams with various granularities. Based on the Chinese Abstract Meaning Representation (CAMR) corpus and the Xinhua News Agency corpus, we construct a hand-labeled elliptical quantity noun phrase dataset and carry out the semantic recovery of elliptical quantity noun phrases on this dataset. The experimental results show that our hybrid neural network model can effectively improve the performance of semantic complement for elliptical quantity noun phrases.
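The final stage described in the abstract, a CNN over n-grams of several widths, can be sketched in plain Python as follows. This is only an illustration of the general idea, not the authors' model: it assumes the per-character context vectors (from BERT and Bi-LSTM in the paper) are already available as lists of floats, uses max-over-time pooling as an assumed pooling choice, omits bias terms and nonlinearities, and all names and the toy filter weights are hypothetical.

```python
def conv1d_max(sequence, kernel):
    """Slide one filter over the sequence and max-pool over positions.

    sequence: list of T vectors, one per character.
    kernel:   list of w vectors; w is the n-gram width the filter covers.
    ASSUMPTION: max-over-time pooling; the abstract does not name the pooling.
    """
    width = len(kernel)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the filter width")
    scores = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Dot product between the filter and the current n-gram window.
        score = sum(
            a * b
            for vec, kernel_vec in zip(window, kernel)
            for a, b in zip(vec, kernel_vec)
        )
        scores.append(score)
    return max(scores)


def ngram_features(sequence, kernels_by_width):
    """Concatenate pooled outputs of all filters, ordered by n-gram width."""
    return [
        conv1d_max(sequence, kernel)
        for width in sorted(kernels_by_width)
        for kernel in kernels_by_width[width]
    ]


# Toy input: three characters with 2-dimensional context vectors.
chars = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Toy filters: one unigram filter and one bigram filter (hypothetical weights).
filters = {
    1: [[[1.0, 1.0]]],
    2: [[[1.0, 0.0], [0.0, 1.0]]],
}
features = ngram_features(chars, filters)
```

In a full model, the resulting feature vector would feed a classifier over the semantic categories; that step is left out here because the abstract does not describe it.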