Word sense disambiguation (WSD) is a fundamental and important task in natural language processing that directly affects the performance of downstream applications. However, WSD is very challenging because of the knowledge bottleneck: it is hard to acquire abundant disambiguation knowledge, especially for Chinese. To address this problem, this paper proposes a graph-based Chinese WSD method that integrates multiple kinds of knowledge. Specifically, it designs a graph model that combines various Chinese and English knowledge resources through word sense mapping. First, the content words in an ambiguous Chinese sentence are extracted and mapped to English words with BabelNet. Then, English word similarity is computed from English word embeddings and a knowledge base, while Chinese word similarity is evaluated with Chinese word embeddings and with HowNet, respectively. The weights of these three kinds of word similarity are optimized with a simulated annealing algorithm to obtain an overall similarity, which is used to construct a disambiguation graph. A graph scoring algorithm then evaluates the importance of each word sense node and selects the correct senses of the ambiguous words. Extensive experiments on the SemEval dataset show that the proposed WSD method significantly outperforms the baselines.
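The abstract does not name the graph scoring algorithm, so the following is only a minimal sketch under stated assumptions. The three similarity scores for each pair of candidate senses are taken as given. Their weights are fixed by hand rather than tuned by simulated annealing. Weighted PageRank stands in for the paper's scoring step. All function names and sense identifiers are hypothetical.

```python
# Hypothetical sketch: weighted similarity fusion + PageRank-style sense scoring.
# PageRank is an assumed stand-in for the paper's (unnamed) graph scoring algorithm.

def combined_similarity(sims, weights):
    """Weighted sum of the three similarity scores for one pair of senses
    (English, Chinese-embedding and HowNet similarity, in that order)."""
    return sum(w * s for w, s in zip(weights, sims))


def pagerank(nodes, edges, damping=0.85, iters=50):
    """Weighted PageRank by power iteration on an undirected graph.
    `edges` maps a pair (u, v) to a positive weight."""
    neighbors = {node: {} for node in nodes}
    for (u, v), w in edges.items():
        neighbors[u][v] = w
        neighbors[v][u] = w
    # Total edge weight per node, used to split its score among neighbors.
    strength = {node: sum(nbrs.values()) for node, nbrs in neighbors.items()}
    n = len(nodes)
    score = {node: 1.0 / n for node in nodes}
    for _ in range(iters):
        score = {
            node: (1 - damping) / n
            + damping * sum(score[nb] * w / strength[nb] for nb, w in neighbors[node].items())
            for node in nodes
        }
    return score


def disambiguate(senses, pair_sims, weights):
    """senses: word -> candidate sense ids.
    pair_sims: (sense_a, sense_b) -> three similarity scores, for senses of different words.
    Returns word -> highest-scoring sense."""
    nodes = [s for candidates in senses.values() for s in candidates]
    edges = {}
    for pair, sims in pair_sims.items():
        weight = combined_similarity(sims, weights)
        if weight > 0:  # keep only positively related sense pairs as edges
            edges[pair] = weight
    score = pagerank(nodes, edges)
    return {word: max(candidates, key=score.get) for word, candidates in senses.items()}


# Toy input (hypothetical): "bank" is ambiguous; the other two words have one sense each.
senses = {"bank": ["bank#finance", "bank#river"], "money": ["money#1"], "deposit": ["deposit#1"]}
pair_sims = {
    ("bank#finance", "money#1"): (0.8, 0.7, 0.9),
    ("bank#river", "money#1"): (0.1, 0.2, 0.1),
    ("bank#finance", "deposit#1"): (0.7, 0.6, 0.8),
    ("bank#river", "deposit#1"): (0.2, 0.1, 0.3),
    ("money#1", "deposit#1"): (0.6, 0.6, 0.6),
}
result = disambiguate(senses, pair_sims, weights=(0.4, 0.3, 0.3))
```

Picking the highest-scoring sense per word mirrors the final step described above. In the described method, the weights would come from the simulated annealing search rather than being set by hand.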
A graph G = (V, E) is representable if there exists a word W over the alphabet V such that, for each pair of distinct letters x ≠ y, x and y alternate in W if and only if (x, y) is in E. The motivation to study representable graphs came from algebra, but the subject is also interesting from the points of view of graph theory, computer science, and combinatorics on words. In this paper, we prove that for n > 3, the line graph of an n-wheel is non-representable. This not only provides a new construction of non-representable graphs, but also answers an open question on the representability of the line graph of the 5-wheel, the 5-wheel being the minimal non-representable graph. Moreover, we show that for n > 4, the line graph of the complete graph K_n is also non-representable. We then use these facts to prove that, for a graph G that is not a cycle, a path, or a claw graph, the graph obtained by taking the line graph of G k times is guaranteed to be non-representable for k > 3.
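The definition can be made concrete with a small checker. Letters x and y alternate in a word if deleting every other letter leaves xyxy... or yxyx..., that is, no letter repeats consecutively. The helpers below are hypothetical illustrations, not code from the paper. Two assumptions apply: every vertex must occur in the word, and the n-wheel W_n is a hub joined to every vertex of an n-cycle.

```python
from itertools import combinations


def alternate(word, x, y):
    """x and y alternate in `word` if, after deleting every other letter,
    the remainder is xyxy... or yxyx..., i.e. no letter repeats consecutively."""
    sub = [c for c in word if c == x or c == y]
    return all(a != b for a, b in zip(sub, sub[1:]))


def represents(word, vertices, edges):
    """True if `word` represents the graph: for all distinct x, y,
    x and y alternate in word  <=>  {x, y} is an edge."""
    if set(word) != set(vertices):  # assumption: each vertex occurs in the word
        return False
    edge_set = {frozenset(e) for e in edges}
    return all(
        alternate(word, x, y) == (frozenset((x, y)) in edge_set)
        for x, y in combinations(vertices, 2)
    )


def wheel(n):
    """n-wheel W_n (assumed convention): cycle on rim vertices 1..n plus hub 0 adjacent to all."""
    rim = [(i, i % n + 1) for i in range(1, n + 1)]
    spokes = [(0, i) for i in range(1, n + 1)]
    return list(range(n + 1)), rim + spokes


def line_graph(edges):
    """L(G): one vertex per edge of G; two vertices are adjacent iff the edges share an endpoint."""
    nodes = [frozenset(e) for e in edges]
    return nodes, [(e, f) for e, f in combinations(nodes, 2) if e & f]


# The path a-b-c is represented by "cabac": "aba" and "cbc" alternate, "caac" does not.
print(represents("cabac", "abc", [("a", "b"), ("b", "c")]))  # → True
```

Checking one candidate word is cheap. The difficulty the paper addresses by proof is showing that no word at all represents graphs such as the line graph of W_n. A naive search over words is not a practical way to establish that.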
Visualization methods for single documents are either too simple, considering only word frequency, or depend on syntactic and semantic information bases to be more useful. This paper presents an intermediate approach, based on H. P. Luhn's automatic abstract creation algorithm, that aims to add more information to document visualization than word-counting methods do, without requiring external sources. The method takes pairs of relevant words and computes the linkage force between them. Relevant words become vertices and links become edges in the resulting graph.
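The abstract defines neither "relevant words" nor "linkage force", so this is a hedged sketch rather than the paper's method. Relevance is approximated Luhn-style: a small hypothetical stop-word list is dropped, and words occurring at least `min_freq` times are kept. The linkage force is assumed to be the number of sentences in which both words occur.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical stop-word list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are", "for", "on", "with"}


def relevant_words(text, min_freq=2):
    """Luhn-style relevance (assumed): non-stop-words with frequency >= min_freq."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]
    counts = Counter(words)
    return {w for w, c in counts.items() if c >= min_freq}


def document_graph(text, min_freq=2):
    """Vertices are relevant words; an edge's weight (the assumed 'linkage force')
    counts the sentences in which both words appear."""
    relevant = relevant_words(text, min_freq)
    edges = Counter()
    for sentence in re.split(r"[.!?]+", text.lower()):
        present = sorted(relevant.intersection(re.findall(r"[a-z]+", sentence)))
        for pair in combinations(present, 2):
            edges[pair] += 1
    return relevant, dict(edges)


text = ("Graphs model documents. Words become vertices in graphs. "
        "Documents contain words. Graphs summarize documents.")
vertices, edges = document_graph(text)
```

Sentence co-occurrence is the simplest plausible force. A distance-weighted variant, closer to Luhn's use of word proximity within sentences, would be another reasonable reading.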
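In recent years, sentiment analysis has become a core research direction in natural language processing that attracts considerable attention. Traditional text sentiment analysis models capture only the surface features of text, generalize poorly across domains and contexts, and struggle with long texts and semantic ambiguity. To address these problems, this paper designs a text sentiment analysis model based on graph neural networks and representation learning (GNNRL). spaCy is used to generate the syntactic dependency tree of each sentence, which is then encoded with a graph convolutional network to capture more complex relationships between the words in the sentence. Dynamic k-max pooling further filters the features while preserving the sequential features given by relative positions in the text. This avoids losing part of the features and thereby improves the model's feature extraction ability. Finally, the sentiment feature vector is fed into a softmax classifier, and the sentiment class is determined from the normalized values. To verify the effectiveness of the proposed GNNRL model, experiments are conducted on two text sentiment analysis datasets, OS10 and SMP2020, comparing it with the HyperGAT, IBHC, BERT_CNN, BERT_GCN, and TextGCN models. The results show that the proposed GNNRL model outperforms the other models on all four metrics (accuracy, precision, recall, and F1) and achieves good text sentiment classification performance. The paper also investigates how the choice of optimizer affects the model.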
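The three building blocks named above can be sketched in plain Python, assuming standard formulations. The GCN layer uses the common normalized propagation H' = ReLU(D^-1/2 (A + I) D^-1/2 H W) of Kipf and Welling. The dynamic k follows Kalchbrenner et al. (2014), k_l = max(k_top, ceil((L - l) / L * s)). The paper's exact layer sizes and hyperparameters are not given, and the function names are hypothetical. In practice, the dependency edges could be read off a spaCy parse as `(token.i, token.head.i)` for each non-root token. That step is omitted so the sketch runs without spaCy.

```python
import math


def gcn_layer(edges, features, weight):
    """One GCN layer H' = ReLU(D^-1/2 (A + I) D^-1/2 H W) over an undirected graph,
    e.g. a sentence's dependency tree given as (head, dependent) index pairs."""
    n = len(features)
    nbrs = [{i} for i in range(n)]  # self-loops: A + I
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    deg = [len(s) for s in nbrs]
    out_dim = len(weight[0])
    # H W: project each node's feature vector.
    hw = [[sum(h[k] * weight[k][j] for k in range(len(h))) for j in range(out_dim)] for h in features]
    out = []
    for i in range(n):
        row = [0.0] * out_dim
        for j in nbrs[i]:
            coef = 1.0 / math.sqrt(deg[i] * deg[j])  # symmetric normalization
            for c in range(out_dim):
                row[c] += coef * hw[j][c]
        out.append([max(0.0, x) for x in row])  # ReLU
    return out


def k_max_pooling(values, k):
    """Keep the k largest values of a sequence, preserving their original order."""
    if k >= len(values):
        return list(values)
    top = sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:k]
    return [values[i] for i in sorted(top)]


def dynamic_k(layer, total_layers, seq_len, k_top):
    """k_l = max(k_top, ceil((L - l) / L * s)) for layer l of L on a length-s sequence."""
    return max(k_top, math.ceil((total_layers - layer) / total_layers * seq_len))


def softmax(logits):
    """Numerically stable softmax over class scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

Because `k_max_pooling` sorts the kept indices back into their original order, the pooled output retains the relative positions of the strongest features. This is the property the abstract attributes to dynamic k-max pooling.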
Funding: The research work is supported by the National Key R&D Program of China under Grant No. 2018YFC0831704, the National Natural Science Foundation of China under Grant No. 61502259, the Natural Science Foundation of Shandong Province under Grant No. ZR2017MF056, and the Taishan Scholar Program of Shandong Province in China (directed by Prof. Yinglong Wang).