Journal articles
7 articles found
Enhancing Embedding-Based Chinese Word Similarity Evaluation with Concepts and Synonyms Knowledge
1
Authors: Fulian Yin, Yanyan Wang, Jianbo Liu, Meiqi Ji. Computer Modeling in Engineering & Sciences (SCIE, EI), 2020, no. 8, pp. 747-764 (18 pages).
Word similarity (WS) is a fundamental and critical task in natural language processing. Existing approaches to WS mainly calculate the similarity or relatedness of word pairs from word embeddings trained on massive, high-quality corpora. However, these approaches may perform poorly when the corpus for a specific field is insufficient, and they cannot capture rich semantic and sentiment information. To address these problems, we propose an enhanced embedding-based word similarity evaluation model with character-word concepts and synonym knowledge, named EWS-CS, which provides extra semantic information to improve word similarity evaluation. The core of our approach consists of a knowledge encoder and a word encoder. In the knowledge encoder, we incorporate semantic knowledge extracted from knowledge resources, including character-word concepts, synonyms and sentiment lexicons, to obtain a knowledge representation. The word encoder learns an enhanced embedding-based word representation from a pre-trained model and the knowledge representation, trained on the similarity task. Finally, experiments on four similarity evaluation datasets, compared against baseline models, validate the effectiveness of EWS-CS on the WS task.
Keywords: word representation; concepts and synonyms knowledge; word similarity; information security
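The abstract does not specify how the knowledge encoder and word encoder are combined. The sketch below only illustrates the general idea of knowledge-enhanced similarity under stated assumptions. It assumes the enhanced representation is a weighted concatenation of a corpus embedding with a knowledge feature vector (standing in for concept, synonym and sentiment features), and it scores a pair by cosine similarity. The toy vectors, the `alpha` weight and the function names are hypothetical and are not the authors' EWS-CS encoders.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)

def enhanced_vector(embedding, knowledge, alpha=0.5):
    """Hypothetical enhancement: weight and concatenate the corpus embedding with a
    knowledge vector, so that a word with weak corpus evidence still carries
    concept/synonym/sentiment signal."""
    return [(1 - alpha) * x for x in embedding] + [alpha * k for k in knowledge]

# Toy data (made up): two synonyms whose corpus embeddings disagree,
# but whose knowledge features (e.g. shared concept, positive sentiment) match.
emb = {"快乐": [0.9, 0.1, 0.0], "高兴": [0.2, 0.8, 0.1]}
kno = {"快乐": [1.0, 0.0, 1.0], "高兴": [1.0, 0.0, 1.0]}

plain = cosine(emb["快乐"], emb["高兴"])
enhanced = cosine(enhanced_vector(emb["快乐"], kno["快乐"]),
                  enhanced_vector(emb["高兴"], kno["高兴"]))
print(f"embedding only: {plain:.3f}, with knowledge: {enhanced:.3f}")
```

Concatenation is only one plausible fusion choice. A learned encoder, as the abstract describes, would replace `enhanced_vector` with trained parameters.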
The Research of Chinese Words Semantic Similarity Calculation with Multi-Information (cited by: 1)
2
Authors: Rihong Wang, Chenglong Wang, Ying Xu, Xingmei Cui. International Journal of Intelligence Science, 2016, no. 3, pp. 17-28 (13 pages).
Text similarity has a wide range of applications, such as intelligent information retrieval, question answering systems, text duplication checking, and machine translation. Meaning-based similarity computation is now widely used to compute the similarity of words and phrases. This paper proposes an improved method for calculating Chinese word similarity based on semantic distance. The method uses the knowledge structure of HowNet and its method of knowledge description. It also takes into account other factors that influence similarity, together with their weights, and makes full use of the depth and density of the Concept-Sememe tree. Simulation results verify the effectiveness of the method.
Keywords: HowNet similarity; Chinese word similarity; multi-information
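A common starting point for HowNet-based similarity converts the path distance d between two sememes in the sememe hierarchy into a similarity with sim = α / (d + α), where α is a tunable parameter. The sketch below implements only that baseline idea on a small, made-up sememe tree. The value α = 1.6 is an assumption. The additional depth and density weighting that this paper introduces is not modeled, because the abstract does not give its formula.

```python
def path_to_root(node, parent):
    """Return [node, parent, grandparent, ..., root] in a child -> parent tree."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path

def tree_distance(a, b, parent):
    """Number of edges between a and b through their lowest common ancestor,
    or None if they share no ancestor."""
    pa, pb = path_to_root(a, parent), path_to_root(b, parent)
    steps_in_b = {n: i for i, n in enumerate(pb)}
    for i, n in enumerate(pa):
        if n in steps_in_b:  # first shared node on a's path = lowest common ancestor
            return i + steps_in_b[n]
    return None

def sememe_similarity(a, b, parent, alpha=1.6):
    """sim = alpha / (d + alpha): 1.0 for identical sememes, smaller as d grows."""
    d = tree_distance(a, b, parent)
    if d is None:
        return 0.0
    return alpha / (d + alpha)

# Hypothetical fragment of a sememe hierarchy (child -> parent), for illustration only.
parent = {
    "animal": "entity", "plant": "entity",
    "mammal": "animal", "bird": "animal",
    "dog": "mammal", "cat": "mammal", "tree": "plant",
}

for x, y in [("dog", "cat"), ("dog", "bird"), ("dog", "tree")]:
    print(x, y, round(sememe_similarity(x, y, parent), 3))
```

Sememes that meet lower in the tree (dog/cat via "mammal") score higher than sememes that only meet at the root (dog/tree via "entity"). Depth and density weighting would refine exactly this distance term.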
Graph-Based Chinese Word Sense Disambiguation with Multi-Knowledge Integration (cited by: 1)
3
Authors: Wenpeng Lu, Fanqing Meng, Shoujin Wang, Guoqiang Zhang, Xu Zhang, Antai Ouyang, Xiaodong Zhang. Computers, Materials & Continua (SCIE, EI), 2019, no. 7, pp. 197-212 (16 pages).
Word sense disambiguation (WSD) is a fundamental but significant task in natural language processing, and it directly affects the performance of downstream applications. However, WSD is very challenging because of the knowledge bottleneck: it is hard to acquire abundant disambiguation knowledge, especially in Chinese. To solve this problem, this paper proposes a graph-based Chinese WSD method with multi-knowledge integration. Specifically, it designs a graph model that combines various Chinese and English knowledge resources through word sense mapping. First, the content words in an ambiguous Chinese sentence are extracted and mapped to English words with BabelNet. Then, English word similarity is computed from English word embeddings and a knowledge base. Chinese word similarity is evaluated with Chinese word embeddings and HowNet, respectively. The weights of the three kinds of word similarity are optimized with a simulated annealing algorithm to obtain an overall similarity, which is used to construct a disambiguation graph. A graph scoring algorithm evaluates the importance of each word sense node and determines the correct senses of the ambiguous words. Extensive experiments on the SemEval dataset show that the proposed WSD method significantly outperforms the baselines.
Keywords: word sense disambiguation; graph model; multi-knowledge integration; word similarity
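The abstract says the weights of the three word-similarity sources are tuned with simulated annealing, but it does not state the objective. The sketch below shows generic simulated annealing over a weight vector that stays positive and sums to 1. It assumes, purely for illustration, that the objective is the mean squared error against gold similarity scores. The toy data is synthetic, built from made-up "true" weights. The graph construction and scoring steps are not shown.

```python
import math
import random

def combine(weights, sims):
    """Weighted sum of the three similarity scores for one word pair."""
    return sum(w * s for w, s in zip(weights, sims))

def loss(weights, data):
    """Mean squared error between combined and gold similarity (lower is better)."""
    return sum((combine(weights, sims) - gold) ** 2 for sims, gold in data) / len(data)

def normalize(ws):
    total = sum(ws)
    return [w / total for w in ws]

def anneal_weights(data, steps=5000, t0=1.0, cooling=0.999, seed=0):
    """Simulated annealing over weights on the probability simplex."""
    rng = random.Random(seed)
    current = [1 / 3] * 3                      # start from equal weights
    cur_loss = loss(current, data)
    best, best_loss = current, cur_loss
    t = t0
    for _ in range(steps):
        # Neighbour: small Gaussian perturbation, clipped positive, renormalized.
        cand = normalize([max(1e-6, w + rng.gauss(0, 0.05)) for w in current])
        cand_loss = loss(cand, data)
        delta = cand_loss - cur_loss
        # Always accept improvements; accept worse moves with probability exp(-delta / t).
        if delta < 0 or rng.random() < math.exp(-delta / t):
            current, cur_loss = cand, cand_loss
            if cur_loss < best_loss:
                best, best_loss = current, cur_loss
        t *= cooling                            # geometric cooling schedule
    return best, best_loss

# Synthetic example: (English-embedding sim, English-KB sim, Chinese sim) per pair,
# with gold scores generated from hypothetical true weights.
true_w = (0.5, 0.3, 0.2)
pairs = [(0.9, 0.8, 0.7), (0.2, 0.4, 0.1), (0.6, 0.1, 0.9), (0.3, 0.7, 0.5), (0.8, 0.2, 0.3)]
data = [(s, combine(true_w, s)) for s in pairs]

best, best_loss = anneal_weights(data)
print([round(w, 3) for w in best], best_loss)
```

Accepting some worse moves early, while the temperature is high, lets the search escape local minima. As `t` cools, the search behaves increasingly like greedy descent.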
Novel Representations of Word Embedding Based on the Zolu Function
4
Authors: Jihua Lu, Youcheng Zhang. Journal of Beijing Institute of Technology (EI, CAS), 2020, no. 4, pp. 526-530 (5 pages).
Two learning models based on the Zolu function are proposed: Zolu-continuous bag of words (ZL-CBOW) and Zolu-skip-gram (ZL-SG). The Zolu function changes the slope of the ReLU used in word2vec. Like word2vec, the proposed models can process extremely large datasets, and they do so without increasing complexity. The models also outperform several word embedding methods in both word similarity and syntactic accuracy. ZL-CBOW outperforms CBOW in accuracy by 8.43% on the capital-world training set and by 1.24% on the plural-verbs training set. Moreover, experimental simulations on word similarity and syntactic accuracy show that ZL-CBOW and ZL-SG are superior to LL-CBOW and LL-SG, respectively.
Keywords: Zolu function; word embedding; continuous bag of words; word similarity; accuracy
Vari-gram language model based on word clustering
5
Author: 袁里驰 (Yuan Lichi). Journal of Central South University (SCIE, EI, CAS), 2012, no. 4, pp. 1057-1062 (6 pages).
Category-based statistical language models are an important way to address data sparsity, but they face two bottlenecks:

1) Word clustering: it is hard to find a clustering method that performs well with little computation.
2) Domain adaptation: class-based methods lose the predictive ability needed to adapt to text from different domains.

To solve these problems, a definition of word similarity based on mutual information is presented, and a definition of word-set similarity is derived from it. Experiments show that the similarity-based word clustering algorithm is both faster and more accurate than the conventional greedy clustering method, reducing perplexity from 283 to 218. In addition, an absolute weighted difference method is presented and used to construct a vari-gram language model with good predictive ability. Compared with the category-based model, the vari-gram model reduces perplexity from 234.65 to 219.14 on Chinese corpora and from 195.56 to 184.25 on English corpora.
Keywords: word similarity; word clustering; statistical language model; vari-gram language model
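The results above are reported as perplexity, PP = exp(-(1/N) Σ log p(w_i | h_i)), where p(w_i | h_i) is the probability the model assigns to the i-th token given its history. The sketch below computes perplexity from a list of per-token probabilities and the relative reductions implied by the numbers in the abstract. It does not reproduce the paper's mutual-information similarity or the vari-gram model itself.

```python
import math

def perplexity(token_probs):
    """PP = exp(-(1/N) * sum(log p_i)) over the model's per-token probabilities."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    log_sum = sum(math.log(p) for p in token_probs)
    return math.exp(-log_sum / len(token_probs))

def relative_reduction(before, after):
    """Fractional perplexity reduction, e.g. 0.23 means 23% lower."""
    return (before - after) / before

# A model that gives every token probability 1/4 is as uncertain as a uniform
# 4-way choice, so its perplexity is 4 (up to floating-point rounding).
print(perplexity([0.25] * 10))

# Relative reductions for the figures reported in the abstract.
print(f"clustering: {relative_reduction(283, 218):.1%}")               # clustering: 23.0%
print(f"vari-gram, Chinese: {relative_reduction(234.65, 219.14):.1%}")  # vari-gram, Chinese: 6.6%
print(f"vari-gram, English: {relative_reduction(195.56, 184.25):.1%}")  # vari-gram, English: 5.8%
```

Working in log space avoids the numerical underflow that multiplying many small probabilities directly would cause.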
Measuring Similarity of Academic Articles with Semantic Profile and Joint Word Embedding (cited by: 11)
6
Authors: Ming Liu, Bo Lang, Zepeng Gu, Ahmed Zeeshan. Tsinghua Science and Technology (SCIE, EI, CAS, CSCD), 2017, no. 6, pp. 619-632 (14 pages).
Long-document semantic measurement is significant in many applications, such as semantic search, plagiarism detection, and automatic technical surveys. However, research efforts have mainly focused on the semantic similarity of short texts. Document-level semantic measurement remains an open issue because of problems such as missing background knowledge and topic transitions. In this paper, we propose a novel semantic matching method for long documents in the academic domain. To accurately represent the general meaning of an academic article, we construct a semantic profile that includes and enriches key semantic elements such as the research purpose, methodology, and domain. We then obtain the overall semantic similarity of two papers by computing the distance between their profiles. The distances between concepts in two different semantic profiles are measured with word vectors. To improve the semantic representation quality of the word vectors, we propose a joint word-embedding model that incorporates a domain-specific semantic relation constraint into the traditional context constraint. Our experimental results demonstrate two things. First, our approach substantially improves document semantic similarity measurement over state-of-the-art methods. Second, our joint word-embedding model produces significantly better word representations than traditional word-embedding models.
Keywords: document semantic similarity; text understanding; semantic enrichment; word embedding; scientific literature analysis
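The abstract says profile distance is computed from word vectors of the concepts in each profile element, but it gives no aggregation formula. The sketch below is one plausible aggregation under explicit assumptions. A profile is modeled as a dict from element name (e.g. "purpose", "method", "domain") to a list of concept vectors. Each shared element is scored by symmetric average best-match cosine, and the element scores are averaged. The element names, toy vectors and aggregation scheme are all hypothetical.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def best_match(concepts_a, concepts_b):
    """Average, over concepts in A, of each one's best cosine match in B."""
    return sum(max(cosine(a, b) for b in concepts_b) for a in concepts_a) / len(concepts_a)

def profile_similarity(p, q):
    """Mean symmetric best-match score over non-empty elements present in both profiles."""
    shared = [k for k in p if k in q and p[k] and q[k]]
    if not shared:
        return 0.0
    scores = [(best_match(p[k], q[k]) + best_match(q[k], p[k])) / 2 for k in shared]
    return sum(scores) / len(scores)

# Hypothetical 2-d concept vectors for two papers' semantic profiles.
p = {"purpose": [[1.0, 0.0]], "method": [[0.0, 1.0], [0.7, 0.7]], "domain": [[1.0, 1.0]]}
q = {"purpose": [[0.9, 0.1]], "method": [[0.0, 1.0]], "domain": [[-1.0, 1.0]]}

print(round(profile_similarity(p, q), 3))
```

Matching per element keeps, for example, methodology concepts from being compared with domain concepts. This is the main reason to structure a long document as a profile rather than as one averaged vector.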
Research on calculation method of text similarity based on smooth inverse frequency (cited by: 2)
7
Authors: Yuan Ye, Yu Minmin, Liu Jiming. The Journal of China Universities of Posts and Telecommunications (EI, CSCD), 2020, no. 2, pp. 56-64 (9 pages).
To improve the accuracy of text similarity calculation, this paper presents PO-SIF (part of speech and word order-smooth inverse frequency), a text similarity function based on sentence vectors. PO-SIF optimizes the classical SIF method in two respects: part of speech and word order. The classical SIF algorithm computes sentence similarity from sentence vectors obtained by weighting word vectors and removing noise. However, the choice of weighting and noise-reduction methods affects both the efficiency and the accuracy of the similarity calculation. In PO-SIF, the weight parameters of the SIF sentence vector are first updated with a part-of-speech subtraction factor to identify the most crucial words. PO-SIF then computes sentence-vector similarity while taking word order into account. This overcomes a drawback of similarity analyses that rely mostly on word frequency. Experimental results confirm that PO-SIF improves the accuracy of text similarity calculation.
Keywords: word2vec; SIF; part-of-speech; word order; similarity
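The classical SIF baseline that PO-SIF builds on works in three steps. Each word vector is weighted by a / (a + p(w)), where p(w) is the word's unigram probability. The weighted vectors are averaged per sentence. Finally, each sentence vector's projection onto the first principal component of all sentence vectors is removed. The pure-Python sketch below follows that recipe, using power iteration in place of an SVD library. It implements classical SIF only. PO-SIF's part-of-speech subtraction factor and word-order term are not modeled, because the abstract does not define them. The toy vectors, frequencies and a = 1e-3 are assumptions.

```python
import math

def sif_weight(word, freq, a=1e-3):
    """SIF weight a / (a + p(w)): frequent words are down-weighted, rare ones stay near 1."""
    return a / (a + freq.get(word, 0.0))

def weighted_average(sentence, vectors, freq, a=1e-3):
    """Average of SIF-weighted word vectors; out-of-vocabulary words are skipped."""
    dim = len(next(iter(vectors.values())))
    words = [w for w in sentence if w in vectors]
    total = [0.0] * dim
    for w in words:
        weight = sif_weight(w, freq, a)
        for i, x in enumerate(vectors[w]):
            total[i] += weight * x
    return [x / len(words) for x in total] if words else total

def first_component(rows, iters=200):
    """Top right singular vector of the row matrix X, via power iteration on X^T X."""
    dim = len(rows[0])
    u = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iters):
        proj = [sum(r[i] * u[i] for i in range(dim)) for r in rows]           # X u
        w = [sum(p * r[i] for p, r in zip(proj, rows)) for i in range(dim)]   # X^T (X u)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return [0.0] * dim  # all-zero input: no common component to remove
        u = [x / norm for x in w]
    return u

def sif_embeddings(sentences, vectors, freq, a=1e-3):
    """SIF sentence vectors: weighted averages minus their projection on the first component."""
    embs = [weighted_average(s, vectors, freq, a) for s in sentences]
    u = first_component(embs)
    result = []
    for e in embs:
        dot = sum(x * y for x, y in zip(e, u))
        result.append([x - dot * y for x, y in zip(e, u)])
    return result

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# Toy 3-d word vectors and unigram probabilities (made up for illustration).
vectors = {"the": [1.0, 1.0, 0.0], "cat": [0.0, 1.0, 0.5], "sat": [0.2, 0.0, 1.0],
           "dog": [0.1, 0.9, 0.6], "ran": [0.3, 0.1, 0.9]}
freq = {"the": 0.05, "cat": 0.001, "sat": 0.0008, "dog": 0.001, "ran": 0.0009}
sentences = [["the", "cat", "sat"], ["the", "dog", "ran"], ["the", "cat"]]

s1, s2, s3 = sif_embeddings(sentences, vectors, freq)
print(f"sim(1,2) = {cosine(s1, s2):.3f}, sim(1,3) = {cosine(s1, s3):.3f}")
```

The weighting step corresponds to the "weighting" that the abstract mentions, and the component removal corresponds to its "reducing noise". PO-SIF modifies the first of these steps and adds word-order information on top.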