Word Sense Disambiguation has been a trending topic of research in Natural Language Processing and Machine Learning. Mining core features and performing text classification remain challenging tasks. Here, context features such as neighboring words (for example, adjectives) provide the evidence for classification using a machine learning approach. This paper presents text document classification, which has wide applications in information retrieval, using movie review datasets. Such classification supports document indexing based on controlled vocabulary, adjective analysis, word sense disambiguation, hierarchical categorization of web pages, spam detection, topic labeling, web search, document summarization, and so on. A kernel support vector machine learning algorithm classifies the text, and feature extraction is performed by cuckoo search optimization. Positive and negative movie reviews are used to obtain better classification accuracy. Experimental results focus on context mining, feature analysis, and classification. Compared with previous work, the proposed design achieves more efficient results. The overall design is implemented with the MATLAB 2020a tool.
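To illustrate the classification step described above, the following Python sketch trains a kernel SVM on a toy pair of movie reviews. It is not the paper's MATLAB 2020a pipeline, and the SelectKBest step merely stands in for the cuckoo search feature optimization.

```python
# Illustrative sketch only: a kernel SVM text classifier for movie reviews.
# The paper's cuckoo-search feature selection and MATLAB pipeline are not
# reproduced; SelectKBest acts as a placeholder feature selector.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

reviews = ["a moving, beautifully acted film", "dull plot and wooden dialogue"]
labels = [1, 0]  # 1 = positive review, 0 = negative review

model = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),
    ("select", SelectKBest(chi2, k=2)),   # placeholder for cuckoo-search selection
    ("svm", SVC(kernel="rbf", C=1.0)),    # kernel SVM classifier
])
model.fit(reviews, labels)
print(model.predict(["beautifully acted but dull"]))
```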
All human languages have words that can mean different things in different contexts; such words with multiple meanings are potentially “ambiguous”. The process of “deciding which of several meanings of a term is intended in a given context” is known as “word sense disambiguation (WSD)”. This paper presents a method of WSD that assigns a target word the sense that is most related to the senses of its neighbor words. We explore the use of measures of relatedness between word senses based on a novel hybrid approach. First, we investigate how to “literally” and “regularly” express a “concept”. We apply set algebra to WordNet’s synsets in cooperation with WordNet’s word ontology. In this way we establish regular rules for constructing various representations (lexical notations) of a concept using Boolean operators and word forms in the various synsets defined in WordNet. Then we establish a formal mechanism for quantifying and estimating the semantic relatedness between concepts: we use “concept distribution statistics” to determine the degree of semantic relatedness between two lexically expressed concepts. The experimental results showed good performance on SemCor, a subset of the Brown Corpus. We observe that measures of semantic relatedness are useful sources of information for WSD.
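The sketch below is only a simple baseline, not the hybrid set-algebra method described above; it illustrates the core idea of assigning the sense most related to the neighbors' senses, using NLTK's WordNet interface (assuming nltk and its wordnet data are installed).

```python
# A minimal relatedness-based WSD baseline using NLTK's WordNet.
# This is not the paper's hybrid set-algebra method; it only picks the
# sense of the target with the highest total path similarity to the
# neighbors' senses.
from nltk.corpus import wordnet as wn

def disambiguate(target, neighbors):
    best_sense, best_score = None, -1.0
    for sense in wn.synsets(target):
        score = 0.0
        for word in neighbors:
            sims = [sense.path_similarity(s) or 0.0 for s in wn.synsets(word)]
            score += max(sims, default=0.0)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense

print(disambiguate("bank", ["river", "water", "shore"]))
```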
This paper presents a new approach to determining whether occurrences of a personal name of interest across documents refer to the same entity. First, three vectors are formed for each text: a personal name Boolean vector denoting whether a personal name occurs in the text, a biographical word Boolean vector representing title, occupation and so forth, and a feature vector with real values. Then, by combining a heuristic strategy based on the Boolean vectors with an agglomerative clustering algorithm based on the feature vectors, it seeks to resolve multi-document personal name coreference. Experimental results show that this approach achieves good performance when tested on the "Wang Gang" corpus.
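As a rough illustration of the clustering stage only (the Boolean-vector heuristic is omitted and the feature values are invented), documents mentioning the same personal name can be grouped by agglomerative clustering of their feature vectors, for example with scikit-learn:

```python
# Sketch of the clustering stage: grouping documents that mention the same
# personal name by agglomerative clustering of real-valued feature vectors.
# The feature values here are made up for illustration.
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# One row per document mentioning "Wang Gang" (toy feature vectors).
features = np.array([
    [0.9, 0.1, 0.0],   # doc 1
    [0.8, 0.2, 0.1],   # doc 2 (likely the same person as doc 1)
    [0.0, 0.1, 0.9],   # doc 3 (likely a different person)
])

clusterer = AgglomerativeClustering(n_clusters=None, distance_threshold=0.8)
print(clusterer.fit_predict(features))   # e.g. [0 0 1]
```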
It is common for different individuals to share the same name, which makes it time-consuming to search for information about a particular individual on the web. Name disambiguation is therefore necessary to help users find the person of interest more readily. In this paper, we propose an Adaptive Resonance Theory (ART) based two-stage strategy for this problem. We obtain a first-stage clustering result with the ART1 model and then merge similar clusters in the second stage. Our strategy mimics the process of manual disambiguation and does not need to predict the number of clusters, which makes it well suited to the disambiguation task. Experimental results show that, in comparison with the agglomerative clustering method, our strategy improves performance by 0.92% and 5.00%, respectively, on two kinds of name recognition results.
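The second-stage merging idea can be sketched independently of ART1: clusters whose centroids are sufficiently similar are fused. The following Python fragment uses made-up clusters and a cosine-similarity threshold purely for illustration.

```python
# Second-stage idea only: merge first-stage clusters whose centroids are
# sufficiently similar. The ART1 first stage is not implemented and the
# clusters below are fabricated.
import numpy as np

clusters = {0: np.array([[1.0, 0.0], [0.9, 0.1]]),
            1: np.array([[0.95, 0.05]]),
            2: np.array([[0.0, 1.0]])}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

merged = {}
for cid, points in clusters.items():
    centroid = points.mean(axis=0)
    for mid, (mc, mids) in merged.items():
        if cosine(centroid, mc) > 0.95:                  # merge threshold
            merged[mid] = ((mc + centroid) / 2, mids + [cid])
            break
    else:
        merged[cid] = (centroid, [cid])

print({mid: ids for mid, (c, ids) in merged.items()})    # e.g. {0: [0, 1], 2: [2]}
```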
Word sense disambiguation (WSD) is a fundamental but significant task in natural language processing, which directly affects the performance of downstream applications. However, WSD is very challenging due to the knowledge bottleneck problem, i.e., it is hard to acquire abundant disambiguation knowledge, especially in Chinese. To solve this problem, this paper proposes a graph-based Chinese WSD method with multi-knowledge integration. In particular, a graph model combining various Chinese and English knowledge resources by word sense mapping is designed. First, the content words in a Chinese ambiguous sentence are extracted and mapped to English words with BabelNet. Then, English word similarity is computed based on English word embeddings and a knowledge base, while Chinese word similarity is evaluated with Chinese word embeddings and HowNet, respectively. The weights of the three kinds of word similarity are optimized with a simulated annealing algorithm so as to obtain their overall similarities, which are used to construct a disambiguation graph. A graph scoring algorithm evaluates the importance of each word sense node and judges the correct senses of the ambiguous words. Extensive experimental results on the SemEval dataset show that our proposed WSD method significantly outperforms the baselines.
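A toy version of the graph-scoring step is sketched below with networkx: sense nodes are connected by similarity-weighted edges and ranked with PageRank. The edge weights are invented, and the fusion of BabelNet, HowNet, and embedding similarities via simulated annealing is not reproduced.

```python
# Toy graph scoring: sense nodes connected by similarity-weighted edges,
# ranked with PageRank. Weights here are simply invented.
import networkx as nx

g = nx.Graph()
# ("word", sense_id) nodes for two ambiguous words and one context word.
g.add_weighted_edges_from([
    (("bank", 1), ("river", 1), 0.9),   # bank = river bank
    (("bank", 2), ("river", 1), 0.1),   # bank = financial bank
    (("deposit", 1), ("bank", 2), 0.8),
])

scores = nx.pagerank(g, weight="weight")
# Choose the best-scoring sense for the ambiguous word "bank".
best = max((n for n in scores if n[0] == "bank"), key=scores.get)
print(best, scores[best])
```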
Stereolithographic (STL) files have been used extensively in rapid prototyping industries as well as many other fields, and watermarking algorithms are used to secure intellectual property and protect three-dimensional models from theft. However, to the best of our knowledge, few studies have looked at how watermarking can resist attacks that involve vertex reordering. Here, we present a lossless and robust watermarking scheme for STL files to protect against vertex-reordering attacks. Specifically, we designed a novel error-correcting code (ECC) that can correct any one-bit error in a bitstream by inserting several check digits. In addition, the ECC is designed to make use of redundant information according to the characteristics of STL files, which introduces further robustness for defense against attacks. No modifications are made to the geometric information of the three-dimensional model, which respects the requirements of a high-precision model. The experimental results show that the proposed watermarking scheme can survive numerous kinds of attack, including rotation, scaling and translation (RST), facet reordering, and vertex-reordering attacks.
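The single-bit-correction idea can be illustrated with a standard Hamming(7,4) code, which likewise inserts check digits so that any one flipped bit can be located and repaired. The sketch below is generic and does not reproduce the STL-specific ECC described above.

```python
# Generic illustration of correcting any single-bit error with check digits,
# using a standard Hamming(7,4) code (not the paper's custom STL-aware ECC).
def hamming74_encode(d):           # d: list of 4 data bits
    p1 = d[0] ^ d[1] ^ d[3]
    p2 = d[0] ^ d[2] ^ d[3]
    p3 = d[1] ^ d[2] ^ d[3]
    return [p1, p2, d[0], p3, d[1], d[2], d[3]]

def hamming74_decode(c):           # c: list of 7 received bits
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3       # 1-based position of the flipped bit
    if syndrome:
        c[syndrome - 1] ^= 1              # correct the single-bit error
    return [c[2], c[4], c[5], c[6]]

code = hamming74_encode([1, 0, 1, 1])
code[5] ^= 1                              # flip one bit in the channel
print(hamming74_decode(code))             # -> [1, 0, 1, 1]
```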
Word Sense Disambiguation (WSD) is the task of deciding the sense of an ambiguous word in a particular context. Most current studies on WSD use only a few ambiguous words as test samples, which leads to limitations in practical application. In this paper, we perform a WSD study based on a large-scale real-world corpus using two unsupervised learning algorithms: a ±n-improved Bayesian model and a Dependency Grammar (DG)-improved Bayesian model. The ±n-improved classifiers reduce the context window size of ambiguous words with a close-distance feature extraction method and decrease interference from useless features, thus clearly improving the accuracy, which reaches 83.18% (in an open test). The DG-improved classifier can more effectively overcome the noise effect existing in the Naive Bayesian classifier. Experimental results show that this approach performs well on Chinese WSD, and the open test achieved an accuracy of 86.27%.
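A minimal ±n-window Naive Bayes disambiguator is sketched below with invented training sentences; the Dependency Grammar variant is not shown.

```python
# Toy Naive-Bayes WSD with a small ±n context window, in the spirit of the
# ±n-improved classifier. Training sentences and senses are invented.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

def window(sentence, target, n=2):
    words = sentence.split()
    i = words.index(target)
    return " ".join(words[max(0, i - n): i] + words[i + 1: i + 1 + n])

train = [("he sat on the river bank fishing all day", "riverside"),
         ("she opened an account at the bank yesterday", "finance")]
X = [window(s, "bank") for s, _ in train]
y = [sense for _, sense in train]

vec = CountVectorizer()
clf = MultinomialNB().fit(vec.fit_transform(X), y)
test = window("the bank raised its interest rates", "bank")
print(clf.predict(vec.transform([test])))     # e.g. ['finance']
```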
A sense feature system (SFS) is first constructed automatically from the text corpora to structure the textual information. WSD rules are then extracted from the SFS according to their certainty factors and are applied to disambiguate the senses of polysemous words. The entropy of a deterministic rough prediction is used to measure the decision quality of a rule set. Finally, a back-off rule smoothing method is further designed to improve the performance of the WSD model. In the experiments, the mean correction rate achieved for WSD with rule smoothing is 0.92.
Natural language processing has a set of phases that evolves from lexical text analysis to pragmatic analysis, in which the author's intentions are revealed. The ambiguity problem appears in all of these tasks. Previous work addresses word sense disambiguation, the process of assigning a sense to a word within a specific context, by creating algorithms under a supervised or unsupervised approach, that is, algorithms that do or do not rely on an external lexical resource. This paper presents an approach that combines unsupervised algorithms through a set of classifiers; the result is a learning algorithm based on unsupervised methods for the word sense disambiguation process. It begins with an introduction to word sense disambiguation concepts, then analyzes several unsupervised algorithms in order to extract the best of them, and finally combines them under a supervised approach making use of several classifiers.
An improved name disambiguation method based on atom clusters is presented. Because similarity methods based on character-related properties obtained through information extraction depend heavily on character information, a new name disambiguation method using an improved k-means algorithm is proposed in this paper. Cluster analysis is introduced into the name disambiguation process. Experimental results show that the proposed method has high implementation efficiency and can distinguish different people with the same name.
This paper proves the statement that a good linear block encoder is in fact a good local-random sequence generator. Furthermore, this statement reveals the deep relationship between error-correcting coding theory and modern cryptography.
Every term has a meaning, but some terms have multiple meanings. Identifying the correct meaning of a term in a specific context is the goal of Word Sense Disambiguation (WSD) applications. Identifying the correct sense of a term given a limited context is even harder. This research aims at solving the problem of identifying the correct sense of a term given only one other term as its context. The main focus of this research is on using Wikipedia as the external knowledge source to decipher the true meaning of each term using a single term as the context. We experimented with the semantically rich Wikipedia senses and hyperlinks for context disambiguation. We also analyzed the effect of sense filtering on context extraction and found it quite effective for contextual disambiguation. Results show that disambiguation with filtering works quite well on a manually disambiguated dataset, with an accuracy of 86%.
A name disambiguation method based on attribute matching and link analysis is proposed for application in the field of insurance. Because former name disambiguation methods, such as text clustering, must take a large number of useless words into consideration, a new name disambiguation method is advanced. First, attribute matching is applied and the identities of successful matches are merged; second, link analysis is used and the structure of the customer network is analyzed; finally, records with the same cooperation information are merged. Experimental results show that the proposed method can achieve name disambiguation successfully.
The input of the network is the key problem for Chinese word sense disambiguation using a neural network. This paper presents an input model for the neural network that calculates the mutual information between contextual words and the ambiguous word using statistical methodology, taking a certain number of contextual words on either side of the ambiguous word according to (-M, +N). The experiment adopts a three-layer BP neural network model and shows how the size of the training set and the values of M and N affect the performance of the neural network model. The experimental objects are six pseudowords, each with three word senses, constructed according to certain principles. The tested accuracy of our approach reaches 90.31% on a closed corpus and 89.62% on an open corpus. The experiment shows that the neural network model performs well on word sense disambiguation.
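The mutual-information input idea can be illustrated by estimating pointwise mutual information between a context word and the ambiguous word from sentence-level co-occurrence counts in a tiny invented corpus; the BP network itself and the pseudoword construction are omitted.

```python
# Rough sketch of the input-layer idea: estimating pointwise mutual
# information between a context word and the ambiguous word "bank" from
# sentence-level co-occurrence counts in a made-up corpus.
import math
from collections import Counter

corpus = [
    "the river bank was covered with grass",
    "the bank approved the loan quickly",
    "a loan from the bank carries interest",
]
tokens = [s.split() for s in corpus]
word_counts = Counter(w for s in tokens for w in s)
pair_counts = Counter((w, "bank") for s in tokens for w in s
                      if "bank" in s and w != "bank")
total = sum(word_counts.values())

def pmi(context_word, target="bank"):
    p_xy = pair_counts[(context_word, target)] / total
    p_x = word_counts[context_word] / total
    p_y = word_counts[target] / total
    return math.log2(p_xy / (p_x * p_y)) if p_xy > 0 else float("-inf")

print(pmi("loan"), pmi("river"))
```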
Quantum secret sharing (QSS) is a procedure for sharing classical or quantum information by using quantum states. This paper presents how to use a [2k-1, 1, k] quantum error-correcting code (QECC) to implement a quantum (k, 2k-1) threshold scheme. It also takes advantage of the classical enhancement of the [2k-1, 1, k] QECC to establish a QSS scheme which can share classical and quantum information simultaneously. Because the information is encoded into the QECC, these schemes can prevent intercept-resend attacks and be implemented on some noisy channels.
Word sense disambiguation is used in many natural language processing fields. One approach to disambiguation is the decision list algorithm, which is a supervised method. Supervised methods are considered the most accurate machine learning algorithms, but they are strongly affected by the knowledge acquisition bottleneck: their efficiency depends on the size of the tagged training set, whose preparation is difficult, time-consuming and costly. The method proposed in this article improves the efficiency of this algorithm when only a small tagged training set is available. The method uses a statistical technique for collocation extraction from a big untagged corpus, so that the more important collocations, which are the features used for creating learning hypotheses, can be identified. Weighting the features improves the efficiency and accuracy of a decision list algorithm that has been trained with a small training corpus.
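A minimal decision list in the Yarowsky style is sketched below: collocation features are ranked by a smoothed log-likelihood ratio and the first matching feature decides the sense. The counts are fabricated, and the corpus-based collocation weighting proposed above is not reproduced.

```python
# Minimal decision-list sketch: features ranked by a smoothed log-likelihood
# ratio; the strongest matching feature decides the sense. Counts are invented.
import math

# (feature, sense) -> count from a tiny fabricated tagged training set.
counts = {("river", "bank/GROUND"): 8, ("river", "bank/MONEY"): 1,
          ("loan", "bank/MONEY"): 9, ("loan", "bank/GROUND"): 0}

def llr(feature, s1, s2, alpha=0.1):
    a = counts.get((feature, s1), 0) + alpha
    b = counts.get((feature, s2), 0) + alpha
    return math.log(a / b)

senses = ("bank/GROUND", "bank/MONEY")
rules = sorted({f for f, _ in counts},
               key=lambda f: abs(llr(f, *senses)), reverse=True)

def classify(context_words):
    for f in rules:                       # strongest evidence first
        if f in context_words:
            return senses[0] if llr(f, *senses) > 0 else senses[1]
    return senses[1]                      # default sense

print(classify({"the", "loan", "was", "approved"}))   # -> bank/MONEY
```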
Word sense disambiguation (WSD), identifying the specific sense of a target word given its context, is a fundamental task in natural language processing. Recently, researchers have shown promising results using long short-term memory (LSTM), which is able to better capture the sequential and syntactic features of text. However, this method neglects the dependencies among instances, such as their context semantic similarities. To solve this problem, we propose a novel WSD model that introduces a cache-like memory module to capture the semantic dependencies among instances for WSD. Extensive evaluations on standard datasets demonstrate the superiority of the proposed model over various baselines.
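For illustration only, a bare-bones LSTM sense classifier in PyTorch is sketched below; the cache-like memory module that the model adds on top of the LSTM is not modeled.

```python
# Bare-bones LSTM sense classifier (PyTorch), assuming the target word's
# context is already mapped to token ids. The cache-like memory module
# described above is not included.
import torch
import torch.nn as nn

class LSTMWSD(nn.Module):
    def __init__(self, vocab_size=1000, emb_dim=32, hidden=64, n_senses=3):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_senses)

    def forward(self, token_ids):
        h, _ = self.lstm(self.emb(token_ids))
        return self.out(h[:, -1, :])       # score senses from the last state

model = LSTMWSD()
context = torch.randint(0, 1000, (1, 12))  # one context of 12 token ids
print(model(context).shape)                # -> torch.Size([1, 3])
```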