Abstract: Novelty detection aims to retrieve new information and filter out redundancy from given sentences that are relevant to a specific topic. In TREC 2003, the authors tried an approach to novelty detection based on semantic distance computation. The motivation is to expand a sentence by introducing semantic information. The computation of semantic distance between sentences combines WordNet with statistical information. Novelty detection is treated as a binary classification problem: a sentence is either new or not. The feature vector, used in the vector space model for classification, consists of various factors, including the semantic distance from the sentence to the topic and the distance from the sentence to the relevant context occurring before it. New sentences are then detected with Winnow and support vector machine classifiers, respectively. Several experiments are conducted to survey the relationship between the different factors and performance. The results indicate that semantic computation is promising for novelty detection. The ratio of new sentence size to relevant size is further studied for different relevant document sizes. It is found that the ratio decreases at a certain rate (about 0.86). Another group of experiments, supervised with this ratio, is then performed. It is demonstrated that the ratio helps to improve novelty detection performance.
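The classification step described in the abstract can be illustrated with a minimal Winnow sketch. This is not the authors' implementation: the three binary features (for example "far from the topic" or "far from the preceding relevant context"), the promotion factor `alpha`, and the toy data are all assumptions made for illustration, and the WordNet-based distance computation is not reproduced here.

```python
from typing import List, Sequence, Tuple


def winnow_score(weights: Sequence[float], x: Sequence[int]) -> float:
    """Sum the weights of the features that are active (equal to 1) in x."""
    return sum(w for w, xi in zip(weights, x) if xi)


def winnow_train(
    samples: Sequence[Sequence[int]],
    labels: Sequence[int],
    alpha: float = 2.0,   # assumed promotion/demotion factor
    epochs: int = 10,
) -> Tuple[List[float], float]:
    """Train a Winnow classifier on binary feature vectors.

    Weights start at 1 and the threshold is the number of features.
    On a mistake, only the weights of active features are updated:
    multiplied by alpha after a missed positive, divided by alpha
    after a false positive.
    """
    n = len(samples[0])
    weights = [1.0] * n
    threshold = float(n)
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            pred = 1 if winnow_score(weights, x) >= threshold else 0
            if pred == y:
                continue
            factor = alpha if y == 1 else 1.0 / alpha
            weights = [w * factor if xi else w for w, xi in zip(weights, x)]
    return weights, threshold


def winnow_predict(weights: Sequence[float], threshold: float, x: Sequence[int]) -> int:
    """Return 1 (new sentence) or 0 (not new)."""
    return 1 if winnow_score(weights, x) >= threshold else 0


# Hypothetical toy data: feature 0 alone decides whether a sentence is new.
SAMPLES = [[1, 0, 0], [0, 1, 1], [1, 1, 0], [0, 0, 1], [1, 0, 1], [0, 1, 0]]
LABELS = [1, 0, 1, 0, 1, 0]

if __name__ == "__main__":
    w, t = winnow_train(SAMPLES, LABELS)
    print([winnow_predict(w, t, x) for x in SAMPLES])
```

Winnow updates weights multiplicatively, which tends to suit settings where only a few of many features matter; whether that matches the feature set used in the paper is not stated in the abstract.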
Funding: The National Basic Research Program of China (973 Program) (No. 2010CB328104, 2009CB320501); the National Natural Science Foundation of China (No. 61070161, 61070158, 61003257, 61003311); the National Key Technology R&D Program during the 11th Five-Year Plan Period (No. 2010BAI88B03); the Foundation of Jiangsu Provincial Key Laboratory of Network and Information Security (No. BM2003201); Open Research Fund from Key Laboratory of Computer Network and Information Integration of Ministry of Education (Southeast University).
Abstract: To solve the bottleneck problem in centralized service discovery methods, a novel architecture based on domain ontology for semantic service discovery is proposed. This distributed architecture can adjust the domain partition and allocate system resources automatically. The characteristics of this mechanism are analyzed, including scalability, self-organization and adaptability. In this mechanism, semantic web service discovery is separated into two parts. First, under a balanced tree topology, the registry proxy can rapidly forward requests to the target registry center and thus avoid the bottleneck problem. Second, a semantic distance based service matching algorithm is proposed to improve the effectiveness of service search. The results of simulation experiments show that the proposed mechanism can serve as a scalable solution for semantic web service publication and discovery, and that the improved matching algorithm has higher recall and precision than other algorithms.
Abstract: Semantic Web Services is an emerging technology that promises to enable dynamic, execution-time discovery, composition, and invocation of Web Services. Semantic matchmaking plays a vital role in the automated and dynamic discovery process of Semantic Web Services and consists in measuring the semantic distance between a requested service and an advertised one. In this paper, an innovative approach to effectively compute the semantic distance between services annotated with the Web Ontology Language for Services (OWL-S) is proposed. First, an edge-based method for measuring the semantic distance between Web Ontology Language (OWL) concepts is presented. Then, the proposed measure is compared with the one presented in a recent related work in order to show that the proposed method is more efficient and fine-grained. Finally, some equations to compute semantic matchmaking of service capabilities, which are expressed in terms of inputs and outputs, are presented.
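As an illustration of what an edge-based measure looks like in general, the sketch below counts the edges on the path between two concepts through their lowest common ancestor. It is plain edge counting over a hypothetical toy taxonomy (the `PARENT` mapping and concept names are invented for this example) and should not be read as the specific measure proposed in the paper, whose details are not given in the abstract.

```python
from typing import Dict, List, Optional

# Hypothetical toy taxonomy: each concept maps to its single parent.
PARENT: Dict[str, Optional[str]] = {
    "Thing": None,
    "Vehicle": "Thing",
    "Car": "Vehicle",
    "Truck": "Vehicle",
    "Sedan": "Car",
    "Animal": "Thing",
}


def ancestors(concept: str, parent: Dict[str, Optional[str]]) -> List[str]:
    """Return the chain from the concept up to the root, concept first."""
    path: List[str] = []
    node: Optional[str] = concept
    while node is not None:
        path.append(node)
        node = parent[node]
    return path


def edge_distance(a: str, b: str, parent: Dict[str, Optional[str]]) -> int:
    """Count edges on the path from a to b via their lowest common ancestor."""
    steps_from_a = {node: i for i, node in enumerate(ancestors(a, parent))}
    for steps_from_b, node in enumerate(ancestors(b, parent)):
        if node in steps_from_a:
            return steps_from_a[node] + steps_from_b
    raise ValueError(f"{a!r} and {b!r} share no common ancestor")


if __name__ == "__main__":
    print(edge_distance("Sedan", "Truck", PARENT))   # Sedan-Car-Vehicle-Truck: 3
    print(edge_distance("Sedan", "Animal", PARENT))  # path runs through Thing: 4
```

Pure edge counting treats every edge as equally long, which is one reason published measures often add depth or density weighting to make the distance more fine-grained.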
Funding: This work is supported by the National Natural Science Foundation of China under Grant No. 90604025.
Abstract: This paper is concerned with a matchmaker for ranking web services by using semantics. Several semantic matchmaking methods have been proposed so far. Most of them, however, focus on classifying services into predefined categories rather than providing a ranked result. In this paper, a new semantic matchmaking method is proposed for ranking web services. The semantic distance is used to estimate the matching degree between a service and a user request. Four types of semantic distance are defined, and four algorithms are implemented to calculate them, respectively. Experimental results show that the proposed semantic matchmaker significantly outperforms the keyword-based baseline method.
Funding: Supported by the National Natural Science Foundation of China [No. 41971349, No. 41930107, No. 42090010 and No. 41501434] and the National Key Research and Development Program of China [No. 2017YFB0503704 and No. 2018YFC0809806].
Abstract: Without an explicit description of map application themes, it is difficult for users to discover desired map resources from massive online Web Map Services (WMS). However, metadata-based map application theme extraction is a challenging multi-label text classification task due to limited training samples, mixed vocabularies, variable length and content arbitrariness of text fields. In this paper, we propose a novel multi-label text classification method, Text GCN-SW-KNN, based on geographic semantics and collaborative training to improve classification accuracy. The semi-supervised collaborative training adopts two base models, i.e., a modified Text Graph Convolutional Network (Text GCN) that utilizes the Semantic Web, named Text GCN-SW, and the widely used Multi-Label K-Nearest Neighbor (ML-KNN). Text GCN-SW improves on Text GCN by adjusting the adjacency matrix of the heterogeneous word-document graph with the shortest semantic distances between themes and words in the metadata text. The distances are calculated with the Semantic Web of Earth and Environmental Terminology (SWEET) and WordNet dictionaries. Experiments on both the WMS and layer metadata show that the proposed methods achieve a higher F1-score and accuracy than state-of-the-art baselines, and demonstrate better stability in repeated experiments and robustness to reduced training data. Text GCN-SW-KNN can be extended to other multi-label text classification scenarios to better support metadata enhancement and geospatial resource discovery in the Earth Science domain.
Abstract: Although many linguistic theories and formalisms for parsing algorithms have been developed and discussed in the past decades, the efficiency and accuracy of parsing are still serious problems in practical machine translation systems. This paper presents a parsing algorithm with dynamic rule selection, together with experimental results. By describing the design and practice of the improved algorithm, the paper discusses in detail the design method for a high-speed and efficient parser.
Funding: Supported by the National Natural Science Foundation of China (61133012, 61202193, 61373108); the Major Projects of the National Social Science Foundation of China (11&ZD189); the Chinese Postdoctoral Science Foundation (2013M540593, 2014T70722); the Open Foundation of Shandong Key Laboratory of Language Resource Development and Application.
Abstract: In this paper we propose a multiple feature approach for the normalization task, which maps each disorder mention in the text to a unique Unified Medical Language System (UMLS) concept unique identifier (CUI). We develop a two-step method that first acquires a list of candidate CUIs and their associated preferred names using the UMLS API and then chooses the closest CUI by calculating the similarity between the input disorder mention and each candidate. The similarity calculation step is formulated as a classification problem, and multiple features (string features, ranking features, similarity features, and contextual features) are used to normalize the disorder mentions. The results show that the multiple feature approach improves the accuracy of the normalization task from 32.99% to 67.08% compared with the MetaMap baseline.
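The second step of the two-step method, choosing the closest candidate, can be sketched roughly as follows. This is a deliberate simplification: the paper formulates the choice as a classification problem over multiple features, whereas the sketch swaps in a single string-similarity score from Python's standard `difflib`. The candidate list is hard-coded with placeholder identifiers rather than fetched from the UMLS API, and the function names are hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, Tuple

# (identifier, preferred name) pairs; the identifiers are placeholders,
# not real UMLS CUIs, and would come from a candidate lookup in step one.
Candidate = Tuple[str, str]


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def choose_closest(mention: str, candidates: List[Candidate]) -> Candidate:
    """Pick the candidate whose preferred name is most similar to the mention."""
    if not candidates:
        raise ValueError("candidate list is empty")
    return max(candidates, key=lambda c: similarity(mention, c[1]))


CANDIDATES: List[Candidate] = [
    ("CUI-PLACEHOLDER-1", "Myocardial Infarction"),
    ("CUI-PLACEHOLDER-2", "Cerebral Infarction"),
    ("CUI-PLACEHOLDER-3", "Myocarditis"),
]

if __name__ == "__main__":
    print(choose_closest("myocardial infarction", CANDIDATES))
```

A single string score will miss synonyms and abbreviations, which is presumably part of why the paper combines string, ranking, similarity, and contextual features in a trained classifier.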