In order to achieve adaptive and efficient service composition, a task-oriented algorithm for discovering services is proposed. The traditional process of service composition is divided into semantic discovery and functional matching, with tasks treated as the objects of operation. Semantic similarity is used to discover the services matching a specific task and to generate a corresponding task-oriented web service composition (TWC) graph. Moreover, an algorithm is designed to update the TWC graph when new services appear. The approach is applied to a composition model in which the TWC graph is searched for an optimal path and the final service composition is output; the model also supports real-time updating as the environment changes. Experimental results demonstrate the feasibility and effectiveness of the algorithm and indicate that the maximum searching radius can be set to 2 to reach an equilibrium between quality and quantity.
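As a rough illustration of the search step described in this abstract, the sketch below performs a bounded best-path search over a toy TWC-style graph whose edges carry semantic-similarity scores. The graph layout, the additive scoring, and the radius cap are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch: search a task-oriented composition graph for the best-scoring
# path within a bounded radius.  Edge weights stand in for semantic similarity.
import heapq

def search_twc(graph, start, goal, max_radius=2):
    """graph: {service: [(next_service, semantic_similarity), ...]}.
    Returns (score, path) for the highest-similarity path of at most
    max_radius hops, or None if the goal is unreachable within the bound."""
    frontier = [(-0.0, [start])]          # max-heap via negated scores
    best = None
    while frontier:
        neg_score, path = heapq.heappop(frontier)
        node = path[-1]
        if node == goal:
            if best is None or -neg_score > best[0]:
                best = (-neg_score, path)
            continue
        if len(path) - 1 >= max_radius:   # enforce the searching radius
            continue
        for nxt, sim in graph.get(node, []):
            if nxt not in path:           # avoid cycles
                heapq.heappush(frontier, (neg_score - sim, path + [nxt]))
    return best

graph = {"task": [("s1", 0.9), ("s2", 0.7)], "s1": [("goal", 0.8)], "s2": [("goal", 0.95)]}
print(search_twc(graph, "task", "goal", max_radius=2))
```

Raising `max_radius` admits longer compositions at the cost of a larger search space, which is the quality/quantity trade-off the abstract alludes to.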
To solve the problem of inadequate semantic processing in intelligent question answering systems, an integrated semantic similarity model which calculates semantic similarity using geometric distance and information content is presented in this paper. With the help of the interrelationships between concepts, the information content of concepts, and the strength of the edges in the ontology network, we can calculate the semantic similarity between two concepts and provide information for the further calculation of the semantic similarity between a user's question and the answers in the knowledge base. Experiments on the prototype show that the semantic problem in natural language processing can also be solved with the help of the knowledge and the abundant semantic information in the ontology. The ontology-based intelligent question answering prototype system reaches more than 90% accuracy with less than 50 ms average searching time, which is a very satisfactory result. Key words: intelligent question answering system; ontology; semantic similarity; geometric distance; information content. CLC number: TP39. Foundation item: Supported by the important science and technology item of China of the 10th Five-Year Plan (2001BA101A05-04). Biography: LIU Ya-jun (1953-), female, associate professor; research directions: software engineering, information processing, database applications.
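A minimal sketch of how a geometric-distance term and an information-content term might be blended into one concept-similarity score, in the spirit of the abstract above. The toy taxonomy, the information-content values, and the weight alpha are assumptions; the paper's exact formula is not reproduced here.

```python
# Toy information-content values (-log p(c)) and a tiny is-a taxonomy.
ic = {"entity": 0.0, "animal": 1.2, "dog": 3.5, "cat": 3.4}
parent = {"dog": "animal", "cat": "animal", "animal": "entity"}

def path_to_root(c):
    path = [c]
    while c in parent:
        c = parent[c]
        path.append(c)
    return path

def integrated_similarity(c1, c2, alpha=0.5):
    p1, p2 = path_to_root(c1), path_to_root(c2)
    lcs = next(c for c in p1 if c in p2)          # lowest common subsumer
    dist = p1.index(lcs) + p2.index(lcs)          # edge-counting distance
    geo_sim = 1.0 / (1.0 + dist)                  # geometric-distance term
    ic_sim = 2 * ic[lcs] / (ic[c1] + ic[c2])      # Lin-style information-content term
    return alpha * geo_sim + (1 - alpha) * ic_sim

print(round(integrated_similarity("dog", "cat"), 3))
```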
The Internet of Things (IoT) is an important and ubiquitous service paradigm, and providing terminal users with effective and efficient services based on service communities is one of the most important issues in IoT applications. This paper presents a semantic-based similarity algorithm to build the IoT service community. First, the algorithm exploits the wealth of semantic information contained in IoT nodes and organizes the nodes into a concept tree. The similarity of the semantic information is then mined on the basis of the concept tree. Finally, we optimize the service community through a greedy algorithm and control the size of the service community by adjusting a threshold. Simulation results show the effectiveness and feasibility of this algorithm.
Most of the questions from users lack the context needed to thoroughly understand the problem at hand, thus making the questions impossible to answer. Semantic Similarity Estimation is based on relating a user's question to the context from previous Conversational Search Systems (CSS) to provide answers without requesting the user's context. It imposes constraints on the time needed to produce an answer for the user. The proposed model enables the use of contextual data associated with previous Conversational Searches (CS). When receiving a question in a new conversational search, the model determines the question that refers to more past CS. The model then infers past contextual data related to the given question and predicts an answer based on the inferred context, without engaging in multi-turn interactions or requesting additional data from the user. This model shows the ability to use the limited information in user queries for best context inference, based on closed-domain CS and Bidirectional Encoder Representations from Transformers for textual representations.
Ant-based text clustering is a promising technique that has attracted great research attention. This paper attempts to improve the standard ant-based text-clustering algorithm in two dimensions. On one hand, an ontology-based semantic similarity measure is used in conjunction with the traditional vector-space-model-based measure to provide a more accurate assessment of the similarity between documents. On the other hand, the ant behavior model is modified to pursue better algorithmic performance. In particular, the ant movement rule is adjusted so as to direct a laden ant toward a dense area of items of the same type as the ant's carried item, and to direct an unladen ant toward an area containing an item dissimilar to the surrounding items within its Moore neighborhood. Using WordNet as the base ontology for assessing the semantic similarity between documents, the proposed algorithm is tested on a sample set of documents excerpted from the Reuters-21578 corpus, and the experimental results partly indicate that it performs better than the standard ant-based text-clustering algorithm and the k-means algorithm.
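The sketch below illustrates the kind of combined document similarity this abstract builds on: a weighted mix of vector-space cosine similarity and an ontology-based term similarity (WordNet path similarity via NLTK). The weight, the max-pairing of terms, and the toy documents are assumptions, not the authors' exact measure; NLTK's WordNet data must be downloaded beforehand.

```python
# Minimal sketch of a combined VSM + WordNet document similarity, assuming
# NLTK is installed and `nltk.download('wordnet')` has been run.
import math
from nltk.corpus import wordnet as wn

def cosine(tf1, tf2):
    common = set(tf1) & set(tf2)
    num = sum(tf1[t] * tf2[t] for t in common)
    den = (math.sqrt(sum(v * v for v in tf1.values()))
           * math.sqrt(sum(v * v for v in tf2.values())))
    return num / den if den else 0.0

def wordnet_term_sim(t1, t2):
    scores = [a.path_similarity(b) for a in wn.synsets(t1) for b in wn.synsets(t2)]
    scores = [s for s in scores if s is not None]
    return max(scores) if scores else 0.0

def ontology_doc_sim(terms1, terms2):
    # Average, for each term of doc1, its best WordNet match in doc2.
    return sum(max(wordnet_term_sim(t, u) for u in terms2) for t in terms1) / len(terms1)

def combined_sim(tf1, tf2, lam=0.5):
    return lam * cosine(tf1, tf2) + (1 - lam) * ontology_doc_sim(list(tf1), list(tf2))

d1, d2 = {"car": 2, "engine": 1}, {"automobile": 1, "motor": 1}
print(round(combined_sim(d1, d2), 3))
```

Even with zero term overlap (cosine = 0), the ontology term keeps semantically related documents close, which is the motivation for mixing the two measures.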
In order to improve the efficiency and quality of service composition, a service composition algorithm based on semantic constraints is proposed. First, a user's requirements and the services from a service repository are compared with the help of a two-level matching algorithm that filters out the services which do not match the user's personalized constraint requirements; this mechanism reduces the search scope at the beginning of the service composition algorithm. Second, the satisfaction of each selected service with respect to the user's personalized requirements is computed, and the services with the greatest satisfaction values are used to make up the service composition. The algorithm is evaluated analytically and experimentally in terms of the efficiency of service composition and the satisfaction of the user's personalized requirements.
In Chinese question answering systems, because there is more semantic relation in questions than in query words, precision can be improved by expanding the query when natural language questions are used to retrieve documents. This paper proposes a new approach to query expansion based on semantics and statistics. First, an automatic relevance feedback method is used to generate a candidate set of expansion words. The expanded query words are then selected from this set based on the semantic similarity and semantic relevancy between the candidate words and the original words. Experiments show that the new approach is effective for Web retrieval and outperforms conventional expansion approaches.
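A minimal sketch of the selection step described above: expansion terms are picked from a relevance-feedback candidate set by combining semantic similarity and statistical relevancy to the original query words. The weighting scheme and the toy similarity/relevancy tables are assumptions used only to illustrate the idea.

```python
# Select expansion terms by a weighted mix of semantic similarity and relevancy.
def select_expansion_terms(query_terms, candidates, sim, rel, beta=0.6, top_k=3):
    """sim[(q, c)]: semantic similarity; rel[(q, c)]: statistical relevancy."""
    scored = []
    for c in candidates:
        score = max(beta * sim.get((q, c), 0.0) + (1 - beta) * rel.get((q, c), 0.0)
                    for q in query_terms)
        scored.append((score, c))
    return [c for score, c in sorted(scored, reverse=True)[:top_k]]

sim = {("car", "automobile"): 0.9, ("car", "vehicle"): 0.8, ("car", "road"): 0.2}
rel = {("car", "automobile"): 0.4, ("car", "vehicle"): 0.5, ("car", "road"): 0.7}
print(select_expansion_terms(["car"], ["automobile", "vehicle", "road"], sim, rel))
```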
The automated refinement of network security policy hierarchies aims to simplify the administration of security services in complex network environments. The semantic gap between the levels of a policy hierarchy reflects the validity of the hierarchy yielded by the automated policy refinement process. However, little attention has been paid to evaluating the compliance between the derived lower-level policy and the higher-level policy. We present an ontology based on the Web Ontology Language (OWL) to describe the semantics of security policies and their implementation. We also propose a method of estimating the semantic similarity between a given higher-level policy and the lower-level policy derived from it, as a measure of their compliance.
A reputation mechanism is introduced into the P2P-based Semantic Web to solve the problem of lack of trust. It enables the Semantic Web to utilize reputation information based on the semantic similarity of peers in the network. This approach is evaluated in a simulation of a content sharing system, and the experiments show that the system with the reputation mechanism outperforms the system without it.
During the new product development process, reusing existing CAD models can avoid designing from scratch and decrease labor cost. With the advent of big data, how to rapidly and efficiently find suitable 3D CAD models for design reuse has attracted more attention. Currently, the sketch-based retrieval approach makes search more convenient, but its accuracy is not high enough; on the other hand, the semantic-based retrieval approach fully utilizes high-level semantic information and brings search much closer to engineers' intent. However, effectively extracting and representing semantic information from data sets is difficult. Aiming at these problems, we propose a sketch-based semantic retrieval approach for reusing 3D CAD models. First, a fine-granularity semantic descriptor is designed for representing 3D CAD models; second, several heuristic rules are adopted to recognize 3D features from 2D sketches, and the correspondences between 3D features and 2D loops are built; finally, semantic and shape similarity measurements are combined to match the input sketch to 3D CAD models, thereby improving retrieval accuracy. A sketch-based prototype system has been developed, and experimental results validate the feasibility and effectiveness of the proposed approach.
Long-document semantic measurement has great significance in many applications such as semantic search, plagiarism detection, and automatic technical surveys. However, research efforts have mainly focused on the semantic similarity of short texts; document-level semantic measurement remains an open issue due to problems such as the omission of background knowledge and topic transition. In this paper, we propose a novel semantic matching method for long documents in the academic domain. To accurately represent the general meaning of an academic article, we construct a semantic profile in which key semantic elements such as the research purpose, methodology, and domain are included and enriched. As such, we can obtain the overall semantic similarity of two papers by computing the distance between their profiles. The distances between the concepts of two different semantic profiles are measured by word vectors. To improve the semantic representation quality of the word vectors, we propose a joint word-embedding model that incorporates a domain-specific semantic relation constraint into the traditional context constraint. Our experimental results demonstrate that, in the measurement of document semantic similarity, our approach achieves substantial improvement over state-of-the-art methods, and our joint word-embedding model produces significantly better word representations than traditional word-embedding models.
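The sketch below illustrates profile-level matching in the spirit of this abstract: two semantic profiles are compared field by field with word vectors, and the field scores are combined. The profile fields, the tiny stand-in embeddings, and the field weights are all assumptions for illustration, not the paper's trained joint embeddings or exact aggregation.

```python
# Minimal sketch: compare two semantic profiles via word-vector cosine similarity.
import math

def cos(u, v):
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den if den else 0.0

embedding = {  # stand-in for trained (joint) word embeddings
    "classification": [0.9, 0.1], "clustering": [0.7, 0.3],
    "cnn": [0.2, 0.9], "svm": [0.4, 0.8], "nlp": [0.5, 0.5],
}

def profile_similarity(p1, p2, weights={"purpose": 0.4, "method": 0.4, "domain": 0.2}):
    total = 0.0
    for field, w in weights.items():
        # Best-match average between the concept sets of the two profiles.
        sims = [max(cos(embedding[a], embedding[b]) for b in p2[field]) for a in p1[field]]
        total += w * sum(sims) / len(sims)
    return total

paper1 = {"purpose": ["classification"], "method": ["cnn"], "domain": ["nlp"]}
paper2 = {"purpose": ["clustering"], "method": ["svm"], "domain": ["nlp"]}
print(round(profile_similarity(paper1, paper2), 3))
```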
In this paper, an improved algorithm, the web-based keyword weight algorithm (WKWA), is presented to weight keywords in web documents. WKWA takes into account the representation features of web documents and the advantages of the TF*IDF, TFC, and ITC algorithms in order to make it more appropriate for web documents. The algorithm is then applied to an improved vector space model (IVSM), and a real system has been implemented for calculating the semantic similarities of web documents. Four experiments have been carried out: keyword weight calculation, feature item selection, semantic similarity calculation, and WKWA time performance. The results demonstrate that the accuracy of keyword weighting and of the semantic similarity calculation is improved.
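As a rough illustration of web-oriented keyword weighting in the spirit of WKWA, the sketch below boosts a TF*IDF-style weight when a term appears in structurally important HTML fields. The boost factors and the toy corpus are assumptions; the original algorithm's exact formula is not reproduced.

```python
# Minimal sketch: TF*IDF weighting with field-dependent boosts for web documents.
import math

def web_keyword_weight(term, doc, corpus, boosts={"title": 2.0, "h1": 1.5, "body": 1.0}):
    # Term frequency weighted by where the term occurs in the page.
    tf = sum(doc[field].count(term) * boosts[field] for field in boosts)
    df = sum(1 for d in corpus if any(term in d[field] for field in boosts))
    idf = math.log(len(corpus) / df) if df else 0.0
    return tf * idf

corpus = [
    {"title": ["semantic", "web"], "h1": ["ontology"], "body": ["semantic", "similarity"]},
    {"title": ["sports"], "h1": ["news"], "body": ["football", "scores"]},
]
print(web_keyword_weight("semantic", corpus[0], corpus))
```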
Various fishery information systems have been developed at different times and on different platforms, and web service application composition is crucial for the sharing and integration of fishery data and information. In the present paper, a heuristic web service composition method based on a fishery ontology is presented, and the proposed web services are described. Ontology reasoning capability is applied to generate a service composition graph, and a heuristic function is introduced to reduce the search space. The experimental results show that the algorithm considers the semantic similarity of services and dynamically adjusts the web service composition plan based on empirical data.
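The sketch below shows one way a heuristic can guide service composition, as a loose analogue of the method described above: services are chained forward, and at each step the candidate that best covers the still-missing goal concepts is chosen. The service descriptions and the coverage heuristic are illustrative assumptions, not the fishery ontology or the paper's heuristic function.

```python
# Minimal sketch: heuristic forward-chaining composition over input/output concepts.
def compose(services, available, goal):
    """services: {name: (inputs, outputs)} as sets; available/goal: sets of concepts."""
    plan, have = [], set(available)
    while not goal <= have:
        candidates = []
        for name, (ins, outs) in services.items():
            new = outs - have
            if name in plan or not ins <= have or not new:
                continue
            # Heuristic: prefer services that directly produce missing goal
            # concepts, then those contributing the most new concepts.
            candidates.append(((len(new & goal), len(new)), name))
        if not candidates:
            return None                      # no feasible composition found
        _, best = max(candidates)
        plan.append(best)
        have |= services[best][1]
    return plan

services = {
    "GetVessel":  ({"region"}, {"vessel_id", "vessel_name"}),
    "GetCatch":   ({"vessel_id"}, {"catch_report"}),
    "GetWeather": ({"region"}, {"forecast"}),
}
print(compose(services, {"region"}, {"catch_report"}))   # -> ['GetVessel', 'GetCatch']
```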
Purpose: The purpose of this study is to develop an automated frequently asked question (FAQ) answering system for farmers. This paper presents an approach for calculating the similarity between Chinese sentences based on hybrid strategies. Design/methodology/approach: We analyzed the factors influencing the successful matching between a user's question and a question-answer (QA) pair in the FAQ database. Our approach is based on a combination of multiple factors, and experiments were conducted to test the performance of the method. Findings: Experiments show that the proposed method achieves higher accuracy. Compared with similarity calculation based on TF-IDF, on sentence surface forms, and on semantic relations, the proposed method based on hybrid strategies performs better in precision, recall, and F-measure. Research limitations: The FAQ answering system is only capable of meeting users' demand for text retrieval at present; in the future, the system needs to be improved to meet users' demand for retrieving images and videos. Practical implications: This FAQ answering system will help farmers utilize agricultural information resources more efficiently. Originality/value: We design algorithms for calculating the similarity of Chinese sentences based on hybrid strategies, which integrate question surface similarity, question semantic similarity, and question-answer similarity based on latent semantic analysis (LSA) to find answers to a user's question.
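A minimal sketch of a hybrid sentence-matching score of the kind this abstract describes: a surface (word-overlap) term, a semantic term, and an LSA-based term combined with fixed weights. The weights, the toy synonym table, the English toy FAQ (the paper targets Chinese sentences), and the use of scikit-learn's TruncatedSVD for the LSA part are all assumptions.

```python
# Minimal sketch, assuming scikit-learn is available.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

SYNONYMS = {("ill", "sick"), ("pig", "swine")}   # toy stand-in for a thesaurus

def surface_sim(q1, q2):
    a, b = set(q1.split()), set(q2.split())
    return len(a & b) / len(a | b)

def semantic_sim(q1, q2):
    a, b = set(q1.split()), set(q2.split())
    hits = sum(1 for x in a for y in b
               if x == y or (x, y) in SYNONYMS or (y, x) in SYNONYMS)
    return hits / max(len(a), len(b))

def lsa_sim(texts, i, j, k=2):
    tfidf = TfidfVectorizer().fit_transform(texts)
    reduced = TruncatedSVD(n_components=k).fit_transform(tfidf)
    return float(cosine_similarity(reduced[i:i + 1], reduced[j:j + 1])[0, 0])

def hybrid_sim(texts, i, j, w=(0.3, 0.3, 0.4)):
    return (w[0] * surface_sim(texts[i], texts[j])
            + w[1] * semantic_sim(texts[i], texts[j])
            + w[2] * lsa_sim(texts, i, j))

faq = ["my pig looks ill what should i do",
       "what to do when a swine is sick",
       "how to apply for a farming subsidy"]
print(round(hybrid_sim(faq, 0, 1), 3))
```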
Student mobility or academic mobility involves students moving between institutions during their post-secondary education, and one of the challenging tasks in this process is to assess the transfer credits to be offered to the incoming student. In general, this process involves domain experts comparing the learning outcomes of the courses to decide on offering transfer credits to the incoming students. This manual implementation is not only labor-intensive but also influenced by undue bias and administrative complexity. The proposed research article focuses on identifying a model that exploits the advancements in the field of Natural Language Processing (NLP) to effectively automate this process. Given the unique structure, domain specificity, and complexity of learning outcomes (LOs), a need for designing a tailor-made model arises. The proposed model uses a clustering-inspired methodology based on knowledge-based semantic similarity measures to assess the taxonomic similarity of LOs and a transformer-based semantic similarity model to assess the semantic similarity of the LOs. The similarity between LOs is further aggregated to form course-to-course similarity. Due to the lack of quality benchmark datasets, a new benchmark dataset containing seven course-to-course similarity measures is proposed. Understanding the inherent need for flexibility in the decision-making process, the aggregation part of the model offers tunable parameters to accommodate different levels of leniency. While providing an efficient model to assess the similarity between courses with existing resources, this research work also steers future research attempts to apply NLP in the field of articulation in an ideal direction by highlighting the persisting research gaps.
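The sketch below illustrates the aggregation step with a tunable leniency parameter: a learning outcome (LO) of course A counts as "covered" if its best match in course B exceeds a threshold, and the course-to-course score is the covered fraction. The matrix values and the coverage-style aggregation are assumptions for illustration, not the paper's exact formulation.

```python
# Minimal sketch: aggregate pairwise LO similarities into a course-level score.
def course_similarity(lo_sim, threshold=0.6):
    """lo_sim[i][j]: similarity between LO i of course A and LO j of course B
    (e.g. a blend of taxonomic and transformer-based semantic similarity)."""
    covered = [max(row) >= threshold for row in lo_sim]
    return sum(covered) / len(covered)        # fraction of A's LOs covered by B

lo_sim = [
    [0.82, 0.34, 0.51],   # LO1 of course A vs. the three LOs of course B
    [0.40, 0.77, 0.29],
    [0.30, 0.45, 0.48],
]
print(course_similarity(lo_sim, threshold=0.6))    # stricter threshold -> lower score
print(course_similarity(lo_sim, threshold=0.45))   # more lenient threshold -> higher score
```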
Because of the anonymity and openness of E-commerce, on-line transactions and the selection of network resources face new challenges. For this reason, a trust domain-based multi-agent model for network resource selection is presented. The model divides the network into a number of trust domains and prevents inconsistency in the information maintained by different agents through periodical communication among the agents. The model enables consumers to receive responses from the agents much more quickly because the trust values of participators are evaluated and updated dynamically and in a timely manner after the completion of each transaction. In order to let users choose the best matching services and provide users with trusted services, the model takes into account the similarity between services and the service providers' recognition of the services. Finally, experiments illustrate the effectiveness and feasibility of the model.
As the tsunami of data has emerged, search engines have become the most powerful tool for obtaining scattered information on the internet. Traditional search engines return organized results by using ranking algorithms such as term frequency and link analysis (the PageRank and HITS algorithms), but these algorithms must combine keyword frequency to determine the relevance between a user's query and the data in the computer system or on the internet. Moreover, we expect search engines to understand users' searches by the meaning of the content rather than by literal strings. The Semantic Web is an intelligent network that can understand human language more semantically and make communication between humans and computers easier. However, current semantic search technology is hard to apply: metadata must be annotated on each web page before a search engine can understand the user's intent, and annotating every web page is time-consuming and inefficient. Therefore, this study designs an ontology-based approach to improve traditional keyword-based search and emulate the effect of semantic search, so that the search engine can understand users more semantically once it has acquired the knowledge.
With the development of big data, all walks of life have begun to venture into big data to serve their own enterprises and departments, and big data has been embraced by university digital libraries. The most cumbersome work in the management of university libraries is document retrieval. This article uses a Hadoop-based algorithm to extract semantic keywords and then calculates semantic similarity following the keyword-based literature retrieval calculation process. A fast-matching method is used to determine the weight of each keyword, so as to ensure efficient and accurate document retrieval in digital libraries, thus completing the design of a document retrieval method for university digital libraries based on Hadoop technology.
Nowadays, we can use the multi-task learning approach to train a machine-learning algorithm to learn multiple related tasks instead of training it to solve a single task. In this work, we propose an algorithm for estimating textual similarity scores and then use these scores in multiple tasks such as text ranking, essay grading, and question answering systems. We used several vectorization schemes to represent the Arabic texts in the SemEval2017-task3-subtask-D dataset. The used schemes include lexical-based similarity features, frequency-based features, and pre-trained model-based features. We also used contextual embedding models such as Arabic Bidirectional Encoder Representations from Transformers (AraBERT). We used the AraBERT model in two different variants. First, as a feature extractor in addition to the text vectorization schemes' features; we fed those features to various regression models to predict a value that represents the relevancy score between Arabic text units. Second, AraBERT is adopted as a pre-trained model, and its parameters are fine-tuned to estimate the relevancy scores between Arabic textual sentences. To evaluate the research results, we conducted several experiments to compare the use of the AraBERT model in its two variants. In terms of Mean Absolute Percentage Error (MAPE), the results show minor variance between AraBERT v0.2 as a feature extractor (21.7723) and the fine-tuned AraBERT v2 (21.8211). On the other hand, AraBERT v0.2-Large as a feature extractor outperforms the fine-tuned AraBERT v2 model on the used data set in terms of the coefficient of determination (R2) values (0.014050 and −0.032861, respectively).
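The sketch below shows the general shape of the "feature extractor" variant: sentence-pair embeddings from a BERT-style encoder are fed to a regression model that predicts a relevancy score. The Hugging Face model id, the mean-pooling choice, the pair encoding, Ridge regression, and the two-example training set are assumptions; the paper's actual AraBERT setup, additional features, and data differ.

```python
# Minimal sketch, assuming the transformers, torch, scikit-learn, and numpy
# packages and network access to download the (assumed) checkpoint.
import numpy as np
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import Ridge

MODEL_ID = "aubmindlab/bert-base-arabertv02"   # assumed AraBERT v0.2 checkpoint id
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
encoder = AutoModel.from_pretrained(MODEL_ID)

def embed(text):
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = encoder(**inputs).last_hidden_state      # (1, seq_len, dim)
    return hidden.mean(dim=1).squeeze(0).numpy()          # mean-pooled sentence vector

def pair_features(question, answer):
    u, v = embed(question), embed(answer)
    return np.concatenate([u, v, np.abs(u - v)])          # a common pair encoding

# Tiny illustrative "training set": (question, answer, gold relevancy score).
train = [("سؤال أول", "إجابة ذات صلة", 0.9), ("سؤال ثان", "إجابة بعيدة", 0.2)]
X = [pair_features(q, a) for q, a, _ in train]
y = [s for _, _, s in train]
regressor = Ridge().fit(X, y)
print(regressor.predict([pair_features("سؤال أول", "إجابة ذات صلة")]))
```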
Based on text orientation classification, a new measurement approach to the semantic orientation of words was proposed. According to the integrated and detailed definitions of words in HowNet, seed sets including words with intense orientations were built. The orientation similarity between the seed words and a given word was then calculated using the sentiment weight priority to recognize the semantic orientation of common words. Finally, the words' semantic orientation and the context were combined to recognize the given words' orientation. The experiments show that the measurement approach achieves better results for common words' orientation classification and contributes particularly to the text orientation classification of large granularities.
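A minimal sketch of seed-based orientation classification in the spirit of this abstract: a word's orientation follows the sign of its summed similarity to a positive seed set minus its summed similarity to a negative seed set. The toy similarity table and seed words stand in for HowNet-based similarity and the paper's seed sets and weighting.

```python
# Minimal sketch: classify word orientation against positive/negative seed sets.
POS_SEEDS = ["excellent", "happy"]
NEG_SEEDS = ["terrible", "sad"]

SIM = {  # stand-in for a HowNet-style word similarity
    ("delightful", "excellent"): 0.8, ("delightful", "happy"): 0.7,
    ("delightful", "terrible"): 0.1, ("delightful", "sad"): 0.2,
}

def sim(w1, w2):
    return SIM.get((w1, w2), SIM.get((w2, w1), 0.0))

def orientation(word):
    score = (sum(sim(word, s) for s in POS_SEEDS)
             - sum(sim(word, s) for s in NEG_SEEDS))
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(orientation("delightful"))   # -> positive
```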