To solve the problem of the inadequacy of semantic processing in intelligent question answering systems, an integrated semantic similarity model that calculates semantic similarity using geometric distance and information content is presented in this paper. With the help of the interrelationships between concepts, the information content of concepts, and the strength of the edges in the ontology network, we can calculate the semantic similarity between two concepts and provide information for the further calculation of the semantic similarity between a user's question and the answers in the knowledge base. The results of experiments on the prototype show that the semantic problem in natural language processing can also be solved with the help of the knowledge and the abundant semantic information in an ontology. More than 90% accuracy with less than 50 ms average search time has been reached in the ontology-based intelligent question answering prototype system. The result is very satisfying. Key words: intelligent question answering system; ontology; semantic similarity; geometric distance; information content. CLC number: TP39. Foundation item: Supported by the Important Science and Technology Item of China of the 10th Five-Year Plan (2001BA101A05-04). Biography: LIU Ya-jun (1953-), female, associate professor; research directions: software engineering, information processing, database application.
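The combination of geometric distance and information content described above can be sketched as follows. The blending formula, the normalisation, and the `alpha` weight are illustrative assumptions for a minimal sketch, not the paper's exact model:

```python
import math

def ic(freq, total):
    """Information content of a concept: -log of its corpus probability."""
    return -math.log(freq / total)

def combined_similarity(path_len, ic_lcs, max_ic, alpha=0.5):
    """Blend a geometric-distance score (shorter ontology paths mean more
    similar concepts) with an information-content score for the concepts'
    lowest common subsumer (LCS). alpha weights the two components."""
    geo = 1.0 / (1.0 + path_len)                # distance component in (0, 1]
    info = ic_lcs / max_ic if max_ic else 0.0   # IC component in [0, 1]
    return alpha * geo + (1 - alpha) * info
```

Identical concepts (path length 0, LCS with maximal IC) score 1.0, and similarity decreases as the ontology path between concepts lengthens.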
Most questions from users lack the context needed to thoroughly understand the problem at hand, thus making the questions impossible to answer. Semantic similarity estimation is based on relating a user's question to the context from previous Conversational Search Systems (CSS) to provide answers without requesting the user's context. It imposes constraints on the time needed to produce an answer for the user. The proposed model enables the use of contextual data associated with previous Conversational Searches (CS). When receiving a question in a new conversational search, the model determines which past CS the question refers to. The model then infers past contextual data related to the given question and predicts an answer based on the inferred context, without engaging in multi-turn interactions or requesting additional data from the user for context. This model shows the ability to use the limited information in user queries for the best context inferences, based on closed-domain CS and Bidirectional Encoder Representations from Transformers (BERT) for textual representations.
Ant-based text clustering is a promising technique that has attracted great research attention. This paper attempts to improve the standard ant-based text-clustering algorithm in two dimensions. On one hand, an ontology-based semantic similarity measure is used in conjunction with the traditional vector-space-model-based measure to provide a more accurate assessment of the similarity between documents. On the other hand, the ant behavior model is modified to pursue better algorithmic performance. In particular, the ant movement rule is adjusted so as to direct a laden ant toward a dense area of the same type of items as the ant's carried item, and to direct an unladen ant toward an area that contains an item dissimilar to the surrounding items within its Moore neighborhood. Using WordNet as the base ontology for assessing the semantic similarity between documents, the proposed algorithm is tested with a sample set of documents excerpted from the Reuters-21578 corpus, and the experimental results partly indicate that the proposed algorithm performs better than the standard ant-based text-clustering algorithm and the k-means algorithm.
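The neighborhood-density logic behind the movement rule above can be sketched with the classic Lumer-Faieta pick-up/drop-off probabilities; the constants `k1`, `k2`, and `alpha`, and the exact probability shapes, are assumptions from that standard scheme, not the paper's modified rule:

```python
def neighborhood_f(item, neighbors, sim, s2, alpha=0.5):
    """Average similarity of `item` to the items in its Moore neighborhood
    (Lumer-Faieta style); s2 is the neighborhood size, alpha scales the
    sensitivity to dissimilarity. Negative density is clipped to zero."""
    if not neighbors:
        return 0.0
    total = sum(1 - (1 - sim(item, j)) / alpha for j in neighbors)
    return max(0.0, total / s2)

def pick_probability(f, k1=0.1):
    """An unladen ant is likely to pick up an item in a sparse/dissimilar area."""
    return (k1 / (k1 + f)) ** 2

def drop_probability(f, k2=0.15):
    """A laden ant is likely to drop its item in a dense area of similar items."""
    return (f / (k2 + f)) ** 2
```

An item surrounded by identical items gets density 1, so the ant almost surely drops it there; an isolated item gets density 0, so the ant almost surely picks it up.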
In Chinese question answering systems, because there is more semantic relation in questions than in query words, precision can be improved by expanding the query when using natural language questions to retrieve documents. This paper proposes a new approach to query expansion based on semantics and statistics. First, an automatic relevance feedback method is used to generate a candidate expansion word set. Then the expanded query words are selected from the set based on the semantic similarity and semantic relevancy between the candidate words and the original words. Experiments show that the new approach is effective for Web retrieval and outperforms conventional expansion approaches.
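The selection step described above (keeping candidate words by combined similarity and relevancy to the original query) can be sketched as follows; the weighting, threshold, and max-over-query-words aggregation are illustrative assumptions:

```python
def select_expansion_words(candidates, query_words, similarity, relevancy,
                           threshold=0.6, w_sim=0.5):
    """From a relevance-feedback candidate set, keep words whose combined
    semantic similarity and semantic relevancy to some original query word
    exceed a threshold (weights and threshold here are illustrative)."""
    selected = []
    for cand in candidates:
        score = max(w_sim * similarity(cand, q) +
                    (1 - w_sim) * relevancy(cand, q) for q in query_words)
        if score >= threshold:
            selected.append((cand, score))
    return selected
```

With toy scoring functions, a closely related word like a synonym passes the threshold while an unrelated word is filtered out.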
A reputation mechanism is introduced in the P2P-based Semantic Web to solve the problem of lacking trust. It enables the Semantic Web to utilize reputation information based on the semantic similarity of peers in the network. This approach is evaluated in a simulation of a content sharing system, and the experiments show that the system with the reputation mechanism outperforms the system without it.
During the new product development process, reusing existing CAD models can avoid designing from scratch and decrease human cost. With the advent of big data, how to rapidly and efficiently find suitable 3D CAD models for design reuse has attracted increasing attention. Currently, the sketch-based retrieval approach makes search more convenient, but its accuracy is not high enough; on the other hand, the semantic-based retrieval approach fully utilizes high-level semantic information and makes search much closer to engineers' intent. However, effectively extracting and representing semantic information from data sets is difficult. Aiming at these problems, we propose a sketch-based semantic retrieval approach for reusing 3D CAD models. First, a fine-granularity semantic descriptor is designed for representing 3D CAD models; second, several heuristic rules are adopted to recognize 3D features from a 2D sketch, and the correspondences between 3D features and 2D loops are built; finally, semantic and shape similarity measurements are combined to match the input sketch to 3D CAD models. Hence the retrieval accuracy is improved. A sketch-based prototype system has been developed, and experimental results validate the feasibility and effectiveness of the proposed approach.
Long-document semantic measurement has great significance in many applications such as semantic search, plagiarism detection, and automatic technical surveys. However, research efforts have mainly focused on the semantic similarity of short texts. Document-level semantic measurement remains an open issue due to problems such as the omission of background knowledge and topic transition. In this paper, we propose a novel semantic matching method for long documents in the academic domain. To accurately represent the general meaning of an academic article, we construct a semantic profile in which key semantic elements such as the research purpose, methodology, and domain are included and enriched. As such, we can obtain the overall semantic similarity of two papers by computing the distance between their profiles. The distances between the concepts of two different semantic profiles are measured by word vectors. To improve the semantic representation quality of word vectors, we propose a joint word-embedding model that incorporates a domain-specific semantic relation constraint into the traditional context constraint. Our experimental results demonstrate that, in the measurement of document semantic similarity, our approach achieves substantial improvement over state-of-the-art methods, and our joint word-embedding model produces significantly better word representations than traditional word-embedding models.
In this paper, an improved algorithm, the web-based keyword weight algorithm (WKWA), is presented to weight keywords in web documents. WKWA takes into account the representation features of web documents and the advantages of the TF*IDF, TFC, and ITC algorithms in order to make it more appropriate for web documents. Meanwhile, the presented algorithm is applied to an improved vector space model (IVSM). A real system has been implemented for calculating the semantic similarities of web documents. Four experiments have been carried out: keyword weight calculation, feature item selection, semantic similarity calculation, and WKWA time performance. The results demonstrate that the accuracy of keyword weighting and of semantic similarity calculation is improved.
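The TF*IDF base that a WKWA-style scheme builds on, plus a structural boost for prominent web fields, can be sketched as follows; the `title_boost` factor and the idea of boosting `<title>` terms are illustrative assumptions, not WKWA's actual formula:

```python
import math

def tf_idf(term_count, doc_len, docs_with_term, n_docs):
    """Classic TF*IDF term weight: term frequency in the document times
    the log inverse document frequency across the collection."""
    tf = term_count / doc_len
    idf = math.log(n_docs / docs_with_term)
    return tf * idf

def web_keyword_weight(term_count, doc_len, docs_with_term, n_docs,
                       in_title=False, title_boost=2.0):
    """Hypothetical WKWA-flavoured weight: boost terms that appear in
    structurally prominent web-document fields (boost value assumed)."""
    w = tf_idf(term_count, doc_len, docs_with_term, n_docs)
    return w * title_boost if in_title else w
```

A term occurring twice in a 100-word document and in 10 of 1000 documents gets weight 0.02 x ln(100) (about 0.092), doubled if it also appears in the title.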
Various fishery information systems have been developed at different times and on different platforms. Web service application composition is crucial in the sharing and integration of fishery data and information. In the present paper, a heuristic web service composition method based on fishery ontology is presented, and the proposed web services are described. Ontology reasoning capability is applied to generate a service composition graph, and a heuristic function is introduced to reduce the search space. The experimental results show that the algorithm considers the semantic similarity of services and adjusts the web service composition plan dynamically by relying on empirical data.
Purpose: The purpose of this study is to develop an automated frequently asked question (FAQ) answering system for farmers. This paper presents an approach for calculating the similarity between Chinese sentences based on hybrid strategies. Design/methodology/approach: We analyzed the factors influencing the successful matching between a user's question and a question-answer (QA) pair in the FAQ database. Our approach is based on a combination of multiple factors. Experiments were conducted to test the performance of our method. Findings: Experiments show that the proposed method achieves higher accuracy. Compared with similarity calculation based on TF-IDF, sentence surface forms, and semantic relations, the proposed method based on hybrid strategies has superior performance in precision, recall, and F-measure. Research limitations: The FAQ answering system is only capable of meeting users' demand for text retrieval at present. In the future, the system needs to be improved to meet users' demand for retrieving images and videos. Practical implications: This FAQ answering system will help farmers utilize agricultural information resources more efficiently. Originality/value: We design algorithms for calculating the similarity of Chinese sentences based on hybrid strategies, which integrate question surface similarity, question semantic similarity, and question-answer similarity based on latent semantic analysis (LSA) to find answers to a user's question.
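The hybrid combination of surface, semantic, and question-answer similarities described above can be sketched as a weighted sum; the weights and the minimum-score threshold are illustrative assumptions, not the paper's tuned values:

```python
def hybrid_similarity(surface_sim, semantic_sim, qa_sim,
                      weights=(0.3, 0.4, 0.3)):
    """Weighted combination of surface-form, semantic, and question-answer
    (LSA-based) similarities; the weights here are illustrative and must
    sum to 1."""
    w1, w2, w3 = weights
    assert abs(w1 + w2 + w3 - 1.0) < 1e-9
    return w1 * surface_sim + w2 * semantic_sim + w3 * qa_sim

def best_faq_match(question_scores, threshold=0.5):
    """Return the highest-scoring (question, score) FAQ entry, or None if
    nothing clears a minimum-confidence threshold (threshold assumed)."""
    best = max(question_scores, key=lambda kv: kv[1], default=None)
    return best if best and best[1] >= threshold else None
```

A user question is answered with the QA pair whose combined score is highest, and rejected entirely when no pair is confident enough.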
Student mobility, or academic mobility, involves students moving between institutions during their post-secondary education, and one of the challenging tasks in this process is to assess the transfer credits to be offered to the incoming student. In general, this process involves domain experts comparing the learning outcomes of the courses to decide on offering transfer credits to the incoming students. This manual implementation is not only labor-intensive but also influenced by undue bias and administrative complexity. The proposed research article focuses on identifying a model that exploits advancements in the field of Natural Language Processing (NLP) to effectively automate this process. Given the unique structure, domain specificity, and complexity of learning outcomes (LOs), a need arises for designing a tailor-made model. The proposed model uses a clustering-inspired methodology based on knowledge-based semantic similarity measures to assess the taxonomic similarity of LOs, and a transformer-based semantic similarity model to assess the semantic similarity of the LOs. The similarity between LOs is further aggregated to form course-to-course similarity. Due to the lack of quality benchmark datasets, a new benchmark dataset containing seven course-to-course similarity measures is proposed. Understanding the inherent need for flexibility in the decision-making process, the aggregation part of the model offers tunable parameters to accommodate different levels of leniency. While providing an efficient model to assess the similarity between courses with existing resources, this research work also steers future research attempts to apply NLP in the field of articulation in an ideal direction by highlighting the persisting research gaps.
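The aggregation step above (LO-to-LO similarities rolled up into a course-to-course score, with a tunable leniency parameter) can be sketched as follows; the best-match-then-average scheme and the drop-weakest interpretation of leniency are illustrative assumptions:

```python
def course_similarity(lo_sim_matrix, leniency=0.0):
    """Aggregate a learning-outcome similarity matrix (rows: LOs of course
    A, columns: LOs of course B) into one course-to-course score.  Each LO
    of A takes its best match in B; `leniency` (a fraction in [0, 1))
    drops that many of the weakest matches before averaging, loosening
    the transfer-credit decision."""
    best = sorted(max(row) for row in lo_sim_matrix)
    drop = int(len(best) * leniency)
    kept = best[drop:]
    return sum(kept) / len(kept)
```

With leniency 0 every LO counts; raising leniency ignores the hardest-to-match LOs, so the course pair can only look more similar.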
As the tsunami of data has emerged, search engines have become the most powerful tool for obtaining scattered information on the internet. Traditional search engines return organized results by using ranking algorithms such as term frequency and link analysis (the PageRank and HITS algorithms). However, these algorithms must combine keyword frequency to determine the relevance between a user's query and the data in the computer system or on the internet. Moreover, we expect search engines to understand users' searches by content meaning rather than literal strings. The Semantic Web is an intelligent network that can understand human language more semantically and make communication between humans and computers easier. However, the current technology for semantic search is hard to apply: metadata must be annotated to each web page before the search engine can understand the user's intent, and annotating every web page is time-consuming and inefficient. Therefore, this study designed an ontology-based approach to improve current traditional keyword-based search and emulate the effects of semantic search, letting the search engine understand users more semantically once it acquires the knowledge.
With the development of big data, all walks of life in society have begun to venture into big data to serve their own enterprises and departments. Big data has been embraced by university digital libraries, where the most cumbersome management task is document retrieval. This article uses a Hadoop-based algorithm to extract semantic keywords and then calculates semantic similarity in the literature-retrieval keyword calculation process. A fast-matching method is used to determine the weight of each keyword, so as to ensure efficient and accurate document retrieval in digital libraries, thus completing the design of a document retrieval method for university digital libraries based on Hadoop technology.
Nowadays, we can use the multi-task learning approach to train a machine-learning algorithm to learn multiple related tasks instead of training it to solve a single task. In this work, we propose an algorithm for estimating textual similarity scores and then use these scores in multiple tasks such as text ranking, essay grading, and question answering systems. We used several vectorization schemes to represent the Arabic texts in the SemEval2017-task3-subtask-D dataset. The schemes include lexical-based similarity features, frequency-based features, and pre-trained model-based features. We also used contextual embedding models such as Arabic Bidirectional Encoder Representations from Transformers (AraBERT). We used the AraBERT model in two different variants: first, as a feature extractor in addition to the text vectorization schemes' features, feeding those features to various regression models to predict a value that represents the relevancy score between Arabic text units; second, as a pre-trained model whose parameters are fine-tuned to estimate the relevancy scores between Arabic textual sentences. To evaluate the research results, we conducted several experiments to compare the use of the AraBERT model in its two variants. In terms of Mean Absolute Percentage Error (MAPE), the results show minor variance between AraBERT v0.2 as a feature extractor (21.7723) and the fine-tuned AraBERT v2 (21.8211). On the other hand, AraBERT v0.2-Large as a feature extractor outperforms the fine-tuned AraBERT v2 model on the used dataset in terms of the coefficient of determination (R2) values (0.014050 and −0.032861, respectively).
Based on text orientation classification, a new measurement approach to the semantic orientation of words is proposed. According to the integrated and detailed definitions of words in HowNet, seed sets including words with intense orientations were built up. The orientation similarity between the seed words and a given word was then calculated using the sentiment weight priority to recognize the semantic orientation of common words. Finally, the words' semantic orientation and the context were combined to recognize the given words' orientation. The experiments show that the measurement approach achieves better results for common words' orientation classification and contributes particularly to the text orientation classification of large granularities.
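The seed-set idea above can be sketched as a simple comparison of a word's average similarity to positive and to negative seeds; this is a common seed-set scheme, not necessarily the paper's exact HowNet-based sentiment-weight formula:

```python
def word_orientation(word_sim, pos_seeds, neg_seeds):
    """Classify a word's semantic orientation from its similarity to
    positive and negative seed sets.  `word_sim` maps each seed word to
    the similarity between the target word and that seed."""
    pos = sum(word_sim[s] for s in pos_seeds) / len(pos_seeds)
    neg = sum(word_sim[s] for s in neg_seeds) / len(neg_seeds)
    score = pos - neg
    if score > 0:
        return "positive", score
    if score < 0:
        return "negative", score
    return "neutral", 0.0
```

A word much closer to the positive seeds than to the negative ones is labelled positive, with the margin serving as an orientation strength.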
As a means to map ontology concepts, a similarity technique is employed. In particular, context-dependent concept mapping is tackled, which needs contextual information from a knowledge taxonomy. Context-based semantic similarity differs from real-world similarity in that it requires contextual information to calculate similarity. The notion of semantic coupling is introduced to derive similarity for a taxonomy-based system; the semantic coupling shows the degree of semantic cohesiveness for a group of concepts toward a given context. In order to calculate the semantic coupling effectively, the edge counting method is revisited for measuring basic semantic similarity by considering the weighting attributes that affect an edge's strength. The attributes of scaling depth effect, semantic relation type, and virtual connection for edge counting are considered. Furthermore, it is shown how the proposed edge counting method can be adapted for calculating context-based similarity. Thorough experimental results are provided for both edge counting and context-based similarity. The results of the proposed edge counting were encouraging compared with other combined approaches, and the context-based similarity also showed understandable results. The novel contributions of this paper come from two aspects. First, the similarity is increased to a viable level for edge counting. Second, a mechanism is provided to derive context-based similarity in a taxonomy-based system, which has emerged as a hot issue in the literature, such as the Semantic Web, MDR, and other ontology-mapping environments.
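Weighted edge counting with a depth-scaling effect and relation-type factors, as described above, can be sketched as follows; the exponential depth decay, the relation factors, and the distance-to-similarity conversion are illustrative assumptions rather than the paper's calibrated formulas:

```python
def edge_weight(depth, relation_factor=1.0, scale=0.5):
    """Strength of one taxonomy edge: edges deeper in the hierarchy count
    for less (scaling depth effect), and the semantic relation type can
    shrink or stretch the weight (both parameters assumed)."""
    return relation_factor * (scale ** depth)

def edge_counting_similarity(path_edges):
    """Similarity from a concept-to-concept path, given (depth,
    relation_factor) pairs for each edge on the path: sum the weighted
    edge lengths and convert the distance to a similarity in (0, 1]."""
    distance = sum(edge_weight(d, r) for d, r in path_edges)
    return 1.0 / (1.0 + distance)
```

Identical concepts (empty path) score 1.0, and an edge crossed deep in the taxonomy separates concepts less than the same edge near the root, matching the intuition that deep siblings are closely related.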
Background: Although biomedical ontologies have standardized the representation of gene products across species and databases, a method for determining the functional similarities of gene products has not yet been developed. Methods: We proposed a new semantic similarity measure based on Gene Ontology that considers the semantic influences from all of the ancestor terms in a graph. Our measure was compared with Resnik's measure in two applications, which were based on the association of the measure with gene co-expression and with protein-protein interactions. Results: The results showed a considerable association between semantic similarity and expression correlation and between semantic similarity and protein-protein interactions, and our measure performed the best overall. Conclusion: These results reveal the potential value of the newly proposed semantic similarity measure in studying the functional relevance of gene products.
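A measure that aggregates contributions from all ancestor terms can be sketched in the style of Wang's GO similarity measure; the decay factor and the overlap normalisation are assumptions from that family of measures, not necessarily the paper's exact definition:

```python
def s_values(term, parents, decay=0.8):
    """Semantic contribution of a GO term and every one of its ancestors:
    the term itself contributes 1, and each step toward the root decays
    the contribution; the best contribution per ancestor is kept."""
    sv = {term: 1.0}
    frontier = [term]
    while frontier:
        t = frontier.pop()
        for p in parents.get(t, []):
            contrib = sv[t] * decay
            if contrib > sv.get(p, 0.0):
                sv[p] = contrib
                frontier.append(p)
    return sv

def ancestor_similarity(a, b, parents):
    """Similarity of two terms from the overlap of their ancestor
    contributions, normalised by the total contributions of both terms."""
    sa, sb = s_values(a, parents), s_values(b, parents)
    common = set(sa) & set(sb)
    overlap = sum(sa[t] + sb[t] for t in common)
    return overlap / (sum(sa.values()) + sum(sb.values()))
```

Two terms sharing all ancestors score 1.0; sibling terms under a common parent score less, with deeper shared ancestry pushing the score up.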
Radiology doctors perform text-based image retrieval when they want to retrieve medical images. However, the accuracy and efficiency of such retrieval cannot keep up with the requirements. An innovative algorithm is proposed to retrieve similar medical images. First, we extract professional terms from the ontology structure and use them to annotate the CT images. Second, the semantic similarity matrix of ontology terms is calculated according to the structure of the ontology. Lastly, the corresponding semantic distance is calculated according to the annotation vector, which contains the different annotations. We use 120 real liver CT images (divided into six categories) from a top-grade hospital to run the algorithm. The results show that the retrieval index "Precision" is 80.81%, and the classification index AUC (Area Under Curve) of the ROC (Receiver Operating Characteristic) curve is 0.945.
Keyword extraction is a branch of natural language processing that plays an important role in many tasks, such as long text classification, automatic summarization, machine translation, and dialogue systems, all of which need high-quality keywords as a starting point. In this paper, we propose a deep learning network called the deep neural semantic network (DNSN) to solve the problem of short-text keyword extraction. It maps short texts and words to the same semantic space, obtains their semantic vectors at the same time, and then computes the similarity between the short text and the words to extract the top-ranked words as keywords. Bidirectional Encoder Representations from Transformers (BERT) is first used to obtain the initial semantic feature vectors of the short text and words; these initial vectors are then fed to a residual network to obtain the final semantic vectors of the short text and words in the same vector space. Finally, keywords are extracted by calculating the similarity between the short text and the words. Compared with existing baseline models, including Frequency, Term Frequency-Inverse Document Frequency (TF-IDF), and TextRank, the proposed model is superior in precision, recall, and F-score on the same batch of test data. In the best case, the precision, recall, and F-score are 6.79%, 5.67%, and 11.08% higher than those of the baseline models, respectively.
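The final ranking step described above, once the network has produced text and word vectors in a shared space, reduces to cosine ranking; this sketch takes the vectors as given inputs (in the paper the DNSN itself would supply them):

```python
import math

def cosine(u, v):
    """Cosine similarity of two vectors; zero-norm vectors score 0."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0

def top_keywords(text_vec, word_vecs, k=2):
    """Rank candidate words by cosine similarity between their semantic
    vectors and the short text's vector, and return the top k as keywords."""
    ranked = sorted(word_vecs, key=lambda w: cosine(text_vec, word_vecs[w]),
                    reverse=True)
    return ranked[:k]
```

Words whose vectors point in the same direction as the short-text vector surface first, which is exactly the top-ranked-words-as-keywords criterion.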
A three-dimensional boundary-spanning technology search model including search depth, scope, and height is established, and a quantitative calculation method is proposed to dynamically describe an organisation's technology search behaviour and demand characteristics. Organisations are clustered by type as technical, comprehensive, or professional using k-means based on technology search behaviour. Recommendation strategies for the various types of organisations are proposed on this basis, and the search and supply libraries of each organisation are built by considering its type and search contents. The semantic similarity between patents in different libraries is calculated using Word2Vec and TextRank models to produce patent recommendations. An empirical study of the robotics field shows an overall recommendation accuracy of 0.751, with accuracies of 0.8282, 0.5389, and 0.7723 for the technical, comprehensive, and professional types, respectively. This study considers an organisation's dynamic search behaviour and makes class-based recommendations, with low computational complexity and strong interpretability.
Funding: This work was supported in part by the National Natural Science Foundation of China under Grants No. 70301009 and No. 70431001, and by the Ministry of Education, Culture, Sports, Science and Technology of Japan under the "Kanazawa Region, Ishikawa High-Tech Sensing Cluster of Knowledge-Based Cluster Creation Project".
Funding: Supported by the Specialized Research Program Fund for the Doctoral Program of Higher Education of China (20050007023) and the Natural Science Foundation of Shandong Province (Y2004G04).
Funding: Supported by the National Natural Science Foundation of China (60173026), the Ministry of Education Key Project (105071), and the Foundation of the E-Institute of Shanghai High Institutions (200301).
Abstract: A reputation mechanism is introduced into the P2P-based Semantic Web to solve the problem of lacking trust. It enables the Semantic Web to utilize reputation information based on the semantic similarity of peers in the network. This approach is evaluated in a simulation of a content sharing system, and the experiments show that the system with the reputation mechanism outperforms the system without it.
Foundation item: Supported by the National Natural Science Foundation of China (61502129, 61572432, 61163016), the Zhejiang Natural Science Foundation of China (LQ16F020004, LQ15F020011), and the University Scientific Research Projects of Ningxia Province of China (NGY2015161)
Abstract: During the new product development process, reusing existing CAD models avoids designing from scratch and decreases human cost. With the advent of big data, how to rapidly and efficiently find suitable 3D CAD models for design reuse has attracted more attention. The sketch-based retrieval approach makes search more convenient, but its accuracy is not high enough; on the other hand, the semantic-based retrieval approach fully utilizes high-level semantic information and brings search much closer to engineers' intent. However, effectively extracting and representing semantic information from data sets is difficult. Aiming at these problems, we propose a sketch-based semantic retrieval approach for reusing 3D CAD models. First, a fine-granularity semantic descriptor is designed for representing 3D CAD models. Second, several heuristic rules are adopted to recognize 3D features from the 2D sketch, and correspondences between 3D features and 2D loops are built. Finally, semantic and shape similarity measures are combined to match the input sketch to 3D CAD models, improving retrieval accuracy. A sketch-based prototype system has been developed, and experimental results validate the feasibility and effectiveness of the proposed approach.
Foundation item: Supported by the Foundation of the State Key Laboratory of Software Development Environment (No. SKLSDE-2015ZX-04)
Abstract: Long-document semantic measurement has great significance in many applications such as semantic search, plagiarism detection, and automatic technical surveys. However, research efforts have mainly focused on the semantic similarity of short texts; document-level semantic measurement remains an open issue due to problems such as the omission of background knowledge and topic transition. In this paper, we propose a novel semantic matching method for long documents in the academic domain. To accurately represent the general meaning of an academic article, we construct a semantic profile in which key semantic elements such as the research purpose, methodology, and domain are included and enriched. We can then obtain the overall semantic similarity of two papers by computing the distance between their profiles, where the distances between the concepts of two different semantic profiles are measured by word vectors. To improve the semantic representation quality of the word vectors, we propose a joint word-embedding model that incorporates a domain-specific semantic relation constraint into the traditional context constraint. Our experimental results demonstrate that, in measuring document semantic similarity, our approach achieves substantial improvement over state-of-the-art methods, and our joint word-embedding model produces significantly better word representations than traditional word-embedding models.
Foundation item: Project supported by the Science Foundation of Shanghai Municipal Commission of Science and Technology (Grant No. 055115001)
Abstract: In this paper, an improved algorithm, the web-based keyword weight algorithm (WKWA), is presented for weighting keywords in web documents. WKWA takes into account the representation features of web documents and the advantages of the TF*IDF, TFC, and ITC algorithms, making it more appropriate for web documents. The algorithm is then applied in an improved vector space model (IVSM), and a real system has been implemented for calculating semantic similarities of web documents. Four experiments have been carried out: keyword weight calculation, feature item selection, semantic similarity calculation, and WKWA time performance. The results demonstrate that the accuracy of both keyword weighting and semantic similarity calculation is improved.
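The TF*IDF family of weights that WKWA builds on can be sketched in a few lines. This is the textbook formulation only, not WKWA itself, since the abstract does not detail the web-document features the improved algorithm adds:

```python
import math
from collections import Counter

def tf_idf(docs):
    """Compute TF*IDF weights for a small corpus of tokenized documents.
    TF is raw frequency normalized by document length; IDF is log(N / df)."""
    n = len(docs)
    df = Counter()                      # document frequency of each term
    for doc in docs:
        df.update(set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        length = len(doc)
        weights.append({t: (tf[t] / length) * math.log(n / df[t]) for t in tf})
    return weights
```

Note that a term appearing in every document gets weight zero under this scheme; variants such as TFC and ITC mainly differ in normalization and smoothing.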
Foundation item: Funded by the National High Technology Research and Development Program of China (2006AA10Z239) and the Key Technologies R&D Program of China during the 11th Five-Year Plan period (2006BAD10A05)
Abstract: Various fishery information systems have been developed at different times and on different platforms. Web service composition is crucial in the sharing and integration of fishery data and information. In the present paper, a heuristic web service composition method based on fishery ontology is presented, and the proposed web services are described. Ontology reasoning capability is applied to generate a service composition graph, and a heuristic function is introduced to reduce the search space. The experimental results show that the algorithm considers the services' semantic similarity and dynamically adjusts the web service composition plan based on empirical data.
Foundation item: Jointly supported by the National Social Science Foundation of China (Grant Nos. 08ATQ003 and 10&ZD134)
Abstract: Purpose: The purpose of this study is to develop an automated frequently asked question (FAQ) answering system for farmers. This paper presents an approach for calculating the similarity between Chinese sentences based on hybrid strategies. Design/methodology/approach: We analyzed the factors influencing a successful match between a user's question and a question-answer (QA) pair in the FAQ database. Our approach combines multiple factors, and experiments were conducted to test its performance. Findings: Experiments show that the proposed method achieves higher accuracy. Compared with similarity calculation based on TF-IDF, sentence surface forms, and semantic relations, the proposed method based on hybrid strategies performs better in precision, recall, and F-measure. Research limitations: The FAQ answering system is currently only capable of meeting users' demand for text retrieval; in the future, it needs to be improved to support retrieving images and videos. Practical implications: This FAQ answering system will help farmers utilize agricultural information resources more efficiently. Originality/value: We design algorithms for calculating the similarity of Chinese sentences based on hybrid strategies, which integrate question surface similarity, question semantic similarity, and question-answer similarity based on latent semantic analysis (LSA) to find answers to a user's question.
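The hybrid combination described can be sketched as a weighted sum of the three similarity factors. The Dice-coefficient surface measure and the weights below are illustrative stand-ins; the paper's actual components and tuned weights are not given in the abstract:

```python
def surface_similarity(tokens_a, tokens_b):
    """Dice coefficient over word sets: a simple surface-form similarity."""
    a, b = set(tokens_a), set(tokens_b)
    if not a or not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))

def hybrid_similarity(surface, semantic, qa_lsa, weights=(0.4, 0.4, 0.2)):
    """Linear combination of surface, semantic, and LSA-based QA similarity.
    The weights here are illustrative, not the paper's tuned values."""
    w1, w2, w3 = weights
    return w1 * surface + w2 * semantic + w3 * qa_lsa
```

In practice the three component scores would each be normalized to [0, 1] before combination so the weights remain interpretable.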
Abstract: Student mobility, or academic mobility, involves students moving between institutions during their post-secondary education, and one of the challenging tasks in this process is to assess the transfer credits to be offered to the incoming student. In general, this process involves domain experts comparing the learning outcomes of the courses to decide on offering transfer credits to incoming students. This manual implementation is not only labor-intensive but also influenced by undue bias and administrative complexity. The proposed research focuses on identifying a model that exploits advancements in Natural Language Processing (NLP) to effectively automate this process. Given the unique structure, domain specificity, and complexity of learning outcomes (LOs), a need arises for a tailor-made model. The proposed model uses a clustering-inspired methodology based on knowledge-based semantic similarity measures to assess the taxonomic similarity of LOs, and a transformer-based semantic similarity model to assess their semantic similarity. The similarities between LOs are then aggregated into a course-to-course similarity. Due to the lack of quality benchmark datasets, a new benchmark dataset containing seven course-to-course similarity measures is proposed. Recognizing the inherent need for flexibility in the decision-making process, the aggregation part of the model offers tunable parameters to accommodate different levels of leniency. While providing an efficient model to assess the similarity between courses with existing resources, this work also steers future attempts to apply NLP to articulation by highlighting the persisting research gaps.
Abstract: As the tsunami of data has emerged, search engines have become the most powerful tool for obtaining scattered information on the internet. Traditional search engines return organized results by using ranking algorithms such as term frequency and link analysis (the PageRank and HITS algorithms), but these algorithms must rely on keyword frequency to determine the relevance between a user's query and the data in a computer system or on the internet. We would rather expect search engines to understand a user's search by content meaning than by literal strings. The Semantic Web is an intelligent network that can understand human language more semantically and ease communication between humans and computers, but current semantic search technology is hard to apply: metadata must be annotated on each web page before a search engine can understand the user's intent, and annotating every web page is time-consuming and inefficient. Therefore, this study designs an ontology-based approach to improve traditional keyword-based search and emulate the effects of semantic search, letting the search engine understand users more semantically once it acquires this knowledge.
Abstract: With the development of big data, organizations in many sectors have begun to adopt it to serve their own enterprises and departments, and university digital libraries are no exception. The most cumbersome task in the management of university libraries is document retrieval. This article uses a Hadoop-based algorithm to extract semantic keywords and then calculates semantic similarity from the retrieval keywords. A fast-matching method is used to determine the weight of each keyword, ensuring efficient and accurate document retrieval in digital libraries, thus completing the design of a document retrieval method for university digital libraries based on Hadoop technology.
Abstract: Nowadays, we can use the multi-task learning approach to train a machine-learning algorithm to learn multiple related tasks instead of training it to solve a single task. In this work, we propose an algorithm for estimating textual similarity scores and then use these scores in multiple tasks such as text ranking, essay grading, and question answering systems. We used several vectorization schemes to represent the Arabic texts in the SemEval2017-task3-subtask-D dataset, including lexical-based similarity features, frequency-based features, and pre-trained model-based features. We also used contextual embedding models such as Arabic Bidirectional Encoder Representations from Transformers (AraBERT), in two different variants. First, as a feature extractor in addition to the text vectorization schemes' features: we fed those features to various regression models to predict a relevancy score between Arabic text units. Second, AraBERT is adopted as a pre-trained model whose parameters are fine-tuned to estimate the relevancy scores between Arabic textual sentences. To evaluate the results, we conducted several experiments comparing the two variants. In terms of Mean Absolute Percentage Error (MAPE), the results show minor variance between AraBERT v0.2 as a feature extractor (21.7723) and the fine-tuned AraBERT v2 (21.8211). On the other hand, AraBERT v0.2-Large as a feature extractor outperforms the fine-tuned AraBERT v2 model on the used dataset in terms of the coefficient of determination (R²), with values of 0.014050 and −0.032861, respectively.
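The two evaluation metrics quoted, MAPE and R², are standard and can be sketched directly; note that R² can be negative for a poor fit, as the fine-tuned model's score above shows:

```python
def mape(y_true, y_pred):
    """Mean Absolute Percentage Error, in percent.
    Assumes no true value is zero."""
    return 100 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.
    Negative when predictions are worse than predicting the mean."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot
```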
Foundation item: Supported by the National Natural Science Foundation of China (50375010).
Abstract: For text orientation classification, a new approach to measuring the semantic orientation of words is proposed. According to the integrated and detailed definitions of words in HowNet, seed sets containing words with strong orientations are built. The orientation similarity between the seed words and a given word is then calculated using sentiment weight priority to recognize the semantic orientation of common words. Finally, a word's semantic orientation and its context are combined to recognize the given word's orientation. The experiments show that this approach achieves better results for common words' orientation classification and contributes particularly to the orientation classification of large-granularity texts.
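The seed-set scheme can be sketched as the difference between a word's average similarity to the positive and to the negative seed words. Here `sim` stands in for the HowNet-based similarity, and the function name and sign convention are illustrative, not the paper's exact formulation:

```python
def orientation(word, pos_seeds, neg_seeds, sim):
    """Semantic orientation of `word` as the difference between its average
    similarity to positive and to negative seed words; positive result
    means positive orientation, negative means negative orientation."""
    pos = sum(sim(word, s) for s in pos_seeds) / len(pos_seeds)
    neg = sum(sim(word, s) for s in neg_seeds) / len(neg_seeds)
    return pos - neg
```

A score near zero would leave the word to be disambiguated by context, matching the final combination step the abstract describes.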
Abstract: As a means of mapping ontology concepts, a similarity technique is employed. In particular, context-dependent concept mapping is tackled, which needs contextual information from a knowledge taxonomy. Context-based semantic similarity differs from real-world similarity in that it requires contextual information to calculate similarity. The notion of semantic coupling is introduced to derive similarity for a taxonomy-based system; semantic coupling shows the degree of semantic cohesiveness for a group of concepts toward a given context. In order to calculate semantic coupling effectively, the edge counting method is revisited for measuring basic semantic similarity by considering the weighting attributes that affect an edge's strength. The attributes of the scaling depth effect, semantic relation type, and virtual connection are considered for edge counting, and it is shown how the proposed edge counting method can be adapted for calculating context-based similarity. Thorough experimental results are provided for both edge counting and context-based similarity: the results of the proposed edge counting were encouraging compared with other combined approaches, and the context-based similarity also showed understandable results. The novel contributions of this paper come from two aspects. First, the similarity is increased to a viable level for edge counting. Second, a mechanism is provided to derive context-based similarity in a taxonomy-based system, which has emerged as a hot issue in the literature on the Semantic Web, MDR, and other ontology-mapping environments.
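Depth-scaled edge counting can be sketched by letting each edge on the shortest path between two concepts carry its taxonomy depth and a relation-type factor. The decay constant and the distance-to-similarity mapping below are illustrative assumptions, not the paper's exact weighting:

```python
def edge_weight(depth, relation_factor=1.0, scale=0.8):
    """Edge strength that decays with taxonomy depth: edges deeper in the
    hierarchy contribute less distance, modelling the scaling depth effect."""
    return relation_factor * (scale ** depth)

def path_similarity(edges):
    """Similarity from a list of (depth, relation_factor) edges on the
    shortest path between two concepts. An empty path (identical concepts)
    yields similarity 1; longer, shallower paths yield lower scores."""
    distance = sum(edge_weight(d, r) for d, r in edges)
    return 1 / (1 + distance)
```

Under this scheme, a single edge near the root costs more than the same edge deep in the taxonomy, so specific sibling concepts come out more similar than general ones, which is the usual motivation for depth scaling.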
Abstract: Background: Although biomedical ontologies have standardized the representation of gene products across species and databases, a method for determining the functional similarities of gene products has not yet been developed. Methods: We propose a new semantic similarity measure based on Gene Ontology that considers the semantic influence of all ancestor terms in a graph. Our measure is compared with Resnik's measure in two applications, based on the association of the measure with gene co-expression and with protein-protein interactions. Results: The results showed a considerable association between semantic similarity and expression correlation and between semantic similarity and protein-protein interactions, and our measure performed the best overall. Conclusion: These results reveal the potential value of the newly proposed semantic similarity measure in studying the functional relevance of gene products.
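For reference, Resnik's measure, the baseline in this comparison, scores two terms by the information content of their most informative common ancestor. A minimal sketch with made-up annotation probabilities; the term names and probabilities are illustrative only:

```python
import math

def information_content(term_prob):
    """Information content of a term from its annotation probability: -log p(t)."""
    return -math.log(term_prob)

def resnik_similarity(ancestors_a, ancestors_b, prob):
    """Resnik similarity: IC of the most informative (lowest-probability)
    common ancestor of the two terms; 0 if no common ancestor exists."""
    common = set(ancestors_a) & set(ancestors_b)
    if not common:
        return 0.0
    return max(information_content(prob[t]) for t in common)
```

The measure proposed in the abstract differs by aggregating influence from all ancestor terms rather than only the single most informative one.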
Abstract: Radiology doctors perform text-based image retrieval when they want to retrieve medical images; however, the accuracy and efficiency of such retrieval cannot keep up with their requirements. An innovative algorithm is proposed to retrieve similar medical images. First, we extract the professional terms from the ontology structure and use them to annotate the CT images. Second, the semantic similarity matrix of ontology terms is calculated according to the structure of the ontology. Lastly, the corresponding semantic distance is calculated from the annotation vectors. We ran the algorithm on 120 real liver CT images (divided into six categories) from a top-tier hospital. Results show that retrieval precision is 80.81%, and the classification AUC (Area Under the ROC Curve) is 0.945.
Foundation item: Supported by the Major Program of the National Natural Science Foundation of China (Grant No. 91938301), the National Defense Equipment Advance Research Shared Technology Program of China (41402050301-170441402065), and the Sichuan Science and Technology Major Project on New Generation Artificial Intelligence (2018GZDZX0034).
Abstract: Keyword extraction is a branch of natural language processing that plays an important role in many tasks, such as long-text classification, automatic summarization, machine translation, and dialogue systems, all of which need high-quality keywords as a starting point. In this paper, we propose a deep learning network called the deep neural semantic network (DNSN) to solve the problem of short-text keyword extraction. It maps a short text and candidate words into the same semantic space, obtains semantic vectors for both at the same time, and then computes the similarity between the short text and the words to extract the top-ranked words as keywords. Bidirectional Encoder Representations from Transformers (BERT) was first used to obtain the initial semantic feature vectors of the short text and words; these were then fed to a residual network to obtain the final semantic vectors in the same vector space. Finally, keywords were extracted by calculating the similarity between the short text and the words. Compared with existing baseline models, including Frequency, Term Frequency-Inverse Document Frequency (TF-IDF), and TextRank, the proposed model is superior in precision, recall, and F-score on the same test dataset. In the best case, its precision, recall, and F-score are 6.79%, 5.67%, and 11.08% higher than the baseline model, respectively.
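The final ranking step, scoring each candidate word against the short-text vector in the shared space, can be sketched with plain cosine similarity. The vectors here are toy stand-ins for the DNSN embeddings, and `extract_keywords` is an illustrative name:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def extract_keywords(text_vec, word_vecs, top_k=3):
    """Rank candidate words by cosine similarity of their semantic vectors
    to the short-text vector; return the top-k words as keywords."""
    ranked = sorted(word_vecs.items(),
                    key=lambda kv: cosine(text_vec, kv[1]),
                    reverse=True)
    return [word for word, _ in ranked[:top_k]]
```

Because both the text and the words live in one vector space, no separate alignment step is needed before this comparison, which is the point of mapping them jointly.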
Abstract: A three-dimensional boundary-spanning technology search model covering search depth, scope, and height is established, and a quantitative calculation method is proposed to dynamically describe an organisation's technology search behaviour and demand characteristics. Organisations are clustered into technical, comprehensive, and professional types using k-means based on their technology search behaviour. Recommendation strategies for each type of organisation are proposed on this basis, and the search and supply libraries of each organisation are built by considering its type and search contents. The semantic similarity between patents in different libraries is calculated using a Word2Vec and TextRank model to achieve patent recommendations. An empirical study of the robotics field shows an overall recommendation accuracy of 0.751, with accuracies of 0.8282, 0.5389, and 0.7723 for the technical, comprehensive, and professional types, respectively. This study considers an organisation's dynamic search behaviour and makes class-based recommendations, with low computational complexity and strong interpretability.