Abstract: Similarity has long played an important role in computer science, artificial intelligence (AI), and data science. However, similarity intelligence has been ignored in these disciplines. Similarity intelligence is a process of discovering intelligence through similarity. This article explores similarity intelligence, similarity-based reasoning, and similarity computing and analytics. More specifically, it looks at similarity as an intelligence and its impact on a few areas in the real world. It explores how similarity intelligence accompanies experience-based intelligence, knowledge-based intelligence, and data-based intelligence in playing an important role in computer science, AI, and data science. The article explores similarity-based reasoning (SBR) and proposes three similarity-based inference rules. It then examines similarity computing and analytics, and a multiagent SBR system. The main contributions of this article are: 1) similarity intelligence is discovered from experience-based intelligence consisting of data-based intelligence and knowledge-based intelligence; 2) similarity-based reasoning, computing, and analytics can be used to create similarity intelligence. The proposed approach will facilitate research and development of similarity intelligence, similarity computing and analytics, machine learning, and case-based reasoning.
Funding: Supported by the China Postdoctoral Science Foundation (Grant No. 20060400002), the Sichuan Youth Science and Technology Foundation of China (Grant No. 08JJ0109), the National Natural Science Foundation of China (Grant Nos. 60473051, 60503037), the National High-tech Research and Development Program of China (Grant No. 2006AA01Z230), and the Beijing Natural Science Foundation (Grant No. 4062018).
Abstract: This paper proposes a new text similarity computing method based on concept similarity for Chinese text processing. The method first converts text into a word-based vector space model and then splits each word into a set of concepts. It obtains the similarity between words by computing the inner products between their concepts, and finally computes the similarity between texts from the similarity between words. The contributions of the paper are: 1) a new formula for computing the similarity between words; 2) a new text similarity computing method based on word similarity; 3) a successful application of the method to similarity computing for Web news; and 4) validation of the method through extensive experiments.
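The pipeline described in this abstract (words, then concepts, then inner products, then text similarity) can be illustrated with a minimal sketch. The abstract does not give the paper's formula, so everything here is an assumption: the `CONCEPTS` lexicon is hypothetical, word similarity is taken as a normalized inner product (cosine) of concept vectors, and text similarity is a symmetric best-match average.

```python
from math import sqrt

# Hypothetical concept lexicon (an assumption for illustration only):
# each word maps to a small set of weighted concepts.
CONCEPTS = {
    "car": {"vehicle": 1.0, "road": 0.5},
    "bus": {"vehicle": 1.0, "public": 0.5},
    "apple": {"fruit": 1.0, "food": 0.5},
}


def word_similarity(w1, w2):
    """Normalized inner product (cosine) of two words' concept vectors."""
    if w1 == w2:
        return 1.0
    c1, c2 = CONCEPTS.get(w1, {}), CONCEPTS.get(w2, {})
    dot = sum(weight * c2.get(concept, 0.0) for concept, weight in c1.items())
    norm = sqrt(sum(v * v for v in c1.values())) * sqrt(sum(v * v for v in c2.values()))
    return dot / norm if norm else 0.0


def text_similarity(words_a, words_b):
    """Symmetric average of each word's best match in the other text."""
    if not words_a or not words_b:
        return 0.0

    def directed(src, dst):
        # For every source word, keep its most similar word in the other text.
        return sum(max(word_similarity(s, d) for d in dst) for s in src) / len(src)

    return (directed(words_a, words_b) + directed(words_b, words_a)) / 2
```

For example, `text_similarity(["car", "apple"], ["bus", "apple"])` scores high because "car" and "bus" share the "vehicle" concept even though the surface words differ.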
Funding: Supported by the National Natural Science Foundation of China (70371052).
Abstract: Ontology heterogeneity is the primary obstacle to the interoperation of ontologies, and ontology mapping is the best way to solve this problem. The key to ontology mapping is similarity computation. At present, methods of similarity computation are imperfect and computationally expensive. To solve these problems, an ontology-mapping framework with a hybrid architecture is put forward, together with an improved method of similarity computation. Different areas have different local ontologies. Two ontologies, describing classes and teachers in a university, are taken as examples to explain the mapping framework and the improved method of similarity computation. The experimental results show that the framework and the improved method can increase the accuracy of computation to a certain extent while also reducing the amount of computation.
Funding: Supported by the National Natural Science Foundation of China (60272088).
Abstract: FAQs (frequently asked questions) are widely used on the Internet, but most FAQ question answering is not automatic. This paper introduces the design and implementation of an automatic FAQ answering system based on semantic similarity computation, including computation model selection, FAQ characteristic analysis, formal representation of FAQ data, feature vector indexing, and weight computation. Because FAQ sentences are short, involve two mappings, and have strong domain characteristics, a Vector Space Model with special semantic processing was selected for the system, and a corresponding similarity computation algorithm was proposed. Experiments show that the system performs well for high-frequency and common questions.
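As a rough illustration of Vector Space Model matching for FAQ retrieval, the sketch below weights terms with a smoothed TF-IDF and ranks stored questions by cosine similarity. It is a generic baseline under stated assumptions, not the paper's algorithm: the "special semantic processing" mentioned in the abstract is not reproduced, input is assumed to be pre-tokenized, and the function names are hypothetical.

```python
import math
from collections import Counter


def build_idf(tokenized_docs):
    """Smoothed inverse document frequency for every term in the FAQ set."""
    n = len(tokenized_docs)
    df = Counter(term for doc in tokenized_docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf_vector(tokens, idf):
    """Sparse TF-IDF vector; terms unseen in the FAQ set are dropped."""
    tf = Counter(tokens)
    return {term: count * idf[term] for term, count in tf.items() if term in idf}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def best_faq(query_tokens, faq_questions):
    """Return (index, score) of the stored question most similar to the query."""
    idf = build_idf(faq_questions)
    query_vec = tfidf_vector(query_tokens, idf)
    scores = [cosine(query_vec, tfidf_vector(doc, idf)) for doc in faq_questions]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]
```

A practical system would likely apply a minimum score threshold before returning an answer, since `best_faq` always returns some index even when nothing matches.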
Funding: Supported by the Major Scientific and Technological Projects of CNPC under Grant ZD2019-183-006; partially supported by the Shandong Provincial Natural Science Foundation, China, under Grant ZR2020MF006; and partially supported by the Fundamental Research Funds for the Central Universities of China University of Petroleum (East China) under Grants 20CX05017A and 18CX02139A.
Abstract: In recent years, with the development of the social Internet of Things (IoT), all kinds of data have accumulated on the network. These data contain a great deal of social information and opinions. However, they are rarely fully analyzed, which is a major obstacle to the intelligent development of the social IoT. In this paper, we propose a sentence similarity analysis model to analyze the similarity in people's opinions on hot topics in social media and news pages. Most of these data are unstructured or semi-structured sentences, so the accuracy of sentence similarity analysis largely determines the model's performance. To improve accuracy, we propose a novel method of sentence similarity computation that extracts the syntactic and semantic information of semi-structured and unstructured sentences. We mainly consider the subjects, predicates, and objects of sentence pairs and use the Stanford Parser to classify the dependency relation triples in order to calculate the syntactic and semantic similarity between two sentences. Finally, we verify the performance of the model on the Microsoft Research Paraphrase Corpus (MRPC), which consists of 4076 pairs of training sentences and 1725 pairs of test sentences; most of the data came from news in social data. Extensive simulations demonstrate that our method outperforms other state-of-the-art methods in terms of the correlation coefficient and the mean deviation.
Abstract: A fundamental open question in the analysis of social networks is to understand the interplay between similarity and group social ties. In general, two groups are similar for two distinct reasons: first, they grow to resemble each other by changing their behaviors due to social influence; second, they tend to merge into one group due to similar behaviors, a process often termed selection by sociologists. It is important to understand why two groups merge and what leads to high similarity among the members of a group: influence or selection. In this paper, techniques for identifying and modeling the interactions between social influence and selection for different groups are developed. Similarities are computed in three phases (before, while, and after the groups come into being) according to the number of common edits in Wikipedia. Experimental results show that selection plays a more important role in the merging of two groups.
Funding: Supported by the Natural Science Foundation of China (Nos. 61170174, 61370205) and the Tianjin Training Plan of University Innovation Team (No. TD12-5016).
Abstract: Faced with hundreds of thousands of news articles on news websites, users find it difficult to locate the articles they are interested in. Various news recommender systems have therefore been built. In news recommendation, the news articles read by a user typically form a time sequence. However, traditional news recommendation algorithms rarely consider the time-sequence characteristic of user browsing behavior, so they do not perform well enough in predicting the next news article a user will read. To solve this problem, this paper proposes a time-ordered collaborative filtering recommendation algorithm (TOCF), which takes the time-sequence characteristic of user behavior into account. In addition, a new method for computing the similarity among different users, named time-dependent similarity, is proposed. To demonstrate the efficiency of the solution, extensive experiments are conducted along with detailed performance analysis.
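The abstract does not define the time-dependent similarity, so the following is only one plausible way to make a user-user similarity sensitive to reading times, not the TOCF definition. It assumes each user is a mapping from article ID to a read timestamp in seconds, and that articles read by both users count more when they were read close together in time; the exponential decay and the `tau` parameter are assumptions.

```python
import math


def time_decayed_similarity(reads_u, reads_v, tau=3600.0):
    """Cosine-style user similarity with an exponential time-gap penalty.

    reads_u, reads_v: dict mapping article ID -> read timestamp (seconds).
    tau: decay scale in seconds (hypothetical parameter).
    """
    if not reads_u or not reads_v:
        return 0.0
    common = reads_u.keys() & reads_v.keys()
    # Each co-read article contributes 1.0 when read at the same moment and
    # progressively less as the gap between the two read times grows.
    weighted = sum(math.exp(-abs(reads_u[a] - reads_v[a]) / tau) for a in common)
    return weighted / math.sqrt(len(reads_u) * len(reads_v))
```

With `tau` very large, the measure reduces to the usual cosine similarity over binary read vectors, which makes it easy to compare against a time-agnostic baseline.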
Funding: This research is supported by the National Science Foundation of China (grant number: 61772278, author: Qu, W.; grant number: 61472191, author: Zhou, J.; http://www.nsfc.gov.cn/), the National Social Science Foundation of China (grant number: 18BYY127, author: Li, B.; http://www.cssn.cn), the Philosophy and Social Science Foundation of Jiangsu Higher Institution (grant number: 2019SJA0220, author: Wei, T.; https://jyt.jiangsu.gov.cn), and Jiangsu Higher Institutions' Excellent Innovative Team for Philosophy and Social Science (grant number: 2017STD006, author: Gu, W.; https://jyt.jiangsu.gov.cn).
Abstract: The meaning of a word includes a conceptual meaning and a distributive meaning. Word embedding based on distribution suffers from insufficient conceptual semantic representation caused by data sparsity, especially for low-frequency words. In knowledge bases, manually annotated semantic knowledge is stable and the essential attributes of words are accurately denoted. In this paper, we propose a Conceptual Semantics Enhanced Word Representation (CEWR) model, which computes the synset embedding and hypernym embedding of Chinese words based on the Tongyici Cilin thesaurus and aggregates them with the distributed word representation so that both distributed information and conceptual meaning are encoded in the representation of words. We evaluate the CEWR model on two tasks: word similarity computation and short text classification. The Spearman correlation between model results and human judgement is improved to 64.71%, 81.84%, and 85.16% on Wordsim297, MC30, and RG65, respectively. Moreover, CEWR improves the F1 score by 3% in the short text classification task. The experimental results show that CEWR can represent words more informatively than distributed word embedding. This shows that conceptual semantics, especially hypernymous information, is a good complement to distributed word representation.
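Word similarity benchmarks of this kind are typically scored by ranking word pairs by model similarity and by human rating, then correlating the two rankings. The sketch below shows the standard Spearman rank correlation (Pearson correlation of average ranks, so ties are handled). It is a generic illustration with hypothetical function names, not the authors' evaluation script.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole group of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def spearman(model_scores, human_scores):
    """Spearman rank correlation between model scores and human ratings."""
    if len(model_scores) != len(human_scores) or len(model_scores) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    return pearson(average_ranks(model_scores), average_ranks(human_scores))
```

Because only ranks matter, the model's cosine similarities and the human ratings do not need to be on the same scale.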
Funding: Jointly supported by the National Social Science Foundation of China (Grant Nos. 08ATQ003 and 10&ZD134).
Abstract: Purpose: The purpose of this study is to develop an automated frequently asked question (FAQ) answering system for farmers. This paper presents an approach for calculating the similarity between Chinese sentences based on hybrid strategies. Design/methodology/approach: We analyzed the factors influencing successful matching between a user's question and a question-answer (QA) pair in the FAQ database. Our approach is based on a combination of multiple factors. Experiments were conducted to test the performance of our method. Findings: Experiments show that the proposed method has higher accuracy. Compared with similarity calculation based on TF-IDF, sentence surface forms, and semantic relations, the proposed method based on hybrid strategies performs better in precision, recall, and F-measure. Research limitations: At present, the FAQ answering system can only meet users' demand for text retrieval. In the future, the system needs to be improved to meet users' demand for retrieving images and videos. Practical implications: This FAQ answering system will help farmers utilize agricultural information resources more efficiently. Originality/value: We design algorithms for calculating the similarity of Chinese sentences based on hybrid strategies, which integrate question surface similarity, question semantic similarity, and question-answer similarity based on latent semantic analysis (LSA) to find answers to a user's question.
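The abstract names three signals (question surface similarity, question semantic similarity, and LSA-based question-answer similarity) but not how they are integrated. The sketch below assumes a weighted linear combination, which is one common choice and not necessarily the authors'. The weights, the `Candidate` type, and the use of character-bigram Dice overlap as a surface measure (convenient for unsegmented Chinese text) are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    question: str
    surface: float   # question surface similarity in [0, 1]
    semantic: float  # question semantic similarity in [0, 1]
    qa: float        # question-answer similarity (e.g. from LSA) in [0, 1]


def bigram_dice(a, b):
    """Dice overlap of character bigrams, a simple surface similarity."""
    grams_a = {a[i:i + 2] for i in range(len(a) - 1)}
    grams_b = {b[i:i + 2] for i in range(len(b) - 1)}
    if not grams_a or not grams_b:
        # Strings shorter than two characters have no bigrams.
        return 1.0 if a == b else 0.0
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


def hybrid_score(candidate, weights=(0.4, 0.4, 0.2)):
    """Weighted average of the three signals (weights are hypothetical)."""
    w_surface, w_semantic, w_qa = weights
    total = w_surface + w_semantic + w_qa
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    combined = (w_surface * candidate.surface
                + w_semantic * candidate.semantic
                + w_qa * candidate.qa)
    return combined / total


def rank_candidates(candidates, weights=(0.4, 0.4, 0.2)):
    """Sort FAQ candidates from best to worst hybrid score."""
    return sorted(candidates, key=lambda c: hybrid_score(c, weights), reverse=True)
```

In practice the weights would be tuned on held-out question-answer pairs rather than fixed by hand.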