Abstract: Query expansion with a thesaurus is one of the useful techniques in modern information retrieval (IR). In this paper, a method of query expansion for Chinese IR using a decaying co-occurrence model is proposed and implemented. The model extends the traditional co-occurrence model by adding a decaying factor that decreases the mutual information as the distance between the terms increases. Experimental results on the TREC-9 collections show that this query expansion method yields significant improvements over IR without query expansion.
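The distance-decay idea can be illustrated with a minimal sketch. The abstract does not give the decay function or the mutual information formula, so everything below is an assumption: an exponential decay `exp(-alpha * (d - 1))`, a fixed window, decayed counts used directly in place of mutual information, and the hypothetical helper names `decayed_cooccurrence` and `expand_query`.

```python
import math
from collections import defaultdict


def decayed_cooccurrence(docs, window=5, alpha=0.5):
    """Accumulate distance-decayed co-occurrence weights for term pairs.

    A pair of terms at distance d (1 <= d <= window) contributes
    exp(-alpha * (d - 1)): adjacent terms count fully, distant terms less.
    The exponential form is an assumption, not the paper's formula.
    """
    weights = defaultdict(float)
    for tokens in docs:
        for i, left in enumerate(tokens):
            for d in range(1, window + 1):
                j = i + d
                if j >= len(tokens):
                    break
                right = tokens[j]
                if left == right:
                    continue
                # Store each pair once, independent of word order.
                pair = tuple(sorted((left, right)))
                weights[pair] += math.exp(-alpha * (d - 1))
    return dict(weights)


def expand_query(query_terms, weights, top_k=2):
    """Rank non-query terms by their summed decayed weight with query terms."""
    scores = defaultdict(float)
    for (a, b), w in weights.items():
        if a in query_terms and b not in query_terms:
            scores[b] += w
        elif b in query_terms and a not in query_terms:
            scores[a] += w
    # Sort by descending score, then alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top_k]]
```

With this weighting, a term that sits next to a query term contributes more to the expansion score than one several positions away, which is the behaviour the abstract attributes to the decaying factor.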
Funding: The Young Teachers Scientific Research Foundation (YTSRF) of Nanjing University of Science and Technology, 2005-2006.
Abstract: A method that combines category-based and keyword-based concepts for a better information retrieval system is introduced. To improve document clustering, a document similarity measure is proposed that is based on the cosine vector measure and keyword frequency in documents and that also takes an ontology as input. The ontology is domain specific and includes a list of keywords organized by degree of importance to the categories of the ontology; by means of this semantic knowledge, the ontology can improve the document similarity measure and the feedback of information retrieval systems. Two approaches to evaluating the performance of this similarity measure, and a comparison with the standard cosine vector similarity measure, are also described.
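As a point of reference for the baseline named above, the following sketch computes cosine similarity over term-frequency vectors. The optional `keyword_weights` argument is a hypothetical stand-in for ontology-derived importance scores; the abstract does not specify how the ontology actually enters the measure.

```python
import math
from collections import Counter


def cosine_similarity(tokens_a, tokens_b, keyword_weights=None):
    """Cosine similarity of two token lists over term-frequency vectors.

    keyword_weights (hypothetical) maps a term to a multiplier; terms
    that are not listed keep weight 1.0, which gives the standard measure.
    """
    keyword_weights = keyword_weights or {}

    def weighted(tokens):
        return {t: c * keyword_weights.get(t, 1.0)
                for t, c in Counter(tokens).items()}

    vec_a, vec_b = weighted(tokens_a), weighted(tokens_b)
    dot = sum(w * vec_b.get(t, 0.0) for t, w in vec_a.items())
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty document is similar to nothing
    return dot / (norm_a * norm_b)
```

Raising the weight of a shared keyword pulls two documents closer together, which is one simple way domain knowledge could influence clustering.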
Funding: The National Natural Science Foundation of China (No. 69975010, 60374054); the National Research Foundation for the Doctoral Program of Higher Education of China (No. 20050007023).
Abstract: To improve the efficiency of knowledge base retrieval based on description logic, the concept of an assertional graph (AG), which is a directed labeled graph, is defined and a new AG-based retrieval method is put forward. This method converts the knowledge base and the query clause into a knowledge AG and a query AG using the given rules, and then uses graph traversal to carry out knowledge base retrieval. The experiment indicates that the efficiency of this method exceeds that of the popular RACER and KAON2 systems by 0.4% and 3.3%, respectively. The method can therefore improve the efficiency of knowledge base retrieval.
Funding: The National Basic Research Program of China (973 Program) (No. 2004CB318104); the Knowledge Innovation Program of Chinese Academy of Sciences (No. 13CX04).
Abstract: A concept-based approach is expected to resolve word sense ambiguities in information retrieval and to use the semantic importance of concepts, instead of term frequency, to represent the contents of a document. Accordingly, a formalized document framework is proposed. The document framework expresses the meaning of a document through the concepts that have high semantic importance. The framework consists of two parts: the "domain" information and the "situation & background" information of a document. A document-extracting algorithm and a two-stage smoothing method are also proposed. The quantification of the similarity between the query and the document framework depends on the smoothing method. Experiments on the TREC-6 collection demonstrate the feasibility and effectiveness of the proposed approach in information retrieval tasks. The average recall level precision of the model using the proposed approach is about 10% higher than that of traditional ones.
Funding: Supported by the National Natural Science Foundation of China (60575034) and the Science Foundation of Guangxi Provincial Education Department (200708LX322).
Abstract: By analyzing syntactic, semantic, and pragmatic information, the retrieval system ACIS, based on comprehensive information, was established; it could achieve personalized information extraction to guide the user's information retrieval.
Funding: Supported by Inner Mongolia Natural Science Fund (20080404MS0507); National Natural Science Fund (30660150); Education Ministry Higher Education School Science Innovation Project Major Program Cultivation Fund Program (707014); Inner Mongolia Natural Scientific Fund Major Program (200607010501).
Abstract: [Objective] The aim was to set up a plant digital information retrieval system. [Method] The plant digital information retrieval system was designed with the Microsoft Visual Basic 6.0 Enterprise Edition database management system and Structured Query Language. [Result] The system realized electronic management and retrieval of local plant information. The retrieval keywords included family, genus, formal name, Chinese name, Latin name, morphological characteristics, habitat, collectors, collection sites, protection class, and so on. [Conclusion] The system provides a reference for species identification and the digital management of herbaria.
Abstract: An approximate approach to querying between heterogeneous ontology-based information systems based on an association matrix is proposed. First, the association matrix is defined to describe relations between concepts in two ontologies. Then, a method of rewriting queries based on the association matrix is presented to solve the ontology heterogeneity problem. It rewrites the queries in one ontology to approximate queries in another ontology based on the subsumption relations between concepts. The method also uses vectors to represent queries, and then computes the vectors with the association matrix; the disjoint relations between concepts can be considered by the results. It can get better approximations than the methods currently in use, which do not consider disjoint relations. The method can be processed by machines automatically. It is simple to implement and expected to run quite fast.
Abstract: How to deal with imprecise information retrieval has become increasingly important in today's information society. An efficient and effective method of information retrieval based on the multi-tuple rough set is discussed in this paper. The new approach is considered a generalization of the original rough set model for flexible information retrieval. Imprecise query results can be obtained through multi-tuple approximations.
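For orientation, the sketch below shows the lower and upper approximations of the original (Pawlak) rough set model that the abstract says is being generalized. It is not the multi-tuple variant, whose definition the abstract does not give; the function name `approximations` and the representation of equivalence classes as sets of document IDs are assumptions.

```python
def approximations(equivalence_classes, target):
    """Return (lower, upper) rough approximations of a target set.

    equivalence_classes: iterable of sets that partition the universe,
        e.g. documents that are indistinguishable by their index terms.
    target: the set to approximate, e.g. documents relevant to a query.
    """
    target = set(target)
    lower, upper = set(), set()
    for block in equivalence_classes:
        block = set(block)
        if block <= target:
            lower |= block  # every member is certainly in the target
        if block & target:
            upper |= block  # at least one member is in the target
    return lower, upper
```

In a retrieval setting, the lower approximation can be read as the certain matches and the upper approximation as the possible matches, which is what makes imprecise query results expressible.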
Abstract: Latent semantic analysis is a simple and effective way to further enhance the efficiency of search engines and to achieve the capabilities of searching, indexing and locating information in the deep web. Through latent semantic analysis of the attributes in the query interfaces and the unique entrances of deep web sites, the hidden semantic structure information can be retrieved and dimension reduction can be achieved to a certain extent. Using this semantic structure information, the contents of a site can be inferred and the similarity measures among deep web sites can be revised. Experimental results show that latent semantic analysis revises and improves the semantic understanding of the query form in the deep web, which overcomes the shortcomings of keyword-based methods. This approach can be used to search effectively for the site most similar to any given site and to obtain a list of sites that conform to the restrictions one specifies.
Abstract: AIM: To compare fluoroscopic, endoscopic and guide wire assistance with ultraslim gastroscopy for placement of nasojejunal feeding tubes. METHODS: Information regarding nasojejunal tube placement procedures was retrieved from the gastrointestinal tract database at Tongji Hospital affiliated to Tongji Medical College. Records from 81 patients who underwent nasojejunal tube placement by different techniques between 2004 and 2011 were reviewed for procedure success and tube-related outcomes. RESULTS: Nasojejunal feeding tubes were successfully placed in 78 (96.3%) of 81 patients. The success rate by fluoroscopy was 92% (23 of 25), by endoscopic technique 96.3% (26 of 27), and by guide wire assistance (whether via transnasal or transoral insertion) 100% (23/23, 6/6). The average time for successful placement was 14.9 ± 2.9 min for fluoroscopic placement, 14.8 ± 4.9 min for endoscopic placement, 11.1 ± 2.2 min for guide wire assistance with transnasal gastroscopic placement, and 14.7 ± 1.2 min for transoral gastroscopic placement. Statistically, the duration for the third method was significantly different (P < 0.05) compared with the other three methods. Transnasal placement over a guide wire was significantly faster (P < 0.05) than any of the other approaches. CONCLUSION: Guide wire assistance with transnasal insertion of nasojejunal feeding tubes represents a safe, quick and effective method for providing enteral nutrition.
Abstract: The main thrust of this paper is the application of a novel data mining approach to the log of users' feedback to improve web multimedia information retrieval performance. A user space model was constructed based on data mining and then integrated into the original information space model to improve the accuracy of the new information space model. It can remove clutter and irrelevant text information and help to eliminate the mismatch between the page author's expression and the user's understanding and expectation. The user space model was also used to discover the relationship between high-level and low-level features for assigning weights. The authors proposed an improved Bayesian algorithm for data mining. Experiments showed that the authors' proposed algorithm was efficient.
Funding: Supported by the Fund for Basic Research of National Non-Profit Research Institutes (No. XK2012-2, ZD2012-7-2) and the Fund for Preresearch Project of ISTIC (No. YY201208).
Abstract: Similarity matching and information presentation are two key factors in information retrieval. In this paper, a saliency-based matching algorithm is proposed for user-oriented search based on psychological studies of human perception. Major emphasis is placed on the saliently similar aspects of the objects being compared, so that the search results are more agreeable to users. After relevant results are obtained, a cluster-based browsing algorithm based on social network analysis is adopted for presenting the search results. By organizing the results in clustered lists, the user can gain a general understanding of the whole collection by viewing only a small part of the results and can rapidly locate those of major interest. Experimental results demonstrate the advantages of the proposed algorithm over traditional work.
Funding: Project (No. 2006CB303000) supported in part by the National Basic Research Program (973) of China.
Abstract: This paper explores the application of term dependency in information retrieval (IR) and proposes a novel dependency retrieval model. This retrieval model suggests an extension to the existing language modeling (LM) approach to IR by introducing dependency models for both query and document. Relevance between document and query is then evaluated by reference to the Kullback-Leibler divergence between their dependency models. This paper introduces a novel hybrid dependency structure, which allows integration of various forms of dependency within a single framework. A pseudo relevance feedback based method is also introduced for constructing the query dependency model. The basic idea is to use query-relevant top-ranking sentences extracted from the top documents at retrieval time as the augmented representation of the query, from which the relationships between query terms are identified. A Markov Random Field (MRF) based approach is presented to ensure the relevance of the extracted sentences; it uses the association features between query terms within a sentence to evaluate the relevance of each sentence. This dependency retrieval model was compared with other traditional retrieval models. Experiments indicated that it produces significant improvements in retrieval effectiveness.
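The ranking principle mentioned above, scoring a document by the Kullback-Leibler divergence between a query model and a document model, can be sketched on plain unigram models. This is a simplification under stated assumptions: the paper's models are dependency models, whereas the code below uses additively smoothed unigram distributions, and the names `unigram_model`, `kl_divergence` and `rank_by_kl` are hypothetical.

```python
import math
from collections import Counter


def unigram_model(tokens, vocabulary, mu=1.0):
    """Additively smoothed unigram distribution over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + mu * len(vocabulary)
    return {t: (counts[t] + mu) / total for t in vocabulary}


def kl_divergence(p, q):
    """KL(p || q) for two distributions defined over the same keys."""
    return sum(pv * math.log(pv / q[t]) for t, pv in p.items() if pv > 0)


def rank_by_kl(query_tokens, docs):
    """Return document names ordered from lowest to highest divergence.

    docs maps a document name to its list of tokens.
    """
    vocabulary = set(query_tokens).union(*docs.values())
    query_model = unigram_model(query_tokens, vocabulary)
    scored = [
        (kl_divergence(query_model, unigram_model(tokens, vocabulary)), name)
        for name, tokens in docs.items()
    ]
    return [name for _, name in sorted(scored)]
```

A lower divergence means the document model explains the query model better, so documents are returned in ascending order of divergence. Smoothing keeps every probability positive, which keeps the logarithm defined.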
Funding: The High Technology Research and Development Program of China (No. 2006AA01Z150); the National Natural Science Foundation of China (No. 60435020).
Abstract: To eliminate the mismatch between the words of relevant documents and the user's query, and the serious negative effects this mismatch has on the performance of information retrieval, a method of query expansion based on a new term co-occurrence representation was put forward by analyzing the process of producing a query. The expansion terms were selected according to their correlation with the whole query. At the same time, the position information between terms was considered. The experimental results on the Text REtrieval Conference (TREC) data collection show that the proposed method consistently achieves an improvement of 5% to 19% over the language modeling method without expansion. Compared with the popular query expansion approach of pseudo feedback, the precision of the proposed method is competitive.
Funding: Supported by the National Basic Research Program of China ("973" Program, No. 2013cb329304); the Natural Science Foundation of China (No. 61105072, No. 61070044 and No. 61111130190); the International Joint Research Project "QONTEXT" of the Council of European Union.
Abstract: Incompatible probability represents an important non-classical phenomenon: it describes conflicting observed marginal probabilities that cannot be satisfied by a joint probability. First, the incompatibility of random variables was defined and discussed via the non-positive semi-definiteness of their covariance matrices. Then, a method was proposed to verify the existence of incompatible probability for variables. Hypothesis testing was also applied to reexamine the likelihood of the observed marginal probabilities being integrated into a joint probability space, thus showing the statistical significance of incompatible probability cases. A case study with user click-through data provided initial evidence of incompatible probability in information retrieval (IR), particularly in user interaction. The experiments indicate that both incompatible and compatible cases can be found in IR data, and that informational queries are more likely to be compatible than navigational queries. The results inspire new theoretical perspectives on modeling the complex interactions and phenomena in IR.