Abstract: Query expansion with a thesaurus is a useful technique in modern information retrieval (IR). In this paper, a query expansion method for Chinese IR based on a decaying co-occurrence model is proposed and implemented. The model extends the traditional co-occurrence model with a decaying factor that reduces the mutual information between two terms as the distance between them increases. Experimental results on the TREC-9 collections show that this query expansion method yields significant improvements over IR without query expansion.
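The decaying co-occurrence idea can be illustrated with a small Python sketch. It assumes an exponential decay weight `exp(-alpha * (d - 1))` for two terms that are `d` tokens apart within a fixed window, and a pointwise-mutual-information score normalized by corpus length. The decay function, `window`, `alpha`, and the function name `decayed_mi` are illustrative assumptions, not the paper's exact formulation.

```python
import math
from collections import Counter

def decayed_mi(docs, a, b, window=5, alpha=0.5):
    """Pointwise mutual information between terms a and b, where each
    co-occurrence within `window` tokens is down-weighted by
    exp(-alpha * (distance - 1)). Returns -inf if they never co-occur."""
    term_counts = Counter()
    pair_weight = 0.0
    total_tokens = 0
    for tokens in docs:
        total_tokens += len(tokens)
        term_counts.update(tokens)
        pos_a = [i for i, t in enumerate(tokens) if t == a]
        pos_b = [i for i, t in enumerate(tokens) if t == b]
        for i in pos_a:
            for j in pos_b:
                d = abs(i - j)
                if 0 < d <= window:
                    # Closer pairs contribute more; adjacent pairs weigh 1.0.
                    pair_weight += math.exp(-alpha * (d - 1))
    if pair_weight == 0 or term_counts[a] == 0 or term_counts[b] == 0:
        return float("-inf")
    p_ab = pair_weight / total_tokens
    p_a = term_counts[a] / total_tokens
    p_b = term_counts[b] / total_tokens
    return math.log(p_ab / (p_a * p_b))
```

With identical term counts, an adjacent pair scores higher than a pair four tokens apart, and a pair outside the window contributes nothing.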
Abstract: This paper focuses on exporting relational data into the extensible markup language (XML). First, the characteristics of relational schemas represented by E-R diagrams and of XML document type definitions (DTDs) are analyzed. Second, the corresponding mapping rules are proposed. Finally, an algorithm based on edge tables is presented. The algorithm has two key points. First, an edge table is used to store the information of the relational dictionary, which makes the algorithm efficient. Second, structural information can be obtained from the resulting DTDs, and other applications can use this structural information to optimize their query processing.
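A minimal sketch of deriving DTD element declarations from an edge table, assuming the edge table is reduced to (parent table, child table) foreign-key pairs and that each child row is nested under its parent row. The function `dtd_from_edges` and this input encoding are hypothetical; the paper's actual edge-table layout and mapping rules may differ.

```python
from collections import defaultdict

def dtd_from_edges(columns, edges):
    """columns: {table: [exported column, ...]}
    edges: [(parent_table, child_table), ...], one pair per foreign key.
    Returns DTD element declarations as a string."""
    children = defaultdict(list)
    for parent, child in edges:
        children[parent].append(child)
    lines, declared = [], set()
    for table, cols in columns.items():
        # Columns become sub-elements; child tables repeat zero or more times.
        parts = list(cols) + [f"{child}*" for child in children[table]]
        if parts:
            lines.append(f"<!ELEMENT {table} ({', '.join(parts)})>")
        else:
            lines.append(f"<!ELEMENT {table} EMPTY>")
        for col in cols:
            if col not in declared:  # declare each leaf element only once
                lines.append(f"<!ELEMENT {col} (#PCDATA)>")
                declared.add(col)
    return "\n".join(lines)
```

For a `dept` table referenced by `emp`, the first declaration is `<!ELEMENT dept (dname, emp*)>`, which is the kind of structural information a downstream query processor could exploit.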
Abstract: An approximate approach to querying between heterogeneous ontology-based information systems, based on an association matrix, is proposed. First, the association matrix is defined to describe the relations between concepts in two ontologies. Then, a query rewriting method based on the association matrix is presented to solve the ontology heterogeneity problem. It rewrites queries expressed in one ontology into approximate queries in another ontology based on the subsumption relations between concepts. The method also represents queries as vectors and computes them with the association matrix, so that disjoint relations between concepts can be taken into account in the results. It yields better approximations than current methods, which do not consider disjoint relations. The method can be processed by machines automatically; it is simple to implement and expected to run quite fast.
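The vector-and-matrix step can be sketched as follows, assuming an encoding that the abstract does not specify: entry `assoc[i][j]` is 1 if source concept `i` is subsumed by target concept `j`, -1 if the two are disjoint, and 0 otherwise. The query vector is combined with the positive and negative parts of the matrix separately, so a disjointness cannot be cancelled out by a subsumption. The function name `rewrite_query` and the toy concepts are hypothetical.

```python
def rewrite_query(query, source, target, assoc):
    """query: set of source-ontology concepts, read as a conjunction.
    assoc[i][j]: 1 = source[i] subsumed by target[j], -1 = disjoint, 0 = none.
    Returns (target concepts to include, target concepts to exclude)."""
    q = [1 if c in query else 0 for c in source]  # query vector
    include, exclude = [], []
    for j, t in enumerate(target):
        # Column j of q * A, split into its subsumption and disjointness parts.
        pos = sum(q[i] for i in range(len(source)) if assoc[i][j] > 0)
        neg = sum(q[i] for i in range(len(source)) if assoc[i][j] < 0)
        if neg > 0:
            exclude.append(t)
        elif pos > 0:
            include.append(t)
    return include, exclude
```

The result reads as an approximate target query: the conjunction of the included concepts, excluding anything that falls under the excluded concepts.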
Funding: Supported by the National Basic Research Program of China ("973" Program, No. 2013cb329304), the Natural Science Foundation of China (Nos. 61105072, 61070044 and 61111130190), and the International Joint Research Project "QONTEXT" of the Council of the European Union.
Abstract: Incompatible probability represents an important non-classical phenomenon: it describes conflicting observed marginal probabilities that cannot be satisfied by any single joint probability. First, the incompatibility of random variables was defined and discussed via the non-positive semi-definiteness of their covariance matrices. Then, a method was proposed to verify the existence of incompatible probability for variables. Hypothesis testing was also applied to reexamine the likelihood that the observed marginal probabilities can be integrated into a joint probability space, thus showing the statistical significance of incompatible probability cases. A case study with user click-through data provided initial evidence of incompatible probability in information retrieval (IR), particularly in user interaction. The experiments indicate that both incompatible and compatible cases can be found in IR data, and that informational queries are more likely to be compatible than navigational queries. The results inspire new theoretical perspectives on modeling the complex interactions and phenomena in IR.
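Any joint distribution produces a positive semi-definite (PSD) covariance matrix, so a matrix assembled from observed pairwise statistics that is not PSD cannot come from a single joint probability space. The check below tests this necessary (not sufficient) condition with NumPy's `numpy.linalg.eigvalsh`. The function name `is_compatible` and the tolerance are illustrative, and the paper's hypothesis-testing step is not reproduced here.

```python
import numpy as np

def is_compatible(cov, tol=1e-10):
    """Return True if the symmetric matrix `cov` is positive semi-definite,
    i.e. it passes the necessary test for coming from one joint distribution."""
    cov = np.asarray(cov, dtype=float)
    if not np.allclose(cov, cov.T):
        raise ValueError("covariance matrix must be symmetric")
    # eigvalsh returns the real eigenvalues of a symmetric matrix.
    return bool(np.linalg.eigvalsh(cov).min() >= -tol)
```

For example, three unit-variance variables that are each strongly anti-correlated (-0.9) with the other two give a matrix with a negative eigenvalue, so no joint distribution can reproduce all three pairwise observations at once.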
Funding: the High Technology Research and Development Program of China (No. 2006AA01Z150) and the National Natural Science Foundation of China (No. 60435020).
Abstract: To eliminate the mismatch between the words of relevant documents and the user's query, and the serious negative effects this mismatch has on information retrieval performance, a query expansion method based on a new representation of term co-occurrence was put forward by analyzing the process of producing a query. The expansion terms were selected according to their correlation with the whole query. At the same time, the position information between terms was considered. Experimental results on the Text REtrieval Conference (TREC) data collection show that the proposed method consistently improves on the language modeling method without expansion, by 5% to 19%. Compared with pseudo feedback, a popular query expansion approach, the proposed method achieves competitive precision.
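To illustrate selecting expansion terms by their correlation with the whole query rather than with individual query words, the sketch below multiplies a candidate's distance-weighted co-occurrence with every query term. The weight `1/d` within a window is a simple stand-in for the position information. The product form, the `eps` smoothing constant, and the function name `expansion_terms` are assumptions for illustration.

```python
from collections import defaultdict

def expansion_terms(docs, query, window=3, k=2, eps=0.01):
    """Rank non-query terms by correlation with the *whole* query.

    Score = product over query terms q of (distance-weighted co-occurrence
    with q + eps), so a term related to only one query word ranks below
    a term related to all of them."""
    cooc = defaultdict(lambda: defaultdict(float))
    for tokens in docs:
        for i, t in enumerate(tokens):
            if t in query:
                continue
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if tokens[j] in query:
                    # Position information: nearer query words count more.
                    cooc[t][tokens[j]] += 1.0 / abs(i - j)
    scores = {}
    for t, row in cooc.items():
        score = 1.0
        for q in query:
            score *= row.get(q, 0.0) + eps
        scores[t] = score
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

In a toy corpus where "tree" appears next to "apple" twice but "crust" sits between "apple" and "pie" once, "crust" still ranks first for the query {apple, pie}, because it relates to both query words.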
Funding: Project (07JJ1010) supported by the Hunan Provincial Natural Science Foundation of China; Projects (2006AA01Z202, 2006AA01Z199) supported by the National High-Tech Research and Development Program of China; Project (7002102) supported by the City University of Hong Kong Strategic Research Grant (SRG); Project (IRT-0661) supported by the Program for Changjiang Scholars and Innovative Research Team in University; Project (NCET-06-0686) supported by the Program for New Century Excellent Talents in University.
Abstract: HashQuery, a Hash-area-based data dissemination protocol, was designed for wireless sensor networks. Using a Hash function that takes time as its key, both mobile sinks and sensors can determine the same Hash area. Sensors send information about the events they monitor to the Hash area, and mobile sinks need only query that area instead of flooding the whole network, which saves a great deal of energy. In addition, the location of the Hash area changes over time so as to balance energy consumption across the whole network. Theoretical analysis shows that the proposed protocol can be energy-efficient, and simulation studies further show that with 5 sources and 5 sinks in the network, it saves at least 50% energy compared with the existing two-tier data dissemination (TTDD) protocol, especially in large-scale wireless sensor networks.
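The key property is that a sink and a sensor, knowing only the current time, compute the same Hash area independently. A minimal Python sketch, assuming synchronized clocks, a field divided into a `grid_cols` by `grid_rows` grid of candidate areas, a fixed time `period`, and SHA-256 as the hash function; none of these parameters come from the paper, and `hash_area` is a hypothetical name.

```python
import hashlib

def hash_area(epoch_seconds, grid_cols, grid_rows, period=60):
    """Map the current time slot to a grid cell (row, col) acting as the
    Hash area. All nodes in the same slot get the same cell, and the cell
    moves from slot to slot, spreading the relay load over the field."""
    slot = int(epoch_seconds // period)          # time is the hash key
    digest = hashlib.sha256(str(slot).encode()).digest()
    cell = int.from_bytes(digest[:8], "big") % (grid_cols * grid_rows)
    return divmod(cell, grid_cols)               # (row, col)
```

A sensor reporting at t = 120 s and a sink querying at t = 179 s fall in the same 60-second slot and therefore agree on the area without exchanging any messages.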
Funding: supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education, Science and Technology (NRF2013R1A1A1004593, 2013R1A1A1A05012348).
Abstract: A novel technique called the bitmap lattice index (BLI) is proposed, which combines the advantages of a wireless broadcasting environment with a road network. Existing road network query methods are based on the on-demand model, in which the server's workload increases with the number of query requests because the server must send information to each client. To solve this problem, we propose the BLI. The BLI denotes an object and a node as 0 and 1, respectively, in the Hilbert curve (HC) map. The BLI can identify the position of a node and an object through bit information; it can also reduce the server's broadcasting frequency by reducing the size of the index, thereby decreasing the access latency and query processing time. Moreover, the BLI is highly effective for data filtering, as it can identify the positions of both objects and nodes. In a road network, filtering by Euclidean distance may produce errors; to prevent this, we add another validation procedure. The BLI is applied to k-nearest-neighbor (kNN) queries, and the technique is assessed through performance evaluation experiments.
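The following sketch shows one way to lay out such a bitmap, assuming a 2^k by 2^k grid. Each occupied cell is assigned its Hilbert-curve index with the standard `xy2d` conversion, cells are sorted along the curve, and each emits bit 1 for a road-network node and 0 for a data object. This layout and the function names are assumptions; the BLI structure in the paper is likely more elaborate.

```python
def xy2d(n, x, y):
    """Hilbert-curve index of cell (x, y) in an n x n grid (n a power of 2)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:                      # rotate/flip the quadrant
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d

def build_bitmap(n, nodes, objects):
    """Return (Hilbert indices, bit string) for occupied cells in curve
    order: bit 1 marks a road-network node, bit 0 marks a data object."""
    cells = [(xy2d(n, x, y), 1) for x, y in nodes]
    cells += [(xy2d(n, x, y), 0) for x, y in objects]
    cells.sort()
    return [d for d, _ in cells], "".join(str(b) for _, b in cells)
```

Because a client can rebuild cell positions from the Hilbert indices and the bits, a compact string like this could be broadcast in place of full coordinates.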
Abstract: Time is an important dimension of information in geographical information systems (GIS). Data such as the historical states of target property spaces, and the related events that caused those states to change, should be saved as important files, and this should be applied to property management. By analyzing the basic types and characteristics of spatial and temporal changes in property management data, this paper designs and constructs a spatio-temporal model suitable for managing changes in property data and for spatio-temporal queries. The model uses current-situation and historical layers and organizes the relationship between current and historical data according to the spatio-temporal topological relations among property entities. Housing property management and spatial queries are implemented using MapBasic.
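The current/historical layering can be sketched at the attribute level, ignoring geometry and topology. When an event changes a parcel, its current state is closed with an end time and moved to the historical layer. The class names, the integer time stamps, and the `state_at` query are hypothetical, and Python is used here instead of MapBasic.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ParcelState:
    parcel_id: str
    owner: str
    valid_from: int            # e.g. the year this state began
    valid_to: Optional[int]    # None means the state is still current
    event: str                 # event that produced this state

class PropertyHistory:
    def __init__(self):
        self.current = {}      # current-situation layer: parcel_id -> state
        self.history = []      # historical layer: closed states

    def apply_event(self, parcel_id, owner, time, event):
        """Close the parcel's current state (if any) and open a new one."""
        old = self.current.get(parcel_id)
        if old is not None:
            old.valid_to = time
            self.history.append(old)
        self.current[parcel_id] = ParcelState(parcel_id, owner, time, None, event)

    def state_at(self, parcel_id, time):
        """Temporal query: the state of a parcel at a given time, or None."""
        candidates = list(self.history)
        if parcel_id in self.current:
            candidates.append(self.current[parcel_id])
        for s in candidates:
            if (s.parcel_id == parcel_id and s.valid_from <= time
                    and (s.valid_to is None or time < s.valid_to)):
                return s
        return None
```

Keeping the current layer separate means everyday queries touch only live records, while historical queries scan the closed states.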
Abstract: This paper describes the use of free online language resources for translating and expanding queries in CLIR (cross-language information retrieval). In a previous study, we proposed a method in which queries were translated by two machine translation systems on the Language Grid. The queries were then expanded using an online dictionary to translate compound words or word phrases, and a concept base was used to compare back-translated words with the original query in order to delete mistranslated words. To evaluate that method, we constructed a CLIR system and used the science documents of the NTCIR1 dataset; the method achieved high precision. However, proper nouns (names of people and places) appear infrequently in science documents, and in information retrieval proper nouns present unique problems. Since proper nouns are usually unknown words, they are difficult to find in monolingual dictionaries, not to mention bilingual dictionaries. Furthermore, the user's initial query is not always the best description of the desired information, and query expansion is often proposed as a way to create a better query representation. To address these problems, Wikipedia was used to translate compound words or word phrases, and also to expand queries together with a concept base. The NTCIR1 and NTCIR6 datasets were used to evaluate the proposed method. The CLIR system implementing it achieved high precision, and the proposed system ranked higher than the NTCIR1 and NTCIR6 participating systems.
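The back-translation filter can be sketched as follows. A toy dictionary stands in for the concept base, and cosine similarity over concept weights stands in for its association measure; `CONCEPT_BASE`, `similarity`, `filter_translations`, and the 0.5 threshold are all hypothetical.

```python
import math

# Hypothetical toy concept base: word -> weighted attribute concepts.
CONCEPT_BASE = {
    "bank": {"money": 0.8, "finance": 0.6, "building": 0.2},
    "financial institution": {"money": 0.7, "finance": 0.8},
    "riverside": {"river": 0.9, "water": 0.5},
}

def similarity(w1, w2, base=CONCEPT_BASE):
    """Cosine similarity of two words' concept vectors (0.0 if unknown)."""
    a, b = base.get(w1, {}), base.get(w2, {})
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def filter_translations(original, candidates, threshold=0.5):
    """candidates: (translation, back_translation) pairs. Keep a translation
    only if its back-translation stays close to the original query word."""
    return [t for t, back in candidates if similarity(original, back) >= threshold]
```

For the query word "bank", a candidate whose back-translation is "financial institution" survives, while one that back-translates to "riverside" is dropped as a likely mistranslation.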
Funding: Supported by the National Natural Science Foundation of China (No. 60903195) and the Key Technological Problems Tackling Project of Wuhan (No. 200750499172).
Abstract: In this paper, the problems of redundant traffic and redundant replicas in efficient object replication in P2P overlays are studied. First, a hierarchical, topology-aware P2P overlay is built with the κ-Medoids partitioning algorithm to minimize the physical distance of all super-peer pairs. Second, a new idea of placing at most one replica in each cluster of physically adjacent nodes is introduced to achieve a scattered distribution of replicas. Finally, an efficient replica query algorithm based on multiple hash functions is proposed. Theoretical analysis and simulation experiments on several performance metrics verify that the method can efficiently disseminate replicas across the network, increase the query hit ratio, and reduce the redundant messages and storage space required.
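A minimal sketch of the multiple-hash idea, assuming clusters are numbered 0 to `num_clusters - 1` and using salted SHA-256 as the family of hash functions. The publisher places one replica in each distinct cluster returned, and a querying peer recomputes the same list and contacts only those clusters' super-peers instead of flooding. The function name and parameters are hypothetical.

```python
import hashlib

def replica_clusters(object_key, num_clusters, num_hashes=3):
    """Return up to `num_hashes` distinct cluster ids for an object.
    Hash function i is SHA-256 over the salted key f"{i}:{object_key}"."""
    clusters = []
    for i in range(num_hashes):
        digest = hashlib.sha256(f"{i}:{object_key}".encode()).digest()
        c = int.from_bytes(digest[:8], "big") % num_clusters
        if c not in clusters:  # at most one replica per cluster
            clusters.append(c)
    return clusters
```

Because duplicate cluster ids are skipped, an object never receives two replicas in the same cluster, consistent with the at-most-one-replica rule, and any peer can locate the replicas deterministically from the object key alone.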
Abstract: The arrival of the Internet era has brought about the rapid dissemination of large amounts of information and data. Today we are surrounded by all kinds of information, but these rich and diversified information resources have also brought chaos, so that users often do not know where to begin their queries. Information resources can make our lives more convenient, yet we must spend a great deal of effort organizing and filtering information, and the cost and time invested are immeasurable. Usually, the information we want to query should be easy to understand, and information design uses more intuitive and vivid computational means to visualize big data and reflect its beauty.