This letter presents a new discriminative model for Information Retrieval (IR), the Ordinal Regression Model (ORM). Unlike most existing models, ORM treats IR as an ordinal regression (i.e., ranking) problem rather than as binary classification, because the task of IR is to rank documents according to the user's information need. Two parameter learning algorithms for ORM are presented: a perceptron-based algorithm and the ranking Support Vector Machine (SVM). The approach is evaluated on ad hoc retrieval using three English Text REtrieval Conference (TREC) sets and two Chinese TREC sets. The results show that ORM significantly outperforms state-of-the-art language modeling approaches and the OKAPI system on all test sets, and that IR is more appropriately viewed as ordinal regression than as binary classification.
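The abstract gives only the name of the perceptron-based learner. One plausible reading is a generic pairwise ranking perceptron, which is the perceptron counterpart of the pairwise constraints a ranking SVM uses. It works like this: when a document with a higher relevance grade does not score above a document with a lower grade, the weight vector moves toward the difference of their features. The sketch below follows that reading. The feature vectors and grades are hypothetical, and the code is not the paper's exact algorithm.

```python
from itertools import combinations
from typing import List, Tuple

Example = Tuple[List[float], int]   # (feature vector, relevance grade); hypothetical layout

def dot(w: List[float], x: List[float]) -> float:
    return sum(a * b for a, b in zip(w, x))

def train_pairwise_perceptron(data: List[Example], epochs: int = 20) -> List[float]:
    """Learn w so that higher-grade documents score above lower-grade ones."""
    w = [0.0] * len(data[0][0])
    for _ in range(epochs):
        mistakes = 0
        for (xa, ya), (xb, yb) in combinations(data, 2):
            if ya == yb:
                continue                          # equal grades impose no ordering
            hi, lo = (xa, xb) if ya > yb else (xb, xa)
            if dot(w, hi) <= dot(w, lo):          # mis-ordered (or tied) pair
                w = [wi + h - l for wi, h, l in zip(w, hi, lo)]
                mistakes += 1
        if mistakes == 0:                         # every graded pair is ordered correctly
            break
    return w

def rank(w: List[float], docs: List[List[float]]) -> List[int]:
    """Indices of docs sorted by descending score."""
    return sorted(range(len(docs)), key=lambda i: -dot(w, docs[i]))

if __name__ == "__main__":
    data = [([0.9, 0.8], 2), ([0.5, 0.4], 1), ([0.1, 0.2], 0), ([0.6, 0.3], 1)]
    w = train_pairwise_perceptron(data)
    print(w, rank(w, [x for x, _ in data]))
```

A ranking SVM replaces the mistake-driven update with a margin-maximizing objective over the same set of pairs.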
We developed a high-speed information retrieval system built on the IXP2800, a dedicated network processor. The system retrieves information at 6.8 Gb/s and supports various network protocols, including Telnet, FTP, SMTP, and POP3. Retrieval supports both keywords and natural language processing. This paper describes the hardware system, the software system, and the performance indices.
Key words: network processor; IXP2800; information retrieval; IXA
CLC number: TP 309
Foundation item: Supported by the National Natural Science Foundation of China (69873016 & 69972017) and the National High Technology Development Program of China (863-301-06-1)
Biography: SHI Shu-dong (1963-), male, Ph.D. candidate; research direction: network and information security.
Decision Support Systems (DSS) are human-machine interaction systems that help decision-makers solve unstructured and semi-structured decision problems. Based on the characteristics of information that supports decisions, this paper argues that a problem-oriented information retrieval DSS meets the needs of an enterprise's top management more effectively than other information retrieval functions. An architecture for such a system is presented. It decomposes a problem posed or recognized by the user into problems the computer can recognize, forms retrieval tactics, and searches for the data the user needs. A prototype designed and developed according to this architecture, the CF Economic Environment Information Retrieval DSS, is introduced.
Out-of-vocabulary (OOV) term translation plays an important role in natural language processing. Many researchers have tried to solve the OOV term translation problem, but no existing method provides definition or context information for OOV terms, and none focuses on cross-language definition retrieval for OOV terms. It has also always been difficult to evaluate the correctness of an OOV term translation without domain-specific knowledge and correct references. Our English definition ranking method distinguishes the types of OOV terms and applies a different translation extraction method to each type. It also extracts multilingual context information and monolingual definitions of OOV terms. In addition, we propose a novel cross-language definition retrieval system for OOV terms and an automatic re-evaluation method that assesses the correctness of OOV translations and definitions. Our methods achieve high performance compared with existing methods.
Information retrieval (IR) systems are designed to help information seekers retrieve relevant information from vast document collections, and the need for such information gave rise to IR systems. Although various IR systems exist, none can meet every user's expectations. Users with different levels of knowledge express their queries in different ways, so a system may miss the core meaning of a query and return unsatisfactory results. This happens mainly because natural language words are ambiguous and because users and authors express the same ideas differently. Ambiguities in Amharic, such as spelling variants of the same word and polysemous and synonymous terms, degrade the performance of Amharic IR systems. Users who are not familiar with the information domain tend to formulate weak queries and end up frustrated with the results. This research aims to improve on the recall of previous work by using a statistical co-occurrence technique to expand query terms, so that the returned documents better satisfy users' information needs. The statistical co-occurrence method considers terms that frequently appear together with the query term, regardless of their position. The technique was tested on a prototype system and compared with the results of a previous study, improving recall by 6% and F-measure by 2%. Hence, the statistical co-occurrence method outperformed the bi-gram-based IR system.
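Co-occurrence that ignores position can be measured at the document level. One option is the Dice coefficient, 2·df(q, t) / (df(q) + df(t)), with the top-scoring terms then added to the query. Dice is our choice for illustration, since the abstract does not name the association measure. The English tokens stand in for Amharic words, and the `min_score` threshold and function name are hypothetical. The sketch does not reproduce the recall and F-measure figures quoted above.

```python
from typing import Dict, List, Set

def dice_expansion(docs: List[List[str]], query: List[str],
                   k: int = 2, min_score: float = 0.3) -> List[str]:
    """Append up to k terms with the highest document-level Dice score against any query term."""
    doc_sets: List[Set[str]] = [set(d) for d in docs]   # word order is discarded here
    df: Dict[str, int] = {}
    for s in doc_sets:
        for t in s:
            df[t] = df.get(t, 0) + 1
    best: Dict[str, float] = {}
    for q in query:
        if q not in df:
            continue
        for t in df:
            if t in query:
                continue
            both = sum(1 for s in doc_sets if q in s and t in s)
            score = 2 * both / (df[q] + df[t])
            if score >= min_score:
                best[t] = max(best.get(t, 0.0), score)
    extra = sorted(best, key=lambda t: (-best[t], t))[:k]
    return list(query) + extra

if __name__ == "__main__":
    docs = [["house", "price", "car"], ["house", "price"], ["car", "road"], ["house", "room"]]
    print(dice_expansion(docs, ["house"], k=2))
```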
Funding: Supported by the National Key R&D Program of China (2020YFB0905900).
Abstract: Operation control of power systems has become challenging as power distribution systems grow in scale and complexity and integrate large amounts of renewable energy. Stronger capabilities in data-driven operation management, intelligent analysis, and data mining are therefore urgently required. The aim is to explore recurring regularities in the historical operating sections of the power distribution system and to help the grid systematically extract high-value historical operation and maintenance experience and knowledge. To that end, a neural information retrieval model with an attention mechanism is proposed, based on graph data computing technology. A technical framework for neural information retrieval is established according to the processing flow of power distribution system operating data. The distribution system has a natural graph structure, and this is used to build a unified graph data structure together with data fusion methods for data access, data completion, and multi-source data. A graph node feature-embedding representation learning algorithm and a neural information retrieval model are then constructed. The retrieval model is trained and tested on the resulting set of graph node feature vectors. The model is verified on operating sections of the power distribution system of a provincial grid area. The results show that the proposed method matches historical operating characteristics with high accuracy and effectively supports intelligent fault diagnosis and elimination in power distribution systems.
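As a rough illustration of attention-based similarity matching, the sketch below assumes that each operating section is already represented as a list of graph-node embedding vectors. The paper's embedding learner is not reproduced. The nodes are pooled with softmax dot-product attention against a query vector, and historical sections are then ranked by cosine similarity to the current one. The query vector, the toy embeddings, and every function name are hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]

def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))

def softmax(scores: List[float]) -> List[float]:
    m = max(scores)                                  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(nodes: List[Vector], query: Vector) -> Vector:
    """Weight each node embedding by softmax(node . query) and sum the weighted vectors."""
    weights = softmax([dot(n, query) for n in nodes])
    dim = len(nodes[0])
    return [sum(w * n[i] for w, n in zip(weights, nodes)) for i in range(dim)]

def cosine(a: Vector, b: Vector) -> float:
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na and nb else 0.0

def rank_sections(current: List[Vector], history: Dict[str, List[Vector]],
                  query: Vector) -> List[str]:
    """Return historical section ids ordered from most to least similar to the current section."""
    cur = attention_pool(current, query)
    scored = {sid: cosine(cur, attention_pool(nodes, query)) for sid, nodes in history.items()}
    return sorted(scored, key=scored.get, reverse=True)

if __name__ == "__main__":
    query = [1.0, 0.0]                               # hypothetical attention query
    current = [[0.9, 0.1], [0.2, 0.8]]
    history = {
        "section_a": [[1.0, 0.0], [0.1, 0.9]],       # node layout similar to the current section
        "section_b": [[0.0, 1.0], [0.1, 1.0]],       # dominated by the second feature
    }
    print(rank_sections(current, history, query))
```

In a trained model, the query vector and the embeddings would be learned rather than fixed by hand.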
Funding: Supported by the National High Technology Research and Development Program of China (2006AA10Z239).
Abstract: With the rapid growth of information on the web, traditional keyword-based information retrieval falls far short of user expectations in recall and precision. To improve the recall and precision of the IR engine in vegetable e-commerce, this paper presents an information retrieval model based on a vegetable e-commerce ontology. The ontology was constructed by gathering and analyzing vegetable e-commerce domain information on the web. It consists of vegetable classes and the hierarchical relationships among them. During retrieval, the domain ontology supports information indexing and inference. The implemented ontology-based IR model offers more functions than keyword-based web search engines. Experiments show that its recall and precision are, to a certain extent, higher than those of a keyword-based IR engine.
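An ontology's class hierarchy can help retrieval by expanding a query concept with all of its subclasses before matching. The sketch below shows that idea. The toy hierarchy, the product texts, and the function names are hypothetical, because the abstract does not give the paper's actual ontology or inference rules.

```python
from typing import Dict, List, Set

# Hypothetical fragment of a vegetable class hierarchy: parent class -> subclasses.
HIERARCHY: Dict[str, List[str]] = {
    "vegetable": ["leafy vegetable", "root vegetable"],
    "leafy vegetable": ["spinach", "lettuce"],
    "root vegetable": ["carrot", "radish"],
}

def descendants(concept: str) -> Set[str]:
    """Return the concept plus every subclass reachable below it."""
    result = {concept}
    stack = [concept]
    while stack:
        for child in HIERARCHY.get(stack.pop(), []):
            if child not in result:
                result.add(child)
                stack.append(child)
    return result

def search(query: str, docs: Dict[str, str]) -> List[str]:
    """Substring match, with the query concept expanded through the hierarchy first."""
    terms = descendants(query.lower())
    return sorted(doc_id for doc_id, text in docs.items()
                  if any(t in text.lower() for t in terms))

if __name__ == "__main__":
    docs = {
        "d1": "Fresh spinach, 500 g bag",
        "d2": "Organic carrot bundle",
        "d3": "Apple juice, 1 L",
    }
    print(search("leafy vegetable", docs))   # plain keyword matching would find nothing here
    print(search("vegetable", docs))
```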
Abstract: A hybrid retrieval model that combines keywords and concepts is proposed. It is built on the vector space model and a probabilistic reasoning network. The model keeps the advantages of both keyword retrieval and concept retrieval while compensating for the shortcomings of each. Its parameters can be tuned for different uses to obtain the best retrieval results, as our experiments confirm.
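A simple way to combine the two sources of evidence is a weighted sum of a keyword-space cosine and a concept-space cosine. The mixing weight `lam` and the term-to-concept map below are assumptions for illustration. The paper also uses a probabilistic reasoning network, which this sketch does not model.

```python
import math
from collections import Counter
from typing import Dict, List

# Hypothetical term -> concept map (for example, taken from a thesaurus).
CONCEPTS: Dict[str, str] = {"car": "VEHICLE", "automobile": "VEHICLE", "truck": "VEHICLE",
                            "price": "COST", "cost": "COST"}

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def keyword_vec(tokens: List[str]) -> Counter:
    return Counter(tokens)

def concept_vec(tokens: List[str]) -> Counter:
    return Counter(CONCEPTS[t] for t in tokens if t in CONCEPTS)

def hybrid_score(query: List[str], doc: List[str], lam: float = 0.5) -> float:
    """lam * keyword similarity + (1 - lam) * concept similarity."""
    return (lam * cosine(keyword_vec(query), keyword_vec(doc))
            + (1 - lam) * cosine(concept_vec(query), concept_vec(doc)))

if __name__ == "__main__":
    q = ["car", "price"]
    d = ["automobile", "cost", "list"]
    print(hybrid_score(q, d, lam=1.0))   # keyword only: the texts share no words
    print(hybrid_score(q, d, lam=0.0))   # concept only: VEHICLE and COST both match
```

Tuning `lam` per collection corresponds to the parameter adjustment the abstract describes.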
Funding: Supported by the Natural Science Foundation of Liaoning Province (20042020).
Abstract: A singly linked list structure called the aggregative chain is introduced into the algorithm to improve the architecture of the FP-tree. The new FP-tree is a one-way tree in which each node keeps only a pointer to its parent. The path information of all nodes belonging to the same item is compressed into aggregative chains. Frequent patterns can therefore be produced from the aggregative chains without generating node links or conditional pattern bases. An example of web keyword retrieval is given to analyze and verify the frequent pattern algorithm.
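The structure described above can be sketched as a prefix tree in which each node stores only its parent, with all nodes of an item gathered into a per-item list (the "aggregative chain"). The support of an itemset is then counted by walking parent pointers up from every node in the chain of its lowest-ranked item, with no conditional pattern base. This is our reading of the abstract, not the paper's code, and the class and function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, FrozenSet, List, Optional

class Node:
    """FP-tree node that keeps only a pointer to its parent."""
    def __init__(self, item: Optional[str], parent: Optional["Node"]):
        self.item, self.parent, self.count = item, parent, 0

def build(transactions: List[List[str]], min_support: int):
    freq = Counter(i for t in transactions for i in set(t))
    # Rank frequent items by descending support (rank 0 = most frequent, nearest the root).
    order = {i: r for r, (i, c) in enumerate(freq.most_common()) if c >= min_support}
    root = Node(None, None)
    children: Dict[tuple, Node] = {}                      # child lookup, used only while building
    chains: Dict[str, List[Node]] = defaultdict(list)     # per-item aggregative chain
    for t in transactions:
        node = root
        for item in sorted({i for i in t if i in order}, key=order.get):
            key = (id(node), item)
            if key not in children:
                children[key] = Node(item, node)
                chains[item].append(children[key])
            node = children[key]
            node.count += 1
    return order, chains

def support(itemset: FrozenSet[str], order, chains) -> int:
    """Support of an itemset of frequent items, found by walking up from its deepest item's chain."""
    last = max(itemset, key=order.get)                    # lowest-ranked item sits deepest on a path
    total = 0
    for node in chains[last]:
        ancestors, p = set(), node.parent
        while p is not None and p.item is not None:
            ancestors.add(p.item)
            p = p.parent
        if itemset - {last} <= ancestors:
            total += node.count
    return total

if __name__ == "__main__":
    txns = [["kw", "ir", "web"], ["kw", "ir"], ["kw", "web"], ["ir", "web"], ["kw", "ir", "web"]]
    order, chains = build(txns, min_support=2)
    print(support(frozenset({"kw", "ir"}), order, chains))
```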
Abstract: A new information search model is reported, and the design and implementation of a system based on intelligent agents are presented. The system is an assistant information retrieval system that helps users find what they need. It consists of four main components that collaborate to implement the system's functions: an interface agent, an information retrieval agent, a broker agent, and a learning agent. The agents apply learning mechanisms based on an improved ID3 algorithm.
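The abstract does not describe how the agents' ID3 algorithm is improved. For reference, the sketch below shows the standard ID3 split criterion, information gain. The attribute names and the sample records (whether a user found a returned document relevant) are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List

def entropy(labels: List[str]) -> float:
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows: List[Dict[str, str]], attr: str, target: str) -> float:
    """Entropy of the target minus the weighted entropy after splitting on attr (standard ID3)."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

def best_attribute(rows: List[Dict[str, str]], attrs: List[str], target: str) -> str:
    """ID3 chooses the attribute with the highest information gain at each node."""
    return max(attrs, key=lambda a: information_gain(rows, a, target))

if __name__ == "__main__":
    rows = [
        {"topic": "sports", "length": "short", "relevant": "yes"},
        {"topic": "sports", "length": "long", "relevant": "yes"},
        {"topic": "finance", "length": "short", "relevant": "no"},
        {"topic": "finance", "length": "long", "relevant": "no"},
    ]
    print(best_attribute(rows, ["topic", "length"], "relevant"))
```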
Funding: Project supported by the Knowledge Innovation Program of the Chinese Academy of Sciences (Grant No. KJCX2-YW-N42), the Key Project of the National Natural Science Foundation of China (Grant No. 10734070), the National Natural Science Foundation of China (Grant No. 11205157), the National Basic Research Program of China (Grant Nos. 2009CB930804 and 2012CB825800), the Fundamental Research Funds for the Central Universities, China (Grant No. WK2310000021), and the China Postdoctoral Science Foundation (Grant No. 2011M501064).
Abstract: Grating-based X-ray phase contrast imaging has been demonstrated to be an extremely powerful phase-sensitive imaging technique. Two-dimensional (2D) gratings extend the observable contrast to two refraction directions. We recently developed a novel reverse-projection (RP) method that efficiently retrieves object information in one-dimensional (1D) grating-based phase contrast imaging. In this contribution, we extend it to 2D grating-based X-ray phase contrast imaging as the two-dimensional reverse-projection (2D-RP) method for information retrieval. The method accounts for the nonlinear contributions of the two refraction directions and retrieves the absorption image as well as the horizontal and vertical refraction images. The retrieved information can be used to reconstruct the three-dimensional phase gradient field and to improve phase map retrieval and reconstruction. Numerical experiments confirm the validity of the 2D-RP method.
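The sketch below shows why two images can separate absorption from refraction, using the 1D case with a linear approximation of the grating shifting curve. This is our simplified reading of the RP idea, not the paper's 2D-RP derivation, which explicitly keeps nonlinear terms. As we understand the 1D method, the grating is held at a slope position θb, and a projection and its reverse (taken at opposite view angles) see the refraction angle Θ with opposite signs. The two images are then I_a = I0·exp(-M)·S(θb + Θ) and I_b = I0·exp(-M)·S(θb - Θ), where S is the shifting curve and M the absorption. To first order, I_a + I_b ≈ 2·I0·exp(-M)·S(θb) and I_a - I_b ≈ 2·I0·exp(-M)·S'(θb)·Θ, which gives M and Θ in closed form. The Gaussian shifting curve and all numbers are hypothetical.

```python
import math

SIGMA = 1.0   # hypothetical width of a Gaussian shifting curve

def shifting_curve(x: float) -> float:
    """Hypothetical symmetric shifting curve S(x)."""
    return math.exp(-x * x / (2 * SIGMA ** 2))

def shifting_curve_slope(x: float) -> float:
    """Derivative S'(x) of the Gaussian shifting curve."""
    return -x / SIGMA ** 2 * shifting_curve(x)

def intensity(I0: float, M: float, theta: float, theta_b: float) -> float:
    """Detected intensity for absorption M and refraction angle theta at grating position theta_b."""
    return I0 * math.exp(-M) * shifting_curve(theta_b + theta)

def retrieve(I_a: float, I_b: float, I0: float, theta_b: float):
    """Linearized retrieval of (M, theta) from a projection and its reverse."""
    total = I_a + I_b
    M = -math.log(total / (2 * I0 * shifting_curve(theta_b)))
    theta = (I_a - I_b) / total * shifting_curve(theta_b) / shifting_curve_slope(theta_b)
    return M, theta

if __name__ == "__main__":
    I0, M_true, theta_true, theta_b = 1.0, 0.3, 0.01, 1.0
    I_a = intensity(I0, M_true, +theta_true, theta_b)   # projection
    I_b = intensity(I0, M_true, -theta_true, theta_b)   # reverse projection: refraction flips sign
    print(retrieve(I_a, I_b, I0, theta_b))              # close to (0.3, 0.01) for small refraction
```

The linear approximation only holds for small refraction angles. The abstract's point is that the 2D case adds a second refraction direction whose nonlinear contribution cannot be ignored.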
Abstract: In this paper, we employ genetic algorithms (GAs) to solve the migration problem (MP). We propose a new tree encoding scheme composed of two parts: the pre-order traversal sequence of the tree vertices and the sequence of child counts of the corresponding vertices. The scheme is simple to encode and decode, easy to manipulate with GA operators, and strikes a better balance between exploration and exploitation. It is also adaptive: because it places few restrictions on code length, it can be lengthened or shortened according to the characteristics of the problem space. Furthermore, the scheme is well suited to the degree-constrained minimum spanning tree problem because it also carries the degree information of each node. Simulation results show that our algorithm performs better, converging quickly to optimal or suboptimal solutions on various problem sizes. Compared with binary string encoding of vertices, our algorithm runs remarkably faster on large problems, with comparable search capability.
Key words: distributed information retrieval; mobile agents; migration problem; genetic algorithms
CLC number: TP 301.6
Foundation item: Supported by the National Natural Science Foundation of China (90104005), the Natural Science Foundation of Hubei Province, and the Hong Kong Polytechnic University under grant G-YD63
Biography: He Yan-xiang (1952-), male, Professor; research direction: distributed and parallel processing, multi-agent systems, data mining, and e-business.
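The two-part encoding is easy to illustrate. A pre-order list of vertices, paired with the number of children of each vertex, is enough to rebuild the tree. The sketch below assumes a rooted tree stored as a child-list dictionary. The function names are hypothetical, and the GA operators themselves are not shown.

```python
from typing import Dict, List, Tuple

Tree = Dict[int, List[int]]   # vertex -> ordered list of children

def encode(tree: Tree, root: int) -> Tuple[List[int], List[int]]:
    """Return (pre-order vertex sequence, child-count sequence)."""
    order: List[int] = []
    counts: List[int] = []
    stack = [root]
    while stack:
        v = stack.pop()
        kids = tree.get(v, [])
        order.append(v)
        counts.append(len(kids))
        stack.extend(reversed(kids))          # reversed so the first child is visited first
    return order, counts

def decode(order: List[int], counts: List[int]) -> Tree:
    """Rebuild child lists: in pre-order, each vertex is the next child of the nearest open parent."""
    tree: Tree = {v: [] for v in order}
    open_parents: List[Tuple[int, int]] = []  # (vertex, children still to attach)
    for v, c in zip(order, counts):
        if open_parents:
            parent, remaining = open_parents.pop()
            tree[parent].append(v)
            if remaining > 1:
                open_parents.append((parent, remaining - 1))
        if c > 0:
            open_parents.append((v, c))
    return tree

if __name__ == "__main__":
    tree = {1: [2, 3], 2: [4, 5], 3: [], 4: [], 5: []}
    code = encode(tree, 1)
    print(code, decode(*code) == tree)
```

The child-count sequence gives each vertex's degree directly (the child count, plus one for every vertex except the root). That is why the same encoding suits degree-constrained spanning trees.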
Funding: This work is financially supported by the Ministry of Earth Sciences (MoES), Government of India (Grant No. MoES/36/OOIS/Extra/45/2015), URL: https://www.moes.gov.in.
Abstract: The rapid growth in coastal observation sensors produces copious data carrying weather information. Sensor-generated big data is heterogeneous and hard to interpret, which calls for high-end Information Retrieval (IR) systems. The Semantic Web (SW) can address this by integrating the data into a single platform for information exchange and knowledge retrieval. This paper uses an SW-based system to provide interoperability through ontologies, mapping data concepts to ontology classes. It presents a four-phase weather data model: data processing, ontology creation, SW processing, and a query engine. The developed Oceanographic Weather Ontology helps enhance data analysis, discovery, IR, and decision making, and it is evaluated against other state-of-the-art ontologies. The proposed ontology's completeness improved by 39.28% and its structural complexity decreased by 45.29%, with figures of 11% and 37.7% reported for precision and accuracy, respectively. Ocean data from the Indian meteorological satellite INSAT-3D is used as a typical test case for the proposed model. The experimental results show the effectiveness of the proposed data model and its advantages in machine understanding and IR.
Funding: Supported by the High Technology Research and Development Program of China (No. 2006AA01Z150) and the National Natural Science Foundation of China (No. 60435020).
Abstract: Words in relevant documents often fail to match the words in the user's query, and this mismatch seriously degrades retrieval performance. To eliminate it, a query expansion method based on a new representation of term co-occurrence is proposed, derived from an analysis of how queries are produced. Expansion terms are selected according to their correlation with the query as a whole, and the positional information between terms is also taken into account. Experiments on the Text REtrieval Conference (TREC) data collection show that the proposed method consistently improves on the language modeling method without expansion by 5% to 19%. Its precision is also competitive with that of pseudo-relevance feedback, a popular query expansion approach.
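A minimal version of this expansion step scores each candidate term by its windowed co-occurrence with each query term, giving closer co-occurrences more weight. The scores are then combined across the whole query, so a term linked to only one query word ranks below a term linked to all of them. The window size, the 1/distance weighting, the product combination, and the function name are assumptions; the abstract does not specify the paper's co-occurrence representation.

```python
from collections import defaultdict
from typing import Dict, List

def expansion_terms(docs: List[List[str]], query: List[str],
                    window: int = 3, k: int = 2) -> List[str]:
    """Rank non-query terms by distance-weighted co-occurrence with the whole query."""
    qset = set(query)
    assoc: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for tokens in docs:
        for i, t in enumerate(tokens):
            if t not in qset:
                continue
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                c = tokens[j]
                if c not in qset:                      # skips j == i, so no division by zero
                    assoc[c][t] += 1.0 / abs(i - j)    # closer co-occurrence counts more
    scores: Dict[str, float] = {}
    for cand, per_q in assoc.items():
        score = 1.0
        for q in qset:
            score *= 1.0 + per_q.get(q, 0.0)           # reward association with every query term
        scores[cand] = score
    return sorted(scores, key=lambda c: (-scores[c], c))[:k]

if __name__ == "__main__":
    docs = [["solar", "panel", "efficiency", "cell"],
            ["panel", "output", "solar", "cell"],
            ["solar", "wind", "energy"]]
    print(expansion_terms(docs, ["solar", "panel"], k=3))
```

Here "wind" co-occurs only with "solar", so it ranks below the terms that co-occur with both query words.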
Funding: Supported by the Funds of Heilongjiang Outstanding Young Teacher (1151G037).
Abstract: The major problem with most current information retrieval models is that individual words provide unreliable evidence about the content of texts. When documents are short, for example when only the abstract is available, word-use variability has a substantial impact on Information Retrieval (IR) performance. To solve this problem, this letter proposes a new short-document retrieval approach named the Reference Document Model (RDM). RDM obtains the statistical semantics of the query and of the document through pseudo feedback from reference documents. Its contributions are threefold: (1) pseudo feedback for both the query and the document; (2) building the query model and the document model from reference documents; and (3) flexible indexing units, which can be any linguistic element, such as documents, paragraphs, sentences, n-grams, terms, or characters. For short-document retrieval, RDM achieves significant improvements over classical probabilistic models on ad hoc retrieval tasks on Text REtrieval Conference (TREC) test sets. The results also show that the shorter the documents, the better RDM performs.
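The core idea is to enrich both a short query and a short document with their nearest reference documents before comparing them. The sketch below shows this with bag-of-words vectors. The interpolation weight `alpha`, the number of feedback documents `k`, and cosine similarity are assumptions for illustration; the paper builds probabilistic query and document models rather than these plain vectors.

```python
import math
from collections import Counter
from typing import List

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(v * b[term] for term, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def expand(text: List[str], refs: List[List[str]], k: int = 2, alpha: float = 0.5) -> Counter:
    """Mix a short text's term distribution with those of its k most similar reference docs."""
    own = Counter(text)
    ranked = sorted(refs, key=lambda r: cosine(own, Counter(r)), reverse=True)[:k]
    model = Counter({t: alpha * c / len(text) for t, c in own.items()})
    for r in ranked:
        for t, c in Counter(r).items():
            model[t] += (1 - alpha) * c / (len(r) * len(ranked))   # pseudo feedback share
    return model

def rdm_score(query: List[str], doc: List[str], refs: List[List[str]],
              k: int = 2, alpha: float = 0.5) -> float:
    """Compare the expanded query with the expanded document (feedback on both sides)."""
    return cosine(expand(query, refs, k, alpha), expand(doc, refs, k, alpha))

if __name__ == "__main__":
    refs = [["neural", "network", "retrieval", "ranking"],
            ["deep", "learning", "neural", "ranking"],
            ["soil", "crop", "yield"]]
    q, d = ["neural", "ranking"], ["deep", "learning"]
    print(cosine(Counter(q), Counter(d)), rdm_score(q, d, refs))
```

The query and document share no words, so their raw cosine is zero. After both are expanded through the same reference documents, their similarity becomes positive. This is the word-use variability problem that RDM targets.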
Funding: Supported by the National Natural Science Foundation of China (60173010).
Abstract: This paper presents a new integrated information retrieval support system (IIRSS) that helps Web search engines retrieve cross-lingual information from heterogeneous resources stored in multiple databases on an intranet. With a three-layer architecture, the IIRSS can cooperate with other application servers running on the intranet. It uses intelligent agents to collect information and create indexes on the fly, an access control strategy that restricts each user, through a single portal, to browsing only the documents he or she is authorized to access, and a new cross-lingual translation tool to help the search engine retrieve documents. As a result, the system provides controllable information access under different authorizations, personalized services, and real-time information retrieval.
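The access control idea (one portal, but each user only sees documents they are authorized for) can be sketched as an inverted index whose results are filtered by role. This is a hypothetical illustration: the class name `AccessControlledIndex`, the role-based ACL per document, and the conjunctive (AND) query semantics are assumptions, since the abstract does not describe the IIRSS access control strategy in detail.

```python
from collections import defaultdict


class AccessControlledIndex:
    """Inverted index whose search results are filtered by user authorization."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> ids of documents containing it
        self.acl = {}                     # doc id -> roles allowed to read it

    def add(self, doc_id, text, allowed_roles):
        for term in text.lower().split():
            self.postings[term].add(doc_id)
        self.acl[doc_id] = set(allowed_roles)

    def search(self, query, user_roles):
        """Return sorted ids of documents containing every query term
        that share at least one role with the user."""
        terms = query.lower().split()
        if not terms:
            return []
        hits = set.intersection(*(self.postings.get(t, set()) for t in terms))
        roles = set(user_roles)
        return sorted(d for d in hits if self.acl[d] & roles)
```

Filtering at query time keeps a single shared index for all users; a production system might instead push the authorization check into the posting lists, but that is a design choice the abstract does not settle.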
Funding: Supported by the National Basic Research Program of China ("973" Program, No. 2013cb329304), the Natural Science Foundation of China (No. 61105072, No. 61070044 and No. 61111130190), and the International Joint Research Project "QONTEXT" of the Council of European Union.
Abstract: Incompatible probability is an important non-classical phenomenon: it describes conflicting observed marginal probabilities that cannot all be derived from a single joint probability. First, the incompatibility of random variables is defined and discussed in terms of the non-positive-semi-definiteness of their covariance matrices. Then, a method is proposed to verify whether incompatible probability exists for a set of variables. Hypothesis testing is also applied to re-examine the likelihood that the observed marginal probabilities can be integrated into a joint probability space, thereby establishing the statistical significance of incompatible-probability cases. A case study with user click-through data provides initial evidence of incompatible probability in information retrieval (IR), particularly in user interaction. The experiments indicate that both incompatible and compatible cases can be found in IR data, and that informational queries are more likely to be compatible than navigational queries. The results inspire new theoretical perspectives for modeling the complex interactions and phenomena in IR.
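The covariance-based criterion rests on one fact: any genuine joint distribution has a positive semi-definite covariance matrix, so a non-PSD matrix assembled from the observed marginals proves they cannot come from one joint space (the converse need not hold). The sketch below assumes binary events described by P(X_i=1) and pairwise P(X_i=1, X_j=1); the function names are hypothetical, and PSD-ness is checked with the all-principal-minors criterion rather than whatever test the paper uses.

```python
from itertools import combinations


def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n, d = len(a), 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if abs(a[p][c]) < 1e-15:
            return 0.0  # zero column below the diagonal -> singular
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return d


def covariance(marginals, pairwise):
    """Covariance of binary variables from P(X_i=1) and P(X_i=1, X_j=1)."""
    n = len(marginals)
    cov = [[0.0] * n for _ in range(n)]
    for i in range(n):
        cov[i][i] = marginals[i] * (1 - marginals[i])
    for (i, j), pij in pairwise.items():
        cov[i][j] = cov[j][i] = pij - marginals[i] * marginals[j]
    return cov


def is_psd(m, tol=1e-12):
    """A symmetric matrix is PSD iff ALL principal minors (not just leading) are >= 0."""
    n = len(m)
    for size in range(1, n + 1):
        for idx in combinations(range(n), size):
            if det([[m[i][j] for j in idx] for i in idx]) < -tol:
                return False
    return True


def provably_incompatible(marginals, pairwise):
    """True if the observed marginals cannot come from any joint distribution."""
    return not is_psd(covariance(marginals, pairwise))
```

For instance, three events that each occur with probability 0.5 but are observed to be pairwise disjoint are consistent pair by pair, yet the full covariance matrix has a negative determinant, so no joint distribution can produce them; with independent pairwise probabilities of 0.25 the same check passes.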
Funding: Supported by the High Technology Research and Development Program of China (No. 2006AA01Z150), the Key Project of the National Natural Science Foundation of China (No. 60373101), the Natural Science Foundation of Heilongjiang Province (No. F2007-14), and the Project of Heilongjiang Outstanding Young University Teacher (No. 1151G037).
Abstract: This letter presents a new discriminative model for Information Retrieval (IR), referred to as the Ordinal Regression Model (ORM). ORM differs from most existing models in that it views IR as an ordinal regression problem (i.e. a ranking problem) rather than a binary classification problem. Since the task of IR is to rank documents according to the user's information need, IR can naturally be viewed as an ordinal regression problem. Two parameter learning algorithms for ORM are presented: a perceptron-based algorithm and the ranking Support Vector Machine (SVM). The effectiveness of the proposed approach is evaluated on the ad hoc retrieval task using three English Text REtrieval Conference (TREC) sets and two Chinese TREC sets. Results show that ORM significantly outperforms state-of-the-art language model approaches and the OKAPI system on all test sets, and that it is more appropriate to view IR as ordinal regression than as binary classification.
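One common way to learn a ranking function with a perceptron is the pairwise reduction: for every pair of documents whose relevance grades differ, the weight vector is nudged until the higher-graded document scores higher. The sketch below shows that generic technique under stated assumptions (linear scoring, dense feature lists, the hypothetical names `train_rank_perceptron` and `rank`); it is not necessarily the exact perceptron variant used for ORM.

```python
def train_rank_perceptron(X, grades, epochs=20, lr=1.0):
    """Pairwise ranking perceptron: learn w so higher-graded docs score higher."""
    w = [0.0] * len(X[0])
    # Every ordered pair (i, j) where document i should outrank document j.
    pairs = [(i, j) for i in range(len(X)) for j in range(len(X))
             if grades[i] > grades[j]]
    for _ in range(epochs):
        mistakes = 0
        for i, j in pairs:
            diff = [a - b for a, b in zip(X[i], X[j])]
            if sum(wk * dk for wk, dk in zip(w, diff)) <= 0:  # mis-ordered or tied
                w = [wk + lr * dk for wk, dk in zip(w, diff)]
                mistakes += 1
        if mistakes == 0:  # all training pairs correctly ordered
            break
    return w


def rank(X, w):
    """Return document indices sorted by descending linear score."""
    scores = [sum(wk * xk for wk, xk in zip(w, x)) for x in X]
    return sorted(range(len(X)), key=lambda i: -scores[i])
```

Because the training signal is the relative order of documents with graded relevance (e.g. 2 > 1 > 0) rather than a relevant/non-relevant label, this reflects the ordinal regression view of IR that the abstract argues for.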
Abstract: We developed a high-speed information retrieval system based on the IXP2800, a dedicated network processor. The system retrieves information at 6.8 Gb/s and supports various network protocols, including Telnet, FTP, SMTP, and POP3. Retrieval supports both keywords and natural language processing. This paper explains the hardware system, the software system, and the performance indices. Key words: network processor; IXP2800; information retrieval; IXA. CLC number: TP 309. Foundation item: Supported by the National Natural Science Foundation of China (69873016 & 69972017) and the National High Technology Development Program of China (863-301-06-1). Biography: SHI Shu-dong (1963-), male, Ph.D. candidate, research direction: network and information security.
Abstract: Decision Support Systems (DSS) are man-machine interactive systems that support decision-makers in solving unstructured and semi-structured decision problems. In accordance with the characteristics of information that supports decision-making, this paper argues that the function of a problem-oriented information retrieval DSS can meet the needs of an enterprise's top management more effectively than other information retrieval functions. An architecture for this system is presented, which decomposes a problem put forward or recognized by the user into problems the computer can recognize, forms retrieval tactics, and searches for the data the user needs. A prototype system designed and developed according to this architecture, the CF Economic Environment Information Retrieval DSS, is introduced.
Abstract: Out-of-vocabulary (OOV) term translation plays an important role in natural language processing. Although many researchers have endeavored to solve the OOV term translation problem, no existing method offers definition or context information for OOV terms, and none focuses on cross-language definition retrieval for OOV terms. Moreover, it is difficult to evaluate the correctness of an OOV term translation without domain-specific knowledge and correct references. Our English definition ranking method differentiates the types of OOV terms and applies a different translation extraction method to each type. It also extracts multilingual context information and monolingual definitions of OOV terms. In addition, we propose a novel cross-language definition retrieval system for OOV terms, as well as an automatic re-evaluation method for assessing the correctness of OOV translations and definitions. Our methods achieve high performance compared with existing methods.
Abstract: Information retrieval (IR) systems are designed to help information seekers retrieve relevant information from vast document collections; it was this need that gave birth to IR systems. Even though many IR systems exist, they cannot meet all users' expectations. Because users have different levels of knowledge, they express their queries in different ways. As a result, a system may miss the core meaning of a user's query and retrieve unsatisfactory results. This happens mainly because of the ambiguity of words in natural languages and the mismatch between the expressions used by users and by authors. The ambiguities in the Amharic language have negative impacts on the performance of Amharic IR systems; examples include spelling variants of the same word, polysemous terms, and synonymous terms. Users who are not fully knowledgeable about the information domain mostly formulate weak queries and end up frustrated with the results the IR system returns. This research aims to improve the recall of previous work. A statistical co-occurrence technique is used to expand query terms, so that the system returns relevant documents that satisfy the user's information need. The statistical co-occurrence method considers terms that frequently appear together with the query terms, regardless of their position. The effectiveness of the proposed technique was tested on a prototype system and compared with the results of the previous study: recall improved by 6% and F-measure by 2%. Hence, the statistical co-occurrence method outperformed the bi-gram based IR system.
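Position-independent co-occurrence, as described above, can be sketched by treating each document as a set of terms and counting how often a candidate appears in the same document as a query term. The Dice coefficient used for scoring, the max over query terms, the function name `dice_expansion`, and the English toy tokens are all assumptions for illustration; the study's exact association measure and its Amharic preprocessing (e.g. handling spelling variants) are not specified in the abstract.

```python
from collections import Counter


def dice_expansion(docs, query_terms, k=2):
    """Expand a query with terms that co-occur with it in the same documents.

    Position is ignored: each document is reduced to a set of terms.
    Association is the Dice coefficient 2 * df(q, t) / (df(q) + df(t)).
    """
    doc_sets = [set(d) for d in docs]
    df = Counter(t for s in doc_sets for t in s)  # document frequency per term
    co = Counter()                                # joint document frequency
    for s in doc_sets:
        for q in query_terms:
            if q in s:
                for t in s:
                    if t not in query_terms:
                        co[(q, t)] += 1
    scores = Counter()
    for (q, t), n in co.items():
        # Keep each candidate's strongest association with any query term.
        scores[t] = max(scores[t], 2 * n / (df[q] + df[t]))
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return list(query_terms) + ranked[:k]
```

A normalized measure such as Dice is one way to keep very frequent terms from dominating raw co-occurrence counts; other measures (mutual information, Jaccard) would plug into the same loop.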