Abstract: Traditional information retrieval systems respond to user queries with ranked lists of relevant documents. Because XML (Extensible Markup Language) documents separate content from structure, XML-IR (information retrieval) systems can retrieve only the relevant portions of documents. Users of an XML-IR system can therefore potentially receive highly relevant and precise material. We have developed an XML information retrieval system using MySQL and Sphinx, which we call MEXIR. In our system, XML documents are stored in a single table with a fixed relational schema that is independent of the logical structure of the documents. Each node in an XML document is represented by a label that expresses its position in the XML tree (the ADXPI scheme). In performance experiments on the INEX collections, our system was on average up to four seconds faster than GPX. In addition, it reduced the size of the data by 82.29% compared to the GPX system.
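To make the single-table idea concrete, here is a minimal sketch of shredding an XML document into one fixed-schema table whose rows carry a position label. It rests on several assumptions: the abstract does not give the format of ADXPI labels, so Dewey-style labels such as `1.2.1` stand in for them; `sqlite3` stands in for MySQL so the example is self-contained; and the table name `node`, its columns, and the helpers `shred`, `load`, and `subtree` are hypothetical names chosen for illustration, not part of MEXIR.

```python
import sqlite3
import xml.etree.ElementTree as ET


def shred(xml_text):
    """Return (label, tag, text) rows in document order.

    The label is a Dewey-style position (assumed stand-in for ADXPI):
    the root is '1', its second child is '1.2', and so on.
    """
    rows = []

    def walk(node, label):
        rows.append((label, node.tag, (node.text or "").strip()))
        for i, child in enumerate(node, start=1):
            walk(child, f"{label}.{i}")

    walk(ET.fromstring(xml_text), "1")
    return rows


def load(conn, doc_id, xml_text):
    """Store every node of one document in a single fixed-schema table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS node "
        "(doc_id INTEGER, label TEXT, tag TEXT, content TEXT)"
    )
    conn.executemany(
        "INSERT INTO node VALUES (?, ?, ?, ?)",
        [(doc_id, *row) for row in shred(xml_text)],
    )


def subtree(conn, doc_id, label):
    """Fetch an element and all of its descendants via a label prefix match."""
    return conn.execute(
        "SELECT label, tag, content FROM node "
        "WHERE doc_id = ? AND (label = ? OR label LIKE ?) ORDER BY rowid",
        (doc_id, label, label + ".%"),
    ).fetchall()
```

The schema has the same four columns whatever the document's element structure, which is the sense in which it is independent of the logical structure. Retrieving only the relevant portion of a document then reduces to a prefix match on the label.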
Abstract: We propose a new approach to storing and querying XML data in an RDBMS, based on a numbering scheme and inverted lists. Our approach allows us to quickly determine the precedence, sibling, and ancestor/descendant relationships between any pair of nodes in the XML hierarchy, and uses a path index to speed up the evaluation of path expressions. Examples demonstrate that our approach can effectively and efficiently support both XQuery queries and keyword searches. It is also flexible enough to support XML documents with or without a schema, and applications that perform both retrieval and update. We also present the architecture of a middleware layer for applications accessing XML documents stored in relations, and an algorithm that effectively translates a given XML document into relations.
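As an illustration of how a numbering scheme can answer structural questions from labels alone, the sketch below uses the classic preorder/postorder labeling. This is an assumption: the abstract does not say which numbering scheme the authors use, and the names `Label`, `number`, `is_ancestor`, `precedes`, and `is_sibling` are hypothetical.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET


@dataclass(frozen=True)
class Label:
    tag: str
    pre: int     # rank in a preorder traversal (document order)
    post: int    # rank in a postorder traversal
    parent: int  # pre number of the parent, 0 for the root


def number(xml_text):
    """Assign (pre, post, parent) labels; the result is in document order."""
    labels = []
    counters = {"pre": 0, "post": 0}

    def visit(node, parent_pre):
        counters["pre"] += 1
        pre = counters["pre"]
        slot = len(labels)
        labels.append(None)  # reserve the slot so the list stays in document order
        for child in node:
            visit(child, pre)
        counters["post"] += 1
        labels[slot] = Label(node.tag, pre, counters["post"], parent_pre)

    visit(ET.fromstring(xml_text), 0)
    return labels


def is_ancestor(a, b):
    """a is an ancestor of b: a opens before b and closes after it."""
    return a.pre < b.pre and a.post > b.post


def precedes(a, b):
    """a lies entirely before b in document order (a is not b's ancestor)."""
    return a.pre < b.pre and a.post < b.post


def is_sibling(a, b):
    """a and b are distinct nodes with the same parent."""
    return a.parent == b.parent and a.pre != b.pre
```

Each test compares a handful of integers, so in a relational setting the same predicates can be written as join conditions over label columns without walking the tree.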