A new way of indexing and processing twig patterns in an XML documents is proposed in this paper. Every path in XML document can be transformed into a sequence of labels by Structure-Encoded that constructs a one-to-o...A new way of indexing and processing twig patterns in an XML documents is proposed in this paper. Every path in XML document can be transformed into a sequence of labels by Structure-Encoded that constructs a one-to-one correspondence between XML tree and sequence. Base on identifying characteristics of nodes in XML tree, the elements are classified and clustered. During query proceeding, the twig pattern is also transformed into its Structure-Encoded. By performing subsequence matching on the set of sequences in XML documents, all the occurrences of path in the XML documents are refined. Using the index, the numbers of elements retrieved are minimized. The search results with pertinent format provide more structure information without any false dismissals or false alarms. The index also supports keyword search Experiment results indicate the index has significantly efficiency with high precision.展开更多
基金Supported by the National Natural Science Foundation of China (60473085)
文摘A new way of indexing and processing twig patterns in an XML documents is proposed in this paper. Every path in XML document can be transformed into a sequence of labels by Structure-Encoded that constructs a one-to-one correspondence between XML tree and sequence. Base on identifying characteristics of nodes in XML tree, the elements are classified and clustered. During query proceeding, the twig pattern is also transformed into its Structure-Encoded. By performing subsequence matching on the set of sequences in XML documents, all the occurrences of path in the XML documents are refined. Using the index, the numbers of elements retrieved are minimized. The search results with pertinent format provide more structure information without any false dismissals or false alarms. The index also supports keyword search Experiment results indicate the index has significantly efficiency with high precision.