Classification is an important technique in data mining. The decision trees built by most existing classification algorithms commonly suffer from over-branching, which leads to poor efficiency in the subsequent classification phase. In this paper, we present a new value-oriented classification method based on the concepts of the frequent-pattern-node and the exceptive-child-node. It aims to build accurate, properly sized decision trees while reducing over-branching as much as possible. The experiments show that, with relevance analysis as pre-processing, our method eliminates over-branching in decision trees more effectively and efficiently than other algorithms, without loss of accuracy.
XML (eXtensible Markup Language) is a standard that is widely applied in data representation and data exchange. However, the DTD (Document Type Definition), an important concept of XML, is not fully exploited in current applications. In this paper, a new method for clustering DTDs is presented, which can also be used in XML document clustering. The two-level method clusters the elements in DTDs and clusters the DTDs themselves in separate steps. Element clustering forms the first level and produces element clusters, which are generalizations of relevant elements. DTD clustering uses this generalized information and forms the second level of the whole clustering process. The two-level method has the following advantages: 1) it takes into consideration both the content and the structure within DTDs; 2) the generalized information about elements is more useful than the separate words in the vector model; 3) it facilitates the search for outliers. The experiments show that this method is able to categorize the relevant DTDs effectively.
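The two-level idea in the DTD clustering abstract can be illustrated with a small Python sketch. This is not the authors' algorithm: the abstract does not give the similarity measures, so the string similarity from `difflib`, the Jaccard measure, the greedy grouping, both thresholds, and the function names `cluster_elements` and `cluster_dtds` are all assumptions made for illustration. The sketch also looks only at element names, whereas the abstract says the method considers DTD structure as well.

```python
from difflib import SequenceMatcher


def cluster_elements(names, threshold=0.7):
    """Level 1 (assumed measure): greedily group element names whose string
    similarity to a cluster's first member reaches the threshold."""
    clusters = []  # each cluster is a list of names; list index = cluster id
    for name in sorted(set(names)):
        for members in clusters:
            ratio = SequenceMatcher(None, name.lower(), members[0].lower()).ratio()
            if ratio >= threshold:
                members.append(name)
                break
        else:
            clusters.append([name])
    return {name: cid for cid, members in enumerate(clusters) for name in members}


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster_dtds(dtds, elem_threshold=0.7, dtd_threshold=0.5):
    """Level 2 (assumed measure): describe each DTD by the set of element
    cluster ids it contains, then greedily group DTDs by Jaccard similarity."""
    all_names = [n for elems in dtds.values() for n in elems]
    cluster_of = cluster_elements(all_names, elem_threshold)
    profiles = {d: {cluster_of[n] for n in elems} for d, elems in dtds.items()}
    groups = []
    for d in dtds:
        for group in groups:
            if jaccard(profiles[d], profiles[group[0]]) >= dtd_threshold:
                group.append(d)
                break
        else:
            groups.append([d])
    return groups


# Hypothetical DTDs, reduced to their element names.
dtds = {
    "book.dtd": ["book", "title", "author", "publisher"],
    "books.dtd": ["books", "titles", "authors", "price"],
    "order.dtd": ["order", "customer", "item", "quantity"],
}
groups = cluster_dtds(dtds)
print(groups)
```

Because `title` and `titles` fall into the same element cluster at the first level, the two book-related DTDs share most of their cluster ids at the second level even though they have almost no element names in common. This is the sense in which generalized element information can be more useful than separate words in a vector model.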