Funding: Supported by the National Natural Science Foundation of China (60473085).
Abstract: This paper proposes a new way of indexing and processing twig patterns in XML documents. Every path in an XML document can be transformed into a sequence of labels by the Structure-Encoded method, which establishes a one-to-one correspondence between the XML tree and the sequence. Based on the identifying characteristics of nodes in the XML tree, the elements are classified and clustered. During query processing, the twig pattern is also transformed into its Structure-Encoded sequence. By performing subsequence matching on the set of sequences in the XML documents, all occurrences of the path in the XML documents are refined. Using the index, the number of elements retrieved is minimized. The search results, returned in a pertinent format, provide more structural information without any false dismissals or false alarms. The index also supports keyword search. Experimental results indicate that the index is significantly efficient and achieves high precision.
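The idea of turning paths into label sequences and answering a path query by subsequence matching can be sketched as follows. This is a minimal illustration only, not the paper's Structure-Encoded scheme or its index: the helper names (`root_to_leaf_paths`, `is_subsequence`, `match_paths`) and the sample document `DOC` are assumptions made for the example, and only single root-to-leaf paths are handled, not full twig patterns.

```python
import xml.etree.ElementTree as ET


def root_to_leaf_paths(xml_text):
    """Return every root-to-leaf path as a tuple of element tags."""
    root = ET.fromstring(xml_text)
    paths = []

    def walk(node, prefix):
        current = prefix + (node.tag,)
        children = list(node)
        if not children:
            # A leaf closes one complete label sequence.
            paths.append(current)
        for child in children:
            walk(child, current)

    walk(root, ())
    return paths


def is_subsequence(query, sequence):
    """True if the labels of `query` occur in `sequence` in order (gaps allowed)."""
    remaining = iter(sequence)
    # `label in remaining` consumes the iterator up to the first match,
    # so later labels can only match later positions.
    return all(label in remaining for label in query)


def match_paths(xml_text, query):
    """Return the label sequences of the document that contain `query`."""
    return [p for p in root_to_leaf_paths(xml_text) if is_subsequence(query, p)]


# Hypothetical sample document, used only for illustration.
DOC = (
    "<lib>"
    "<book><title/><author><name/></author></book>"
    "<paper><author><name/></author></paper>"
    "</lib>"
)

if __name__ == "__main__":
    print(match_paths(DOC, ("book", "name")))
    print(match_paths(DOC, ("author", "name")))
```

Allowing gaps in the match corresponds to an ancestor-descendant step in the query; a parent-child step would need the stricter requirement that matched labels be adjacent.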
Funding: Supported by the Open Project of Xiangjiang Laboratory (No. 22XJ02003) and the National Natural Science Foundation of China (No. 62122093).
Abstract: Time series clustering is a challenging problem due to the large-volume, high-dimensional, and warping characteristics of time series data. Traditional clustering methods often use a single criterion or distance measure, which may not capture all the features of the data. This paper proposes a novel method for time series clustering based on evolutionary multi-tasking optimization, termed i-MFEA, which uses an improved multifactorial evolutionary algorithm to optimize multiple clustering tasks simultaneously, each with a different validity index or distance measure. As a result, i-MFEA can produce diverse and robust clustering solutions that satisfy various preferences of decision-makers. Experiments on two artificial datasets show that i-MFEA outperforms single-objective evolutionary algorithms and traditional clustering methods in terms of convergence speed and clustering quality. The paper also discusses how i-MFEA can address two long-standing issues in time series clustering: the choice of an appropriate similarity measure and the number of clusters.
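To see why a single distance measure may miss features of warped series, the sketch below compares the Euclidean distance with dynamic time warping (DTW) on two series that have the same shape but are shifted in time. It is not part of i-MFEA; the function names and the toy series `A` and `B` are assumptions chosen for illustration, and DTW is used here only as one well-known warping-tolerant measure.

```python
from math import sqrt


def euclidean(a, b):
    """Point-by-point distance; assumes both series have the same length."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw(a, b):
    """Dynamic time warping distance with absolute difference as local cost."""
    inf = float("inf")
    # d[i][j] = cheapest cost of aligning a[:i] with b[:j].
    d = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    d[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[len(a)][len(b)]


# Toy series (made up): the same bump, shifted by one time step.
A = [0, 0, 1, 2, 1, 0]
B = [0, 1, 2, 1, 0, 0]

if __name__ == "__main__":
    print("Euclidean:", euclidean(A, B))
    print("DTW:", dtw(A, B))
```

The two measures disagree on how similar the series are, which is the kind of conflict that motivates optimizing several clustering tasks with different measures at once.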
Funding: Project supported by the National Natural Science Foundation of China (Grant Nos. 41005043 and 41105033), the National Basic Research Program of China (Grant No. 2012CB955901), and the National Science and Technology Ministry, China (Grant Nos. 2007BAC29B01 and 2007BAC03A01).
Abstract: The pick-up algorithm based on the k-th order cluster for the closest distance is applied to weather and climatic events, and the technical terms clustered index and high clustered region are defined to investigate the temporal and spatial distribution characteristics of these events in China during the past 50 years. The results show that the contribution of extreme high-temperature event clusters changed in the period from the 1960s to the 1970s, and that its strength was enhanced. On the other hand, the decreasing trend in the clusters of low-temperature extremes can be taken as a signal of warmer winters to follow on the decadal time scale. Torrential rain clusters and heavy rainfall clusters have both lessened in the past 50 years, and they have different cluster characteristics because of their definitions. Regions with high clustered indexes are concentrated in southern China. The spatial evolution of the heavy rainfall clusters reveals that clustered heavy rainfall has played an important role in the rain-belt pattern over China during the last 50 years.
Abstract: In this paper, a multilevel secure relation hierarchical data model for multilevel secure databases is extended from the relation hierarchical data model of the single-level environment. Based on the model, an upper-lower layer relational integrity is presented after the covert channels caused by database integrity are analyzed and eliminated. Two SQL statements are extended to process polyinstantiation in the multilevel secure environment. A system based on the multilevel secure relation hierarchical data model is capable of storing and manipulating, in an integrated way, both complicated objects (e.g., multilevel spatial data) and conventional data (e.g., integers, real numbers, and character strings) in a multilevel secure database.
Abstract: Although the distance between binary codes can be computed quickly in Hamming space, linear search is not practical for large-scale datasets. Attention has therefore been paid to the efficiency of approximate nearest neighbor search, in which hierarchical clustering trees (HCT) are widely used. However, HCT select cluster centers randomly and build indexes with the entire binary code, which degrades search performance. In this paper, we first propose a new clustering algorithm that chooses cluster centers on the basis of relative distances and uses a more homogeneous partition of the dataset than HCT does to build the hierarchical clustering trees. Then, we present an algorithm that compresses binary codes by extracting distinctive bits according to the standard deviation of each bit. Consequently, a new index is proposed that uses compressed binary codes based on a hierarchical decomposition of binary spaces. Experiments conducted on reference datasets and on a dataset of one billion binary codes demonstrate the effectiveness and efficiency of our method.
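The compression step can be illustrated with a small sketch. The abstract does not give the exact selection rule, so the code below assumes that "distinctive" means the bit positions with the largest standard deviation across the dataset; the function names and the four sample codes in `CODES` are hypothetical, and the clustering-tree index itself is not shown.

```python
from math import sqrt


def bit_std(codes, n_bits):
    """Standard deviation of each bit position over all codes.

    For a bit that is 1 with frequency p, the standard deviation is sqrt(p * (1 - p)).
    """
    stds = []
    for j in range(n_bits):
        p = sum((c >> j) & 1 for c in codes) / len(codes)
        stds.append(sqrt(p * (1 - p)))
    return stds


def select_bits(codes, n_bits, keep):
    """Pick the `keep` bit positions with the highest standard deviation (assumed rule)."""
    stds = bit_std(codes, n_bits)
    ranked = sorted(range(n_bits), key=lambda j: stds[j], reverse=True)
    return sorted(ranked[:keep])


def compress(code, positions):
    """Build a shorter code from the selected bit positions only."""
    out = 0
    for i, j in enumerate(positions):
        out |= ((code >> j) & 1) << i
    return out


def hamming(a, b):
    """Hamming distance between two integer-encoded binary codes."""
    return bin(a ^ b).count("1")


# Hypothetical 4-bit codes; the top bit is constant and carries no information.
CODES = [0b1010, 0b1001, 0b1110, 0b1101]

if __name__ == "__main__":
    positions = select_bits(CODES, n_bits=4, keep=3)
    short = [compress(c, positions) for c in CODES]
    print(positions, [bin(s) for s in short])
    print(hamming(CODES[0], CODES[1]), hamming(short[0], short[1]))
```

In this toy case the dropped bit is identical in every code, so Hamming distances between compressed codes equal those between the original codes; on real data the compressed distance is only an approximation.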
Funding: The Research Council of Shiraz University, which supported the project.
Abstract: This paper tests a data mining method for evaluating the "IRTA" (Index of Relative Tectonic Activity) to investigate the impact of active tectonics on geomorphic processes and landscape development. Based on K-means clustering of six basin-related geomorphic indices (the hypsometric integral, basin asymmetric factor, drainage density, basin shape ratio, mean axial slope of the channel, and topographic roughness) that represent the relative strength of active tectonic deformation on topography and morphology, the relative tectonic activity along the Kazerun Fault Zone in the Zagros Mountains of Iran may be classified into low, moderate, and high relative tectonic activity zones. The results allow the identification of clusters of similarly deformed areas related to relative tectonic activity. The use of geomorphic parameters together with IRTA, in comparison with field observations, exhibits changes in relative tectonic activity that mostly correspond to changes in the mechanism of the prominent fault zones in the study area.
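The clustering step described above can be sketched as standardizing six indices per basin and running K-means with three clusters. Everything concrete below is an assumption: the basin values in `BASINS` are synthetic placeholders and not measurements from the study, the initial centers are simply chosen by hand, and how clusters are mapped to low, moderate, and high activity is not specified in the abstract, so it is left out.

```python
from math import sqrt


def zscore(rows):
    """Standardize each column to zero mean and unit (population) standard deviation."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [sqrt(sum((x - m) ** 2 for x in c) / len(c)) for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def squared_distance(p, q):
    return sum((x - y) ** 2 for x, y in zip(p, q))


def kmeans(points, centers, iterations=20):
    """Plain K-means: assign each point to the nearest center, then recompute centers."""
    centers = [list(c) for c in centers]
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [
            min(range(len(centers)), key=lambda k: squared_distance(p, centers[k]))
            for p in points
        ]
        for k in range(len(centers)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:  # keep the old center if a cluster becomes empty
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Synthetic placeholder values (NOT data from the study): one row per basin,
# six columns standing in for the six geomorphic indices.
BASINS = [
    [0.10, 0.12, 0.11, 0.09, 0.10, 0.13],
    [0.12, 0.10, 0.09, 0.11, 0.12, 0.10],
    [0.50, 0.52, 0.49, 0.51, 0.50, 0.48],
    [0.52, 0.50, 0.51, 0.49, 0.48, 0.50],
    [0.90, 0.88, 0.91, 0.89, 0.90, 0.92],
    [0.88, 0.90, 0.89, 0.91, 0.92, 0.90],
]

if __name__ == "__main__":
    scaled = zscore(BASINS)
    labels, _ = kmeans(scaled, centers=[scaled[0], scaled[2], scaled[4]])
    print(labels)
```

Standardizing first matters because the six indices have different units and ranges; without it, the index with the largest numeric spread would dominate the distance.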
Abstract: The density-based notion of clustering is widely used because it is easy to implement and can detect arbitrarily shaped clusters in the presence of noisy data points without requiring prior knowledge of the number of clusters to be identified. Density-based spatial clustering of applications with noise (DBSCAN) is the first algorithm proposed in the literature that uses the density-based notion for cluster detection. Since most real data sets today contain feature spaces with adjacent nested clusters, DBSCAN is not suitable for detecting adjacent clusters of variable density, because it uses global density parameters: the neighborhood radius and the minimum number of points in the neighborhood. The efficiency of DBSCAN therefore depends on these initial parameter settings. For DBSCAN to work properly, the neighborhood radius must be less than the distance between two clusters; otherwise the algorithm merges the two clusters and detects them as a single cluster. In this paper: 1) We propose an improved version of the DBSCAN algorithm that detects adjacent clusters of varying density by using the concept of neighborhood difference together with the density-based approach, without introducing much additional computational complexity over the original DBSCAN algorithm. 2) We validate our experimental results using the space density indexing (SDI) internal cluster measure, recently proposed by one of our authors, to demonstrate the quality of the proposed clustering method. Our experimental results also suggest that the proposed method is effective in detecting adjacent nested clusters of variable density.
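The sensitivity to the global neighborhood radius can be shown with a minimal sketch of the original DBSCAN on one-dimensional points. This is the baseline algorithm, not the improved method proposed in the paper; the parameter names `eps` and `min_pts`, the helper names, and the sample points in `POINTS` are assumptions for illustration.

```python
NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within distance eps of point i (including i itself)."""
    return [j for j, q in enumerate(points) if abs(points[i] - q) <= eps]


def dbscan(points, eps, min_pts):
    """Original DBSCAN on 1-D points; returns one cluster label per point."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            more = region_query(points, j, eps)
            if len(more) >= min_pts:  # core point: keep expanding from it
                queue.extend(more)
    return labels


def count_clusters(labels):
    return len(set(labels) - {NOISE})


# Two hypothetical groups separated by a gap of 2.5.
POINTS = [0.0, 0.5, 1.0, 1.5, 4.0, 4.5, 5.0, 5.5]

if __name__ == "__main__":
    print(count_clusters(dbscan(POINTS, eps=0.6, min_pts=2)))  # radius below the gap
    print(count_clusters(dbscan(POINTS, eps=3.0, min_pts=2)))  # radius above the gap
```

With a radius smaller than the gap the two groups stay separate, while a radius larger than the gap lets the expansion cross it and the groups are reported as one cluster, which is the merging behavior the abstract describes.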