Funding: Supported by the Open Research Fund Program of LIESMARS (WKL(00)0302).
Abstract: Clustering is a useful data mining technique for discovering interesting distributions and patterns in the underlying data, with applications in statistical data analysis, pattern recognition, image processing, and other fields. We combine sampling with the DBSCAN algorithm to cluster large spatial databases and develop two sampling-based DBSCAN (SDBSCAN) algorithms. One algorithm introduces sampling inside DBSCAN; the other applies a sampling procedure outside DBSCAN. Experimental results demonstrate that both algorithms are effective and efficient for clustering large-scale spatial databases.
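The "sampling outside DBSCAN" idea can be illustrated roughly as follows. This is a minimal sketch, not the SDBSCAN algorithm from the paper: it assumes a uniform random sample, a plain quadratic-time DBSCAN on the sample, and a hypothetical labeling rule in which each unsampled point takes the cluster of its nearest sampled core point when that point lies within `eps`. The names `sampled_dbscan`, `sample_size`, and `seed` are illustrative, and `min_pts` is interpreted relative to the sample density.

```python
import math
import random


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Plain O(n^2) DBSCAN. Returns (labels, core_flags); label -1 is noise."""
    labels = [None] * len(points)
    core = [False] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        core[i] = True
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                core[j] = True
                queue.extend(j_neighbors)
    return labels, core


def sampled_dbscan(points, eps, min_pts, sample_size, seed=0):
    """Cluster a random sample, then label the remaining points (assumed rule)."""
    rng = random.Random(seed)
    sample_idx = rng.sample(range(len(points)), sample_size)
    sample = [points[i] for i in sample_idx]
    s_labels, s_core = dbscan(sample, eps, min_pts)

    labels = [-1] * len(points)
    for k, i in enumerate(sample_idx):
        labels[i] = s_labels[k]

    # Sampled core points act as representatives of their clusters.
    core_pts = [(sample[k], s_labels[k]) for k in range(len(sample)) if s_core[k]]
    sampled = set(sample_idx)
    for i, p in enumerate(points):
        if i in sampled or not core_pts:
            continue
        rep, rep_label = min(core_pts, key=lambda c: math.dist(p, c[0]))
        if math.dist(p, rep) <= eps:
            labels[i] = rep_label
    return labels
```

The expensive neighborhood searches run only over the sample, which is where the saving on a large database would come from; the quality of the result depends on the sample being dense enough for clusters to survive.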
Funding: Funded by the National 973 Program of China (No. 2003CB415205), the National Natural Science Foundation of China (No. 40523005, No. 60573183, No. 60373019), and the Open Research Fund Program of LIESMARS (No. WKL(04)0303).
Abstract: Spatial objects have two types of attributes, geometrical and non-geometrical, which belong to two different attribute domains. Spatial objects that are scattered in the geometrical domain may still be similar to each other in the non-geometrical domain. Most existing clustering algorithms group spatial datasets into compact regions in the geometrical domain without considering the non-geometrical domain. However, many application scenarios require clusters that have both high proximity in the geometrical domain and high similarity in the non-geometrical domain, so constraints are imposed on the clustering goal from both domains simultaneously. Such a clustering problem is called dual clustering. As distributed clustering applications become more popular, it is necessary to tackle the dual clustering problem in distributed databases. The DCAD algorithm is proposed to solve this problem. DCAD consists of two levels of clustering: local clustering and global clustering. First, clustering is conducted at each local site with a local clustering algorithm, and the features of local clusters are extracted. Second, local features from each site are sent to a central site, where global clustering is obtained based on those features. Experiments on both artificial and real spatial datasets show that DCAD is effective and efficient.
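The two-level structure described above can be sketched as follows. The abstract does not say which features DCAD extracts or how the central site combines them, so everything concrete here is an assumption made for illustration: each local cluster is summarized by its centroid, mean non-geometrical attribute, and size (`summarize`), and the central site merges summaries that are close in both domains using union-find (`global_merge`). The thresholds `geo_eps` and `attr_eps` are hypothetical parameters.

```python
import math


def summarize(cluster):
    """Local site: reduce one local cluster of (x, y, attr) points to a
    (centroid, mean attribute, size) summary."""
    n = len(cluster)
    cx = sum(p[0] for p in cluster) / n
    cy = sum(p[1] for p in cluster) / n
    attr = sum(p[2] for p in cluster) / n
    return (cx, cy), attr, n


def global_merge(summaries, geo_eps, attr_eps):
    """Central site: group local clusters that are close in BOTH domains.
    Returns one group id per summary."""
    parent = list(range(len(summaries)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(summaries)):
        for j in range(i + 1, len(summaries)):
            (ci, ai, _), (cj, aj, _) = summaries[i], summaries[j]
            near = math.dist(ci, cj) <= geo_eps      # geometrical proximity
            similar = abs(ai - aj) <= attr_eps       # non-geometrical similarity
            if near and similar:
                parent[find(i)] = find(j)
    return [find(i) for i in range(len(summaries))]
```

Only the summaries travel to the central site, not the raw points, which is the communication saving a local/global split of this kind is meant to provide.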
Funding: Supported by the National Natural Science Foundation under "Outstanding Young Researchers" (49525101).
Abstract: This is a period of information explosion. In spatial information science especially, information can be acquired in many ways, such as man-made satellites, aircraft, laser, digital photogrammetry, and so on. Spatial data sources are usually distributed and heterogeneous. A federated database is the best solution for sharing and interoperating spatial databases. In this paper, the concepts of federated database and interoperability are introduced. Three heterogeneous kinds of spatial data (vector, image, and DEM) are used to create an integrated database. A data model of federated spatial databases is given.
Funding: Supported by the National Natural Science Foundation of China (60073045).
Abstract: This paper introduces the constrained K closest pairs query, which retrieves the K closest pairs satisfying a given spatial constraint from two datasets. For datasets indexed by R-trees in spatial databases, three algorithms are presented for answering this kind of query. Among them, the two-phase Range+Join and Join+Range algorithms adopt a strategy that changes the execution order of the range query and the closest pairs query, while the constrained heap-based algorithm uses extended distance functions to prune the search space and minimize the pruning distance. Experimental results show that the constrained heap-based algorithm has better applicability and performance than the two-phase algorithms.
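A brute-force illustration of the Range+Join order may help fix the semantics of the query. This sketch makes several assumptions that are not in the abstract: the spatial constraint is an axis-aligned rectangle, both points of a pair must lie inside it, and no R-tree is used (the filter and join are simple scans). The function names are illustrative only.

```python
import heapq
import math


def in_range(p, rect):
    """True if point p lies inside rect = (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = rect
    return xmin <= p[0] <= xmax and ymin <= p[1] <= ymax


def range_then_join(ps, qs, rect, k):
    """Range phase first, then a K closest pairs join on the survivors.
    Returns up to k tuples (distance, p, q), nearest first."""
    ps_in = [p for p in ps if in_range(p, rect)]   # range phase on dataset P
    qs_in = [q for q in qs if in_range(q, rect)]   # range phase on dataset Q
    pairs = ((math.dist(p, q), p, q) for p in ps_in for q in qs_in)
    return heapq.nsmallest(k, pairs)               # join phase
```

A Join+Range variant would instead enumerate pairs in increasing distance and discard those violating the constraint until K remain; which order is cheaper depends on how selective the constraint is.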
Abstract: GeoStar is the registered trademark of GIS software made by WTUSM in China. With GeoStar, multi-scale images, DEMs, graphics, and attributes can be integrated into very large seamless databases, and multi-dimensional dynamic visualization and information extraction are also available. This paper describes the fundamental characteristics of such huge integrated databases, for instance the data models, database structures, and spatial index strategies. Finally, typical applications of GeoStar in a few pilot projects, such as the Shanghai CyberCity and the Guangdong provincial spatial data infrastructure (SDI), are illustrated, and several concluding remarks are given.
Funding: This work is supported by the National High Technology Research and Development Program of China (2002AA135230) and the Major Project of the National Natural Science Foundation of Beijing (4011002).
Abstract: A Digital Orthographic Map (DOM) can be used in various applications because it contains both image features and terrain information. Spatial database management systems aim at the effective and efficient management of data related to a space, engineering design, and so on; a spatial database therefore provides an efficient solution for managing DOM. Because DOM data are voluminous, wavelet-based data compression is introduced into storage. Another strategy for handling the data volume is to decompose the raw image into tiles and store each tile as a separate tuple. The metadata of DOM can be used to organize and manage spatial information, especially for spatial data sharing and fast locating. A tool for browsing, zooming, and querying the DOM data is also designed. We implemented these ideas in SISP (Spatial Information Sharing System) and applied the subsystem to the DOM management of Beijing City, which is a component of the Beijing Spatial Information Infrastructure.
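The tiling strategy can be pictured with a short sketch. It assumes the raster is a list of pixel rows and that each tile becomes one record keyed by its tile row and column; the actual tuple layout used in SISP is not given in the abstract, and `tile_image` and `tile_size` are illustrative names.

```python
def tile_image(image, tile_size):
    """Split a 2-D raster (list of rows) into (tile_row, tile_col, pixels)
    tuples, one per tile. Edge tiles may be smaller than tile_size."""
    tiles = []
    height = len(image)
    width = len(image[0]) if height else 0
    for r in range(0, height, tile_size):
        for c in range(0, width, tile_size):
            pixels = [row[c:c + tile_size] for row in image[r:r + tile_size]]
            tiles.append((r // tile_size, c // tile_size, pixels))
    return tiles
```

With tiles stored as separate tuples, a browsing or zooming tool only needs to fetch the tiles whose keys overlap the current view rather than the whole image.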
Funding: This work is supported by the National High Technology Research and Development Program of China (2002AA135230) and the Major Project of the National Natural Science Foundation of Beijing (4011002).
Abstract: Attention has recently focused on spatial query languages, which are used to query spatial databases. This paper presents a design of a spatial query language that extends the standard relational database query language SQL. The design recognizes the significantly different requirements of spatial data handling and overcomes the inherent problems of applying conventional database query languages. It is based on an extended spatial data model, including spatial data types and the spatial operators on them. The processing and optimization of spatial queries are also discussed. Finally, an implementation of the design in a spatial query subsystem is given.