Funding: This work is supported by the National Natural Science Foundation of China under Grant No. 61370091 and No. 61170200, the Jiangsu Province Science and Technology Support Program (Industry) Project under Grant No. BE2012179, and the Program Sponsored for Scientific Innovation Research of College Graduate in Jiangsu Province under Grant No. CXZZ12_0229.
Abstract: In this paper, we propose HQ-Tree, a novel Hadoop-based spatial data index. HQ-Tree uses a PR quadtree to address the poor parallel-processing efficiency caused by data insertion order and spatial overlap. HDFS does not support random writes, so we propose an update mechanism called "Copy Write" to support index updates. In addition, HQ-Tree employs a two-level index caching mechanism to reduce network transfer and I/O costs. Finally, we develop MapReduce-based algorithms that significantly improve the efficiency of index creation and querying. Experimental results demonstrate the effectiveness of our methods.
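The abstract does not describe HQ-Tree's node layout, its HDFS storage format, or the Copy Write mechanism. What follows is therefore only a minimal in-memory Python sketch of a generic PR (point-region) quadtree, not the authors' implementation. The class `PRQuadtree` and its `capacity` (points per leaf before splitting) and `max_depth` parameters are hypothetical. The sketch illustrates the two properties the abstract relies on. First, regions are split at fixed midpoints, so sibling quadrants never overlap. Second, for a given point set the final tree shape does not depend on insertion order.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class Node:
    # Half-open region [x0, x1) x [y0, y1) covered by this node.
    x0: float
    y0: float
    x1: float
    y1: float
    points: List[Point] = field(default_factory=list)  # used only while a leaf
    children: Optional[List["Node"]] = None  # four quadrants after a split

    def contains(self, p: Point) -> bool:
        return self.x0 <= p[0] < self.x1 and self.y0 <= p[1] < self.y1


class PRQuadtree:
    """Hypothetical PR quadtree: regions split at fixed midpoints, never at data points."""

    def __init__(self, x0: float, y0: float, x1: float, y1: float,
                 capacity: int = 4, max_depth: int = 16) -> None:
        self.root = Node(x0, y0, x1, y1)
        self.capacity = capacity    # max points in a leaf before it splits
        self.max_depth = max_depth  # guards against endless splits on duplicate points

    def insert(self, p: Point) -> None:
        if not self.root.contains(p):
            raise ValueError(f"point {p} lies outside the indexed region")
        self._insert(self.root, p, 0)

    @staticmethod
    def _child_for(node: Node, p: Point) -> Node:
        # Quadrant index: bit 0 set for the upper-x half, bit 1 for the upper-y half.
        mx = (node.x0 + node.x1) / 2
        my = (node.y0 + node.y1) / 2
        return node.children[int(p[0] >= mx) + 2 * int(p[1] >= my)]

    def _insert(self, node: Node, p: Point, depth: int) -> None:
        if node.children is not None:  # internal node: descend to the quadrant
            self._insert(self._child_for(node, p), p, depth + 1)
            return
        node.points.append(p)
        if len(node.points) > self.capacity and depth < self.max_depth:
            self._split(node, depth)

    def _split(self, node: Node, depth: int) -> None:
        mx = (node.x0 + node.x1) / 2
        my = (node.y0 + node.y1) / 2
        # Order must match the index computed in _child_for.
        node.children = [
            Node(node.x0, node.y0, mx, my), Node(mx, node.y0, node.x1, my),
            Node(node.x0, my, mx, node.y1), Node(mx, my, node.x1, node.y1),
        ]
        pts, node.points = node.points, []
        for q in pts:  # push the leaf's points down; a child may split again
            self._insert(self._child_for(node, q), q, depth + 1)

    def range_query(self, qx0: float, qy0: float, qx1: float, qy1: float) -> List[Point]:
        """Return all points with qx0 <= x <= qx1 and qy0 <= y <= qy1."""
        out: List[Point] = []
        stack = [self.root]
        while stack:
            n = stack.pop()
            # Prune nodes whose region cannot intersect the query box.
            if n.x1 <= qx0 or n.x0 > qx1 or n.y1 <= qy0 or n.y0 > qy1:
                continue
            if n.children is None:
                out.extend(p for p in n.points
                           if qx0 <= p[0] <= qx1 and qy0 <= p[1] <= qy1)
            else:
                stack.extend(n.children)
        return out

    def leaf_regions(self) -> List[Tuple[float, float, float, float]]:
        """Sorted leaf rectangles, useful for comparing tree shapes."""
        out, stack = [], [self.root]
        while stack:
            n = stack.pop()
            if n.children is None:
                out.append((n.x0, n.y0, n.x1, n.y1))
            else:
                stack.extend(n.children)
        return sorted(out)
```

Building one tree from a list of points and another from the same list reversed should yield identical `leaf_regions()`. Because the leaves are disjoint, they could in principle be handed to separate MapReduce tasks. The abstract does not state how HQ-Tree actually partitions work.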
Funding: Funded by the National 973 Program of China (No. 2003CB415205), the National Natural Science Foundation of China (No. 40523005, No. 60573183, No. 60373019), and the Open Research Fund Program of LIESMARS (No. WKL(04)0303).
Abstract: Spatial objects have two types of attributes, geometrical and non-geometrical, which belong to two different attribute domains (the geometrical and non-geometrical domains). Spatial objects that are scattered in the geometrical domain may nevertheless be similar to each other in the non-geometrical domain. Most existing clustering algorithms group spatial datasets into compact regions in the geometrical domain without considering the non-geometrical domain. However, many application scenarios require clustering results in which each cluster has not only high proximity in the geometrical domain but also high similarity in the non-geometrical domain. In other words, constraints are imposed on the clustering goal from both domains simultaneously. Such a clustering problem is called dual clustering. As distributed clustering applications become increasingly popular, it is necessary to tackle the dual clustering problem in distributed databases. We propose the DCAD algorithm to solve this problem. DCAD consists of two levels of clustering: local clustering and global clustering. First, clustering is conducted at each local site with a local clustering algorithm, and the features of the local clusters are extracted. Second, the local features from each site are sent to a central site, where the global clustering is obtained based on those features. Experiments on both artificial and real spatial datasets show that DCAD is effective and efficient.
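The abstract does not specify DCAD's local clustering algorithm or the form of the extracted features. The Python sketch below therefore substitutes a deliberately simple rule to illustrate the two-level flow, and all of its assumptions are hypothetical:

- Each object is `(x, y, attr)` with a single numeric non-geometrical attribute.
- Two objects are linked only if their Euclidean distance is at most `eps_geo` and their attribute difference is at most `eps_attr`.
- Clusters are connected components of that link graph.
- Each local cluster is summarized as `(centroid_x, centroid_y, mean_attr, count)`.

Only these summaries are sent to the central site.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical object layout: (x, y, attr) = position + one non-geometrical attribute.
Obj = Tuple[float, float, float]
# Hypothetical local-cluster feature: (centroid_x, centroid_y, mean_attr, count).
# Fields 0-2 mirror Obj so the same dual-constraint test applies to both.
Feature = Tuple[float, float, float, int]


def dual_link(eps_geo: float, eps_attr: float) -> Callable[[Sequence[float], Sequence[float]], bool]:
    """Linked only if close in BOTH the geometrical and the non-geometrical domain."""
    def linked(a: Sequence[float], b: Sequence[float]) -> bool:
        return (math.hypot(a[0] - b[0], a[1] - b[1]) <= eps_geo
                and abs(a[2] - b[2]) <= eps_attr)
    return linked


def components(items: Sequence, linked) -> List[List[int]]:
    """Connected components (via union-find) of the graph where i--j iff linked."""
    parent = list(range(len(items)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if linked(items[i], items[j]):
                parent[find(i)] = find(j)
    groups: Dict[int, List[int]] = {}
    for i in range(len(items)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def local_clustering(site: List[Obj], eps_geo: float, eps_attr: float) -> List[Feature]:
    """Level 1, run at each site: cluster the raw objects, return only summaries."""
    feats: List[Feature] = []
    for comp in components(site, dual_link(eps_geo, eps_attr)):
        n = len(comp)
        feats.append((sum(site[i][0] for i in comp) / n,
                      sum(site[i][1] for i in comp) / n,
                      sum(site[i][2] for i in comp) / n,
                      n))
    return feats


def global_clustering(feats: List[Feature], eps_geo: float, eps_attr: float) -> List[Feature]:
    """Level 2, run at the central site: merge local features from all sites."""
    merged: List[Feature] = []
    for comp in components(feats, dual_link(eps_geo, eps_attr)):
        n = sum(feats[i][3] for i in comp)
        # Count-weighted means give the exact centroid/mean of the merged objects.
        merged.append((sum(feats[i][0] * feats[i][3] for i in comp) / n,
                       sum(feats[i][1] * feats[i][3] for i in comp) / n,
                       sum(feats[i][2] * feats[i][3] for i in comp) / n,
                       n))
    return merged
```

Take `eps_geo=1.0` and `eps_attr=0.5`, with a site holding `(0, 0, 1.0)`, `(0.5, 0, 1.1)`, and `(0.2, 0.2, 9.0)`. That site yields two local clusters even though all three objects are geometrically close, because the third differs in the non-geometrical domain. Merging by centroid is a simplification: the global result can differ from clustering all raw objects centrally. That is the usual trade-off for sending only features over the network.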