Earth observations and model simulations are generating big multidimensional array-based raster data. However, it is difficult to query these big raster data efficiently because of the inconsistency among the geospatial raster data model, the distributed physical data storage model, and the data pipeline in distributed computing frameworks. To process big geospatial data efficiently, this paper proposes a three-layer hierarchical indexing strategy that optimizes Apache Spark with the Hadoop Distributed File System (HDFS) in the following respects: (1) improving I/O efficiency by adopting a chunking data structure; (2) keeping the workload balanced and data locality high by building a global index (k-d tree); (3) enabling Spark and HDFS to natively support geospatial raster data formats (e.g., HDF4, NetCDF4, GeoTIFF) by building a local index (hash table); (4) indexing the in-memory data to further improve geospatial data queries; and (5) developing a data repartition strategy to tune query parallelism while keeping data locality high. These strategies are implemented as customized RDDs and evaluated by comparing their performance with that of Spark SQL and SciSpark. The proposed indexing strategy can be applied to other distributed frameworks or cloud-based computing systems to natively support big geospatial data queries with high efficiency.
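The split between a global k-d tree and a local hash table described above can be illustrated with a minimal single-machine sketch. It is not the paper's implementation: the chunk grid, the `Node` class, the function names, and the file paths and byte offsets in `local_index` are all hypothetical, and no Spark or HDFS API is involved. The k-d tree answers "which chunks fall inside this bounding box", and the hash table maps each chunk id to where its bytes would live.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class Node:
    point: Point                     # chunk centre, e.g. (lon, lat)
    chunk_id: int
    axis: int                        # splitting axis: 0 or 1
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(items: List[Tuple[Point, int]], depth: int = 0) -> Optional[Node]:
    """Build a 2-D k-d tree by splitting on the median of alternating axes."""
    if not items:
        return None
    axis = depth % 2
    items = sorted(items, key=lambda it: it[0][axis])
    mid = len(items) // 2
    point, chunk_id = items[mid]
    return Node(point, chunk_id, axis,
                build(items[:mid], depth + 1),
                build(items[mid + 1:], depth + 1))


def range_query(node: Optional[Node], lo: Point, hi: Point, out: List[int]) -> None:
    """Collect ids of chunks whose centre lies in the box [lo, hi]."""
    if node is None:
        return
    if all(lo[d] <= node.point[d] <= hi[d] for d in range(2)):
        out.append(node.chunk_id)
    # Left subtree holds coordinates <= the split value, right holds >=.
    if lo[node.axis] <= node.point[node.axis]:
        range_query(node.left, lo, hi, out)
    if node.point[node.axis] <= hi[node.axis]:
        range_query(node.right, lo, hi, out)


# Hypothetical 4 x 4 grid of chunks with unit spacing.
chunks = [((i + 0.5, j + 0.5), i * 4 + j) for i in range(4) for j in range(4)]
global_index = build(chunks)

# Hypothetical local index: chunk id -> (file path, byte offset, byte length).
local_index: Dict[int, Tuple[str, int, int]] = {
    cid: ("/data/example.nc", cid * 1024, 1024) for _, cid in chunks
}

hits: List[int] = []
range_query(global_index, (1.0, 1.0), (3.0, 3.0), hits)
locations = [local_index[cid] for cid in sorted(hits)]
```

In a distributed setting the leaves of the global index would also carry the node that stores each chunk, so that tasks can be scheduled next to their data; that part is omitted here.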
To compare the aviation networks of mid-south, northwest, and southwest China, reveal their structural similarities and differences, and provide quantitative evidence for constructing regional aviation networks and improving their structure, a hierarchical index model of the regional aviation network was established by dividing the network into layers and studying its structural characteristics. Data matrices were defined to record the basic state of the regional aviation network. Index matrices were constructed to describe its quantitative features. On the basis of these indexes, several structure indexes were calculated for all layers of the aviation network to show its structural features, such as the ratio of passenger volume within the region to that across the region, the share rate of passenger volume among layers, the ratio of the average number of airlines per airport, the ratio of the average passenger volume per airline, and the ratio of airline rate. According to the statistical data, the regional aviation networks were found to be similar in the share rate of passenger volume among layers and in the average passenger volume per airline. However, they differed in the ratio of passenger volume within the region to that across the region, the ratio of the average number of airlines per airport, and the ratio of airline rate.
Although the distance between binary codes can be computed quickly in Hamming space, linear search is not practical for large-scale datasets. Attention has therefore been paid to the efficiency of approximate nearest neighbor search, in which hierarchical clustering trees (HCT) are widely used. However, HCT select cluster centers randomly and build indexes with the entire binary code, which degrades search performance. In this paper, we first propose a new clustering algorithm that chooses cluster centers on the basis of relative distances and uses a more homogeneous partition of the dataset than HCT does to build the hierarchical clustering trees. Then, we present an algorithm that compresses binary codes by extracting distinctive bits according to the standard deviation of each bit. Consequently, a new index is proposed that uses compressed binary codes based on hierarchical decomposition of binary spaces. Experiments conducted on reference datasets and on a dataset of one billion binary codes demonstrate the effectiveness and efficiency of our method.
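The two building blocks mentioned above, Hamming distance and bit selection by per-bit standard deviation, can be sketched as follows. This is an illustration under stated assumptions, not the authors' algorithm: the abstract does not say how "distinctive" bits are ranked, so the sketch assumes that the bits with the largest standard deviation are kept, and all function names are hypothetical. Codes are stored as Python integers.

```python
from math import sqrt
from typing import List


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two binary codes differ."""
    return bin(a ^ b).count("1")


def bit_std(codes: List[int], n_bits: int) -> List[float]:
    """Standard deviation of each bit position over the dataset.

    For a 0/1 variable with mean p the standard deviation is sqrt(p * (1 - p)).
    """
    n = len(codes)
    stds = []
    for i in range(n_bits):
        p = sum((c >> i) & 1 for c in codes) / n
        stds.append(sqrt(p * (1 - p)))
    return stds


def select_bits(codes: List[int], n_bits: int, k: int) -> List[int]:
    """Positions of the k bits with the largest standard deviation (assumed criterion)."""
    stds = bit_std(codes, n_bits)
    order = sorted(range(n_bits), key=lambda i: stds[i], reverse=True)
    return sorted(order[:k])


def compress(code: int, positions: List[int]) -> int:
    """Pack the selected bit positions of a code into a shorter code."""
    out = 0
    for j, i in enumerate(positions):
        out |= ((code >> i) & 1) << j
    return out


# Toy dataset of 4-bit codes: bits 2 and 3 are constant, bits 0 and 1 vary.
codes = [0b1010, 0b1000, 0b1011, 0b1001]
positions = select_bits(codes, n_bits=4, k=2)
short_codes = [compress(c, positions) for c in codes]
```

In this toy dataset the constant bits carry no information for search, so dropping them leaves every pairwise Hamming distance unchanged; on real data the compression is lossy and trades some accuracy for a smaller index.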
Courses based on computer system ability training have their own particular training objectives. To obtain objective feedback on teaching, a multi-subject, multi-dimensional hierarchical teaching evaluation system is proposed based on an analysis of the objectives of computer system ability training. The teaching content, implementation, learning experience, and teaching effect were then evaluated by supervisors, fellow teachers, and students. Finally, the multi-dimensional evaluation results show that the proposed evaluation system is effective and provides a reference for teaching evaluation.
Funding: This research is funded by NASA (National Aeronautics and Space Administration) NCCS and AIST (NNX15AM85G) and by the NSF I/UCRC, CSSI, and EarthCube Programs (1338925 and 1835507).