The Spectral Statistical Interpolation (SSI) analysis system of NCEP is used to assimilate meteorological data from the Global Positioning Satellite System (GPS/MET) refraction angles with the variational technique. V...The Spectral Statistical Interpolation (SSI) analysis system of NCEP is used to assimilate meteorological data from the Global Positioning Satellite System (GPS/MET) refraction angles with the variational technique. Verified by radiosonde, including GPS/MET observations into the analysis makes an overall improvement to the analysis variables of temperature, winds, and water vapor. However, the variational model with the ray-tracing method is quite expensive for numerical weather prediction and climate research. For example, about 4 000 GPS/MET refraction angles need to be assimilated to produce an ideal global analysis. Just one iteration of minimization will take more than 24 hours CPU time on the NCEP's Cray C90 computer. Although efforts have been taken to reduce the computational cost, it is still prohibitive for operational data assimilation. In this paper, a parallel version of the three-dimensional variational data assimilation model of GPS/MET occultation measurement suitable for massive parallel processors architectures is developed. The divide-and-conquer strategy is used to achieve parallelism and is implemented by message passing. The authors present the principles for the code's design and examine the performance on the state-of-the-art parallel computers in China. The results show that this parallel model scales favorably as the number of processors is increased. With the Memory-IO technique implemented by the author, the wall clock time per iteration used for assimilating 1420 refraction angles is reduced from 45 s to 12 s using 1420 processors. This suggests that the new parallelized code has the potential to be useful in numerical weather prediction (NWP) and climate studies.展开更多
A novel Hilbert-curve is introduced for parallel spatial data partitioning, with consideration of the huge-amount property of spatial information and the variable-length characteristic of vector data items. Based on t...A novel Hilbert-curve is introduced for parallel spatial data partitioning, with consideration of the huge-amount property of spatial information and the variable-length characteristic of vector data items. Based on the improved Hilbert curve, the algorithm can be designed to achieve almost-uniform spatial data partitioning among multiple disks in parallel spatial databases. Thus, the phenomenon of data imbalance can be significantly avoided and search and query efficiency can be enhanced.展开更多
The density inversion of gravity gradiometry data has attracted considerable attention;however,in large datasets,the multiplicity and low depth resolution as well as efficiency are constrained by time and computer mem...The density inversion of gravity gradiometry data has attracted considerable attention;however,in large datasets,the multiplicity and low depth resolution as well as efficiency are constrained by time and computer memory requirements.To solve these problems,we improve the reweighting focusing inversion and probability tomography inversion with joint multiple tensors and prior information constraints,and assess the inversion results,computing efficiency,and dataset size.A Message Passing Interface(MPI)-Open Multi-Processing(OpenMP)-Computed Unified Device Architecture(CUDA)multilevel hybrid parallel inversion,named Hybrinv for short,is proposed.Using model and real data from the Vinton Dome,we confirm that Hybrinv can be used to compute the density distribution.For data size of 100×100×20,the hybrid parallel algorithm is fast and based on the run time and scalability we infer that it can be used to process the large-scale data.展开更多
The gravity gradient is a secondary derivative of gravity potential,containing more high-frequency information of Earth’s gravity field.Gravity gradient observation data require deducting its prior and intrinsic part...The gravity gradient is a secondary derivative of gravity potential,containing more high-frequency information of Earth’s gravity field.Gravity gradient observation data require deducting its prior and intrinsic parts to obtain more variational information.A model generated from a topographic surface database is more appropriate to represent gradiometric effects derived from near-surface mass,as other kinds of data can hardly reach the spatial resolution requirement.The rectangle prism method,namely an analytic integration of Newtonian potential integrals,is a reliable and commonly used approach to modeling gravity gradient,whereas its computing efficiency is extremely low.A modified rectangle prism method and a graphical processing unit(GPU)parallel algorithm were proposed to speed up the modeling process.The modified method avoided massive redundant computations by deforming formulas according to the symmetries of prisms’integral regions,and the proposed algorithm parallelized this method’s computing process.The parallel algorithm was compared with a conventional serial algorithm using 100 elevation data in two topographic areas(rough and moderate terrain).Modeling differences between the two algorithms were less than 0.1 E,which is attributed to precision differences between single-precision and double-precision float numbers.The parallel algorithm showed computational efficiency approximately 200 times higher than the serial algorithm in experiments,demonstrating its effective speeding up in the modeling process.Further analysis indicates that both the modified method and computational parallelism through GPU contributed to the proposed algorithm’s performances in experiments.展开更多
This paper focuses on the parallel aggregation processing of data streams based on the shared-nothing architecture. A novel granularity-aware parallel aggregating model is proposed. It employs parallel sampling and li...This paper focuses on the parallel aggregation processing of data streams based on the shared-nothing architecture. A novel granularity-aware parallel aggregating model is proposed. It employs parallel sampling and linear regression to describe the characteristics of the data quantity in the query window in order to determine the partition granularity of tuples, and utilizes equal depth histogram to implement partitio ning. This method can avoid data skew and reduce communi cation cost. The experiment results on both synthetic data and actual data prove that the proposed method is efficient, practical and suitable for time-varying data streams processing.展开更多
In order to address the problems of the single encryption algorithm,such as low encryption efficiency and unreliable metadata for static data storage of big data platforms in the cloud computing environment,we propose...In order to address the problems of the single encryption algorithm,such as low encryption efficiency and unreliable metadata for static data storage of big data platforms in the cloud computing environment,we propose a Hadoop based big data secure storage scheme.Firstly,in order to disperse the NameNode service from a single server to multiple servers,we combine HDFS federation and HDFS high-availability mechanisms,and use the Zookeeper distributed coordination mechanism to coordinate each node to achieve dual-channel storage.Then,we improve the ECC encryption algorithm for the encryption of ordinary data,and adopt a homomorphic encryption algorithm to encrypt data that needs to be calculated.To accelerate the encryption,we adopt the dualthread encryption mode.Finally,the HDFS control module is designed to combine the encryption algorithm with the storage model.Experimental results show that the proposed solution solves the problem of a single point of failure of metadata,performs well in terms of metadata reliability,and can realize the fault tolerance of the server.The improved encryption algorithm integrates the dual-channel storage mode,and the encryption storage efficiency improves by 27.6% on average.展开更多
In this paper, the high-level knowledge of financial data modeled by ordinary differential equations (ODEs) is discovered in dynamic data by using an asynchronous parallel evolutionary modeling algorithm (APHEMA). A n...In this paper, the high-level knowledge of financial data modeled by ordinary differential equations (ODEs) is discovered in dynamic data by using an asynchronous parallel evolutionary modeling algorithm (APHEMA). A numerical example of Nasdaq index analysis is used to demonstrate the potential of APHEMA. The results show that the dynamic models automatically discovered in dynamic data by computer can be used to predict the financial trends.展开更多
We developed a parallel object relational DBMS named PORLES. It uses BSP model as its parallel computing model, and monoid calculus as its basis of data model. In this paper, we introduce its data model, parallel que...We developed a parallel object relational DBMS named PORLES. It uses BSP model as its parallel computing model, and monoid calculus as its basis of data model. In this paper, we introduce its data model, parallel query optimization, transaction processing system and parallel access method in detail.展开更多
Processing large-scale 3-D gravity data is an important topic in geophysics field. Many existing inversion methods lack the competence of processing massive data and practical application capacity. This study proposes...Processing large-scale 3-D gravity data is an important topic in geophysics field. Many existing inversion methods lack the competence of processing massive data and practical application capacity. This study proposes the application of GPU parallel processing technology to the focusing inversion method, aiming at improving the inversion accuracy while speeding up calculation and reducing the memory consumption, thus obtaining the fast and reliable inversion results for large complex model. In this paper, equivalent storage of geometric trellis is used to calculate the sensitivity matrix, and the inversion is based on GPU parallel computing technology. The parallel computing program that is optimized by reducing data transfer, access restrictions and instruction restrictions as well as latency hiding greatly reduces the memory usage, speeds up the calculation, and makes the fast inversion of large models possible. By comparing and analyzing the computing speed of traditional single thread CPU method and CUDA-based GPU parallel technology, the excellent acceleration performance of GPU parallel computing is verified, which provides ideas for practical application of some theoretical inversion methods restricted by computing speed and computer memory. The model test verifies that the focusing inversion method can overcome the problem of severe skin effect and ambiguity of geological body boundary. Moreover, the increase of the model cells and inversion data can more clearly depict the boundary position of the abnormal body and delineate its specific shape.展开更多
. This paper conducts the analysis on the data mining algorithm implementation and its application in parallel cloud system based on C++. With the increase in the number of the cloud computing platform developers, w.... This paper conducts the analysis on the data mining algorithm implementation and its application in parallel cloud system based on C++. With the increase in the number of the cloud computing platform developers, with the use of cloud computing platform to support the growth of the number of Internet users, the system is also the proportion of log data growth. At present applies in the colony environment many is the news transmission model. In takes in the rest transmission model, between each concurrent execution part exchanges the information, and the coordinated step and the control execution through the transmission news. As for the C++ in the data mining applications, it should ? rstly hold the following features. Parallel communication and serial communication are two basic ways of general communication. Under this basis, this paper proposes the novel perspective on the data mining algorithm implementation and its application in parallel cloud system based on C++. The later research will be focused on the code based implementation.展开更多
Since Multimode data is composed of many modes and their complex relationships,it cannot be retrieved or mined effectively by utilizing traditional analysis and processing techniques for single mode data.To address th...Since Multimode data is composed of many modes and their complex relationships,it cannot be retrieved or mined effectively by utilizing traditional analysis and processing techniques for single mode data.To address the challenges,we design and implement a graph-based storage and parallel loading system aimed at multimode medical image data.The system is a framework designed to flexibly store and rapidly load these multimode data.Specifically,the system utilizes the Mode Network to model the modes and their relationships in multimode medical image data and the graph database to store the data with a parallel loading technique.展开更多
An accurate numerical algorithm for three-line fault involving different phases from each of two-parallel lines is presented. It is based on one-terminal voltage and current data. The loop and nodel equations comparin...An accurate numerical algorithm for three-line fault involving different phases from each of two-parallel lines is presented. It is based on one-terminal voltage and current data. The loop and nodel equations comparing faulted phase to non-faulted phase of two-parallel lines are introduced in the fault location estimation modal, in which the faulted impedance of remote end is not involved. The effect of load flow and fault resistance on the accuracy of fault location are effectively eliminated, therefore an accurate algorithm of locating fault is derived. The algorithm is demonstrated by digital computer simulations and the results show that errors in locating fault are less than 1%.展开更多
A new era of data access and management has begun with the use of cloud computing in the healthcare industry.Despite the efficiency and scalability that the cloud provides, the security of private patient data is stil...A new era of data access and management has begun with the use of cloud computing in the healthcare industry.Despite the efficiency and scalability that the cloud provides, the security of private patient data is still a majorconcern. Encryption, network security, and adherence to data protection laws are key to ensuring the confidentialityand integrity of healthcare data in the cloud. The computational overhead of encryption technologies could leadto delays in data access and processing rates. To address these challenges, we introduced the Enhanced ParallelMulti-Key Encryption Algorithm (EPM-KEA), aiming to bolster healthcare data security and facilitate the securestorage of critical patient records in the cloud. The data was gathered from two categories Authorization forHospital Admission (AIH) and Authorization for High Complexity Operations.We use Z-score normalization forpreprocessing. The primary goal of implementing encryption techniques is to secure and store massive amountsof data on the cloud. It is feasible that cloud storage alternatives for protecting healthcare data will become morewidely available if security issues can be successfully fixed. As a result of our analysis using specific parametersincluding Execution time (42%), Encryption time (45%), Decryption time (40%), Security level (97%), and Energyconsumption (53%), the system demonstrated favorable performance when compared to the traditional method.This suggests that by addressing these security concerns, there is the potential for broader accessibility to cloudstorage solutions for safeguarding healthcare data.展开更多
This paper presents a reasonable gridding-parameters extraction method for setting the optimal interpolation nodes in the gridding of scattered observed data. The method can extract optimized gridding parameters based...This paper presents a reasonable gridding-parameters extraction method for setting the optimal interpolation nodes in the gridding of scattered observed data. The method can extract optimized gridding parameters based on the distribution of features in raw data. Modeling analysis proves that distortion caused by gridding can be greatly reduced when using such parameters. We also present some improved technical measures that use human- machine interaction and multi-thread parallel technology to solve inadequacies in traditional gridding software. On the basis of these methods, we have developed software that can be used to grid scattered data using a graphic interface. Finally, a comparison of different gridding parameters on field magnetic data from Ji Lin Province, North China demonstrates the superiority of the proposed method in eliminating the distortions and enhancing gridding efficiency.展开更多
基金supported by the National Natural Science Eoundation of China under Grant No.40221503the China National Key Programme for Development Basic Sciences (Abbreviation:973 Project,Grant No.G1999032801)
文摘The Spectral Statistical Interpolation (SSI) analysis system of NCEP is used to assimilate meteorological data from the Global Positioning Satellite System (GPS/MET) refraction angles with the variational technique. Verified by radiosonde, including GPS/MET observations into the analysis makes an overall improvement to the analysis variables of temperature, winds, and water vapor. However, the variational model with the ray-tracing method is quite expensive for numerical weather prediction and climate research. For example, about 4 000 GPS/MET refraction angles need to be assimilated to produce an ideal global analysis. Just one iteration of minimization will take more than 24 hours CPU time on the NCEP's Cray C90 computer. Although efforts have been taken to reduce the computational cost, it is still prohibitive for operational data assimilation. In this paper, a parallel version of the three-dimensional variational data assimilation model of GPS/MET occultation measurement suitable for massive parallel processors architectures is developed. The divide-and-conquer strategy is used to achieve parallelism and is implemented by message passing. The authors present the principles for the code's design and examine the performance on the state-of-the-art parallel computers in China. The results show that this parallel model scales favorably as the number of processors is increased. With the Memory-IO technique implemented by the author, the wall clock time per iteration used for assimilating 1420 refraction angles is reduced from 45 s to 12 s using 1420 processors. This suggests that the new parallelized code has the potential to be useful in numerical weather prediction (NWP) and climate studies.
基金Funded by the National 863 Program of China (No. 2005AA113150), and the National Natural Science Foundation of China (No.40701158).
文摘A novel Hilbert-curve is introduced for parallel spatial data partitioning, with consideration of the huge-amount property of spatial information and the variable-length characteristic of vector data items. Based on the improved Hilbert curve, the algorithm can be designed to achieve almost-uniform spatial data partitioning among multiple disks in parallel spatial databases. Thus, the phenomenon of data imbalance can be significantly avoided and search and query efficiency can be enhanced.
基金support by the China Postdoctoral Science Foundation(2017M621151)Northeastern University Postdoctoral Science Foundation(20180313)+1 种基金the Fundamental Research Funds for Central Universities(N180104020)NSFCShandong Joint Fund of the National Natural Science Foundation of China(U1806208)
文摘The density inversion of gravity gradiometry data has attracted considerable attention;however,in large datasets,the multiplicity and low depth resolution as well as efficiency are constrained by time and computer memory requirements.To solve these problems,we improve the reweighting focusing inversion and probability tomography inversion with joint multiple tensors and prior information constraints,and assess the inversion results,computing efficiency,and dataset size.A Message Passing Interface(MPI)-Open Multi-Processing(OpenMP)-Computed Unified Device Architecture(CUDA)multilevel hybrid parallel inversion,named Hybrinv for short,is proposed.Using model and real data from the Vinton Dome,we confirm that Hybrinv can be used to compute the density distribution.For data size of 100×100×20,the hybrid parallel algorithm is fast and based on the run time and scalability we infer that it can be used to process the large-scale data.
文摘The gravity gradient is a secondary derivative of gravity potential,containing more high-frequency information of Earth’s gravity field.Gravity gradient observation data require deducting its prior and intrinsic parts to obtain more variational information.A model generated from a topographic surface database is more appropriate to represent gradiometric effects derived from near-surface mass,as other kinds of data can hardly reach the spatial resolution requirement.The rectangle prism method,namely an analytic integration of Newtonian potential integrals,is a reliable and commonly used approach to modeling gravity gradient,whereas its computing efficiency is extremely low.A modified rectangle prism method and a graphical processing unit(GPU)parallel algorithm were proposed to speed up the modeling process.The modified method avoided massive redundant computations by deforming formulas according to the symmetries of prisms’integral regions,and the proposed algorithm parallelized this method’s computing process.The parallel algorithm was compared with a conventional serial algorithm using 100 elevation data in two topographic areas(rough and moderate terrain).Modeling differences between the two algorithms were less than 0.1 E,which is attributed to precision differences between single-precision and double-precision float numbers.The parallel algorithm showed computational efficiency approximately 200 times higher than the serial algorithm in experiments,demonstrating its effective speeding up in the modeling process.Further analysis indicates that both the modified method and computational parallelism through GPU contributed to the proposed algorithm’s performances in experiments.
基金Supported by Foundation of High Technology Pro-ject of Jiangsu (BG2004034) , Foundation of Graduate Creative Pro-gramof Jiangsu (xm04-36)
文摘This paper focuses on the parallel aggregation processing of data streams based on the shared-nothing architecture. A novel granularity-aware parallel aggregating model is proposed. It employs parallel sampling and linear regression to describe the characteristics of the data quantity in the query window in order to determine the partition granularity of tuples, and utilizes equal depth histogram to implement partitio ning. This method can avoid data skew and reduce communi cation cost. The experiment results on both synthetic data and actual data prove that the proposed method is efficient, practical and suitable for time-varying data streams processing.
文摘In order to address the problems of the single encryption algorithm,such as low encryption efficiency and unreliable metadata for static data storage of big data platforms in the cloud computing environment,we propose a Hadoop based big data secure storage scheme.Firstly,in order to disperse the NameNode service from a single server to multiple servers,we combine HDFS federation and HDFS high-availability mechanisms,and use the Zookeeper distributed coordination mechanism to coordinate each node to achieve dual-channel storage.Then,we improve the ECC encryption algorithm for the encryption of ordinary data,and adopt a homomorphic encryption algorithm to encrypt data that needs to be calculated.To accelerate the encryption,we adopt the dualthread encryption mode.Finally,the HDFS control module is designed to combine the encryption algorithm with the storage model.Experimental results show that the proposed solution solves the problem of a single point of failure of metadata,performs well in terms of metadata reliability,and can realize the fault tolerance of the server.The improved encryption algorithm integrates the dual-channel storage mode,and the encryption storage efficiency improves by 27.6% on average.
文摘In this paper, the high-level knowledge of financial data modeled by ordinary differential equations (ODEs) is discovered in dynamic data by using an asynchronous parallel evolutionary modeling algorithm (APHEMA). A numerical example of Nasdaq index analysis is used to demonstrate the potential of APHEMA. The results show that the dynamic models automatically discovered in dynamic data by computer can be used to predict the financial trends.
文摘We developed a parallel object relational DBMS named PORLES. It uses BSP model as its parallel computing model, and monoid calculus as its basis of data model. In this paper, we introduce its data model, parallel query optimization, transaction processing system and parallel access method in detail.
基金Supported by Project of National Natural Science Foundation(No.41874134)
文摘Processing large-scale 3-D gravity data is an important topic in geophysics field. Many existing inversion methods lack the competence of processing massive data and practical application capacity. This study proposes the application of GPU parallel processing technology to the focusing inversion method, aiming at improving the inversion accuracy while speeding up calculation and reducing the memory consumption, thus obtaining the fast and reliable inversion results for large complex model. In this paper, equivalent storage of geometric trellis is used to calculate the sensitivity matrix, and the inversion is based on GPU parallel computing technology. The parallel computing program that is optimized by reducing data transfer, access restrictions and instruction restrictions as well as latency hiding greatly reduces the memory usage, speeds up the calculation, and makes the fast inversion of large models possible. By comparing and analyzing the computing speed of traditional single thread CPU method and CUDA-based GPU parallel technology, the excellent acceleration performance of GPU parallel computing is verified, which provides ideas for practical application of some theoretical inversion methods restricted by computing speed and computer memory. The model test verifies that the focusing inversion method can overcome the problem of severe skin effect and ambiguity of geological body boundary. Moreover, the increase of the model cells and inversion data can more clearly depict the boundary position of the abnormal body and delineate its specific shape.
文摘. This paper conducts the analysis on the data mining algorithm implementation and its application in parallel cloud system based on C++. With the increase in the number of the cloud computing platform developers, with the use of cloud computing platform to support the growth of the number of Internet users, the system is also the proportion of log data growth. At present applies in the colony environment many is the news transmission model. In takes in the rest transmission model, between each concurrent execution part exchanges the information, and the coordinated step and the control execution through the transmission news. As for the C++ in the data mining applications, it should ? rstly hold the following features. Parallel communication and serial communication are two basic ways of general communication. Under this basis, this paper proposes the novel perspective on the data mining algorithm implementation and its application in parallel cloud system based on C++. The later research will be focused on the code based implementation.
文摘Since Multimode data is composed of many modes and their complex relationships,it cannot be retrieved or mined effectively by utilizing traditional analysis and processing techniques for single mode data.To address the challenges,we design and implement a graph-based storage and parallel loading system aimed at multimode medical image data.The system is a framework designed to flexibly store and rapidly load these multimode data.Specifically,the system utilizes the Mode Network to model the modes and their relationships in multimode medical image data and the graph database to store the data with a parallel loading technique.
文摘An accurate numerical algorithm for three-line fault involving different phases from each of two-parallel lines is presented. It is based on one-terminal voltage and current data. The loop and nodel equations comparing faulted phase to non-faulted phase of two-parallel lines are introduced in the fault location estimation modal, in which the faulted impedance of remote end is not involved. The effect of load flow and fault resistance on the accuracy of fault location are effectively eliminated, therefore an accurate algorithm of locating fault is derived. The algorithm is demonstrated by digital computer simulations and the results show that errors in locating fault are less than 1%.
文摘A new era of data access and management has begun with the use of cloud computing in the healthcare industry.Despite the efficiency and scalability that the cloud provides, the security of private patient data is still a majorconcern. Encryption, network security, and adherence to data protection laws are key to ensuring the confidentialityand integrity of healthcare data in the cloud. The computational overhead of encryption technologies could leadto delays in data access and processing rates. To address these challenges, we introduced the Enhanced ParallelMulti-Key Encryption Algorithm (EPM-KEA), aiming to bolster healthcare data security and facilitate the securestorage of critical patient records in the cloud. The data was gathered from two categories Authorization forHospital Admission (AIH) and Authorization for High Complexity Operations.We use Z-score normalization forpreprocessing. The primary goal of implementing encryption techniques is to secure and store massive amountsof data on the cloud. It is feasible that cloud storage alternatives for protecting healthcare data will become morewidely available if security issues can be successfully fixed. As a result of our analysis using specific parametersincluding Execution time (42%), Encryption time (45%), Decryption time (40%), Security level (97%), and Energyconsumption (53%), the system demonstrated favorable performance when compared to the traditional method.This suggests that by addressing these security concerns, there is the potential for broader accessibility to cloudstorage solutions for safeguarding healthcare data.
基金partly supported by the Public Geological Survey Project(No.201011039)the National High Technology Research and Development Project of China(No.2007AA06Z134)the 111 Project under the Ministry of Education and the State Administration of Foreign Experts Affairs,China(No.B07011)
文摘This paper presents a reasonable gridding-parameters extraction method for setting the optimal interpolation nodes in the gridding of scattered observed data. The method can extract optimized gridding parameters based on the distribution of features in raw data. Modeling analysis proves that distortion caused by gridding can be greatly reduced when using such parameters. We also present some improved technical measures that use human- machine interaction and multi-thread parallel technology to solve inadequacies in traditional gridding software. On the basis of these methods, we have developed software that can be used to grid scattered data using a graphic interface. Finally, a comparison of different gridding parameters on field magnetic data from Ji Lin Province, North China demonstrates the superiority of the proposed method in eliminating the distortions and enhancing gridding efficiency.