The estimation of covariance matrices is very important in many fields, such as statistics. In real applications, data are frequently affected by high dimensionality and noise. However, most relevant studies are based on complete data. This paper studies the optimal estimation of high-dimensional covariance matrices based on missing and noisy samples under a matrix norm. First, a model with sub-Gaussian additive noise is presented. The generalized sample covariance is then modified to define a hard thresholding estimator, and the minimax upper bound is derived. After that, the minimax lower bound is derived, and it is concluded that the estimator presented in this article is rate-optimal. Finally, a numerical simulation analysis is performed. The results show that for missing samples with sub-Gaussian noise, if the true covariance matrix is sparse, the hard thresholding estimator outperforms the traditional estimation method.
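As a rough illustration of the idea (not the paper's exact estimator), the sketch below forms a generalized sample covariance from pairwise-complete observations of a sample with missing entries and then hard-thresholds small off-diagonal entries; the threshold of order sqrt(log p / n), the tuning constant, and all variable names are illustrative assumptions.

```python
import numpy as np

def hard_threshold_cov(X, lam):
    """Hard-thresholding estimator of a sparse covariance matrix from a sample
    with missing entries (encoded as NaN).  Generic sketch: form a generalized
    (pairwise-complete) sample covariance, then zero out small off-diagonal entries."""
    obs = ~np.isnan(X)                                # missingness indicators
    mu = np.nanmean(X, axis=0)                        # per-column observed means
    D = np.where(obs, X - mu, 0.0)                    # centred, zeros where missing

    pair_n = obs.T.astype(float) @ obs.astype(float)  # counts of jointly observed pairs
    S = (D.T @ D) / np.maximum(pair_n, 1.0)           # generalized sample covariance

    T = np.where(np.abs(S) >= lam, S, 0.0)            # hard thresholding
    np.fill_diagonal(T, np.diag(S))                   # keep the diagonal untouched
    return T

# Illustrative use: threshold of order sqrt(log p / n), constant chosen ad hoc
rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.standard_normal((n, p)) + 0.1 * rng.standard_normal((n, p))  # noisy sample
X[rng.random((n, p)) < 0.2] = np.nan                  # 20% of entries missing at random
Sigma_hat = hard_threshold_cov(X, lam=2.0 * np.sqrt(np.log(p) / n))
```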
An algorithm, Clustering Algorithm Based On Sparse Feature Vector (CABOSFV), was proposed for the high dimensional clustering of binary sparse data. This algorithm compresses the data effectively by using a tool called the 'Sparse Feature Vector', thus reducing the data scale enormously, and can obtain the clustering result with only one data scan. Both theoretical analysis and empirical tests showed that CABOSFV is of low computational complexity. The algorithm finds clusters in high dimensional large datasets efficiently and handles noise effectively.
The performance of conventional similarity measurement methods is seriously affected by the curse of dimensionality of high-dimensional data. The reason is that the data difference between sparse and noisy dimensions occupies a large proportion of the similarity, so that the dissimilarities between any two objects become almost indistinguishable. A similarity measurement method for high-dimensional data based on a normalized net lattice subspace is proposed. The data range of each dimension is divided into several intervals, and the components in different dimensions are mapped onto the corresponding intervals. Only the components in the same or adjacent intervals are used to calculate the similarity. To validate this method, three data types are used, and seven common similarity measurement methods are compared. The experimental results indicate that the relative difference of the method increases with the dimensionality and is approximately two or three orders of magnitude higher than that of the conventional methods. In addition, the similarity range of this method in different dimensions is [0,1], which makes it suitable for similarity analysis after dimensionality reduction.
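The sketch below illustrates the general idea of such a lattice-based similarity (it is not the paper's exact formula): each dimension is cut into k equal intervals, and only dimensions whose two components land in the same or adjacent intervals contribute. The per-dimension contributions (1 for the same interval, 0.5 for adjacent) and all names are illustrative choices.

```python
import numpy as np

def lattice_similarity(x, y, mins, maxs, k=10):
    """Net-lattice-style similarity of two high-dimensional points, normalized
    to [0, 1].  Each dimension's range [mins[d], maxs[d]] is divided into k
    equal intervals; only components falling in the same or adjacent intervals
    contribute.  Sketch of the general idea only."""
    span = np.maximum(np.asarray(maxs) - np.asarray(mins), 1e-12)
    ix = np.clip(((x - mins) / span * k).astype(int), 0, k - 1)  # interval index of x
    iy = np.clip(((y - mins) / span * k).astype(int), 0, k - 1)  # interval index of y

    gap = np.abs(ix - iy)
    # illustrative choice: full credit for the same interval, half for adjacent
    contrib = np.where(gap == 0, 1.0, np.where(gap == 1, 0.5, 0.0))
    return contrib.sum() / x.size

# Illustrative use on random data with a known range
rng = np.random.default_rng(0)
x, y = rng.random(100), rng.random(100)
print(lattice_similarity(x, y, mins=np.zeros(100), maxs=np.ones(100)))
```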
Dimensionality reduction and data visualization are useful and important processes in pattern recognition. Many techniques have been developed in recent years. The self-organizing map (SOM) can be an efficient method for this purpose. This paper reviews recent advances in this area and related approaches such as multidimensional scaling (MDS), nonlinear PCA, and principal manifolds, as well as the connections of the SOM and its recent variant, the visualization induced SOM (ViSOM), with these approaches. The SOM is shown to produce a quantized, qualitative scaling, while the ViSOM produces a quantitative or metric scaling and approximates a principal curve/surface. The SOM can also be regarded as a generalized MDS that relates two metric spaces by forming a topological mapping between them. The relationships among various recently proposed techniques such as ViSOM, Isomap, LLE, and eigenmap are discussed and compared.
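For concreteness, the sketch below is a minimal online SOM of the standard kind reviewed here; it is not ViSOM or any specific variant from the paper, and the grid size, decay schedules, and synthetic data are illustrative assumptions.

```python
import numpy as np

def train_som(data, grid=(10, 10), epochs=20, lr0=0.5, sigma0=3.0, seed=0):
    """Minimal online self-organizing map.  Each input is matched to its
    best-matching unit (BMU), and the BMU's neighbourhood on the 2-D grid is
    pulled towards the input; learning rate and neighbourhood radius decay."""
    rng = np.random.default_rng(seed)
    h, w = grid
    n, d = data.shape
    weights = rng.standard_normal((h * w, d))
    # grid coordinates of each unit, used by the neighbourhood function
    coords = np.array([(i, j) for i in range(h) for j in range(w)], float)

    n_steps = epochs * n
    t = 0
    for _ in range(epochs):
        for x in data[rng.permutation(n)]:
            lr = lr0 * (1.0 - t / n_steps)                 # decaying learning rate
            sigma = sigma0 * (1.0 - t / n_steps) + 0.5     # decaying radius
            bmu = np.argmin(((weights - x) ** 2).sum(axis=1))
            dist2 = ((coords - coords[bmu]) ** 2).sum(axis=1)
            theta = np.exp(-dist2 / (2.0 * sigma ** 2))    # Gaussian neighbourhood
            weights += lr * theta[:, None] * (x - weights)
            t += 1
    return weights.reshape(h, w, d)

# Each sample can then be visualized at the grid position of its BMU
data = np.random.default_rng(1).standard_normal((500, 5))
W = train_som(data)
```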
Information analysis of high dimensional data was carried out through the application of a similarity measure. High dimensional data were considered as an atypical structure. Additionally, overlapped and non-overlapped data were introduced, and the similarity measure analysis was illustrated and compared with a conventional similarity measure. As a result, the similarity of overlapped data could be presented with the conventional similarity measure, while the analysis of non-overlapped data provided the clue to solving the similarity of high dimensional data. The analysis of high dimensional data was designed with consideration of neighborhood information, and conservative and strict solutions were proposed. The proposed similarity measure was applied to express financial fraud among multi-dimensional datasets. In an illustrative example, financial fraud similarity with respect to age, gender, qualification, and job was presented, and with the proposed similarity measure, high dimensional personal data were evaluated for how similar they are to the financial fraud. Calculation results show that the actual fraud has a rather high similarity measure compared to the average, ranging from a minimum of 0.0609 to a maximum of 0.1667.
Seismic data reconstruction is an essential and fundamental step in the seismic data processing workflow, which is of profound significance for improving migration imaging quality, multiple suppression, and seismic inversion accuracy. Regularization methods play a central role in solving the underdetermined inverse problem of seismic data reconstruction. In this paper, a novel regularization approach, the low dimensional manifold model (LDMM), is proposed for reconstructing the missing seismic data. Our work relies on the fact that seismic patches always occupy a low dimensional manifold. Specifically, we exploit the dimension of the seismic patch manifold as a regularization term in the reconstruction problem, and reconstruct the missing seismic data by enforcing low dimensionality on this manifold. The crucial procedure of the proposed method is to solve for the dimension of the patch manifold. Toward this, we adopt an efficient dimensionality calculation method based on low-rank approximation, which provides a reliable safeguard for enforcing the constraints in the reconstruction process. Numerical experiments performed on synthetic and field seismic data demonstrate that, compared with the curvelet-based sparsity-promoting L1-norm minimization method and the multichannel singular spectrum analysis method, the proposed method obtains state-of-the-art reconstruction results.
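As a small illustration of one ingredient, the sketch below estimates an effective dimension of the patch set of a 2-D section through a low-rank (SVD) approximation. The patch size, stride, the 95% energy criterion, and all names are illustrative assumptions; the paper's LDMM solver itself is not reproduced here.

```python
import numpy as np

def patch_manifold_dimension(section, patch=(16, 16), stride=8, energy=0.95):
    """Estimate an effective dimension of the patch set of a 2-D section via a
    low-rank approximation: flatten overlapping patches into rows of a matrix
    and count the singular values needed to capture `energy` of the spectrum."""
    ph, pw = patch
    rows = []
    for i in range(0, section.shape[0] - ph + 1, stride):
        for j in range(0, section.shape[1] - pw + 1, stride):
            rows.append(section[i:i + ph, j:j + pw].ravel())
    P = np.asarray(rows)
    s = np.linalg.svd(P - P.mean(axis=0), compute_uv=False)
    cum = np.cumsum(s ** 2) / np.sum(s ** 2)          # cumulative spectral energy
    return int(np.searchsorted(cum, energy) + 1)

# Illustrative use on a random "section" (a real seismic section would be used)
dim = patch_manifold_dimension(np.random.default_rng(0).standard_normal((128, 128)))
```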
This study examines the performance of coupling the deterministic four-dimensional variational assimilation system (4DVAR) with an ensemble Kalman filter (EnKF) to produce a superior hybrid approach for data assimilation. The coupled assimilation scheme (E4DVAR) benefits from using the state-dependent uncertainty provided by EnKF while taking advantage of 4DVAR in preventing filter divergence: the 4DVAR analysis produces posterior maximum likelihood solutions through minimization of a cost function about which the ensemble perturbations are transformed, and the resulting ensemble analysis can be propagated forward both for the next assimilation cycle and as a basis for ensemble forecasting. The feasibility and effectiveness of this coupled approach are demonstrated in an idealized model with simulated observations. It is found that the E4DVAR is capable of outperforming both 4DVAR and the EnKF under both perfect- and imperfect-model scenarios. The performance of the coupled scheme is also less sensitive to either the ensemble size or the assimilation window length than those for standard EnKF or 4DVAR implementations.
In order to solve the so-called "bull-eye" problem caused by using a simple bilinear interpolation as the observational mapping operator in the cost function of the multigrid three-dimensional variational (3DVAR) data assimilation scheme, a smoothing term, equivalent to a penalty term, is introduced into the cost function as a means of troubleshooting. A theoretical analysis is first performed to determine what causes the "bull-eye" issue; then the meaning of the smoothing term is elucidated, and the uniqueness of the solution of the multigrid 3DVAR with the smoothing term added is discussed through theoretical deduction for the one-dimensional (1D) case and two idealized data assimilation experiments (one- and two-dimensional (2D) cases). By exploring the relationship between the smoothing term and the recursive filter both theoretically and practically, it is revealed why satisfactory analysis results can be achieved by the proposed solution to this issue of the multigrid 3DVAR.
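Schematically, and in generic notation that is only assumed here (the paper's exact multigrid formulation is not reproduced), the penalized 3DVAR cost function on a given grid level can be written as below, where x is the analysis increment, H the bilinear-interpolation observation operator, d the innovation vector, B and O the background and observation error covariances, S a discrete smoothness operator (e.g., a Laplacian), and mu the penalty weight:

```latex
J(\mathbf{x}) \;=\; \tfrac{1}{2}\,\mathbf{x}^{\mathrm{T}}\mathbf{B}^{-1}\mathbf{x}
\;+\; \tfrac{1}{2}\,(\mathbf{H}\mathbf{x}-\mathbf{d})^{\mathrm{T}}\mathbf{O}^{-1}(\mathbf{H}\mathbf{x}-\mathbf{d})
\;+\; \tfrac{\mu}{2}\,\bigl\|\mathbf{S}\,\mathbf{x}\bigr\|^{2}
```

The last term penalizes grid-scale oscillations around isolated observations, which is how a smoothing term of this kind suppresses the "bull-eye" pattern.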
Little research has been done on the representation of cloverleaf junctions in a 3-dimensional city model (3DCM). The main reason is that a cloverleaf junction is usually a complex and enormous construction: its main body straddles the air and has aerial intersections between its parts. These complex features make cloverleaf junctions quite different from buildings and terrain; therefore, it is difficult to represent this kind of spatial object in the same way as buildings and terrain. In this paper, the authors analyze the spatial characteristics of cloverleaf junctions, propose an all-constraint points TIN algorithm to partition the cloverleaf junction road surface, and develop a method to visualize the road surface using the TIN. In order to manage cloverleaf junction data efficiently, the authors also analyze the mechanism of 3DCM data management, extend the BLOB type in a relational database, and combine an R-tree index to manage 3D spatial data. Based on this extension, an appropriate data…
A four-dimensional variational (4D-Var) data assimilation method is implemented in an improved intermediate coupled model (ICM) of the tropical Pacific. A twin experiment is designed to evaluate the impact of the 4D-Var data assimilation algorithm on ENSO analysis and prediction based on the ICM. The model error is assumed to arise only from the parameter uncertainty. The "observation" of the SST anomaly, which is sampled from a "truth" model simulation that takes default parameter values and has Gaussian noise added, is directly assimilated into the assimilation model with its parameters set erroneously. Results show that 4D-Var effectively reduces the error of ENSO analysis and therefore improves the prediction skill of ENSO events compared with the non-assimilation case. These results provide a promising way for the ICM to achieve better real-time ENSO prediction.
The key to developing 3-D GISs is the study of 3-D data models and data structures. Some data models and data structures have been presented by scholars. Because of the complexity of 3-D spatial phenomena, there is no perfect data structure that can describe all spatial entities. Every data structure has its own advantages and disadvantages, and it is difficult to design a single data structure to meet different needs. An important subject in 3-D data modeling is developing a data model that integrates vector and raster data structures. A special 3-D spatial data model based on the distribution features of spatial entities should be designed. We took geological exploration engineering as the research background and designed an integrated data model whose data structures integrate vector and raster data by adopting object-oriented techniques. The research achievements are presented in this paper.
Clustering high dimensional data is challenging as data dimensionality increases the distance between data points, resulting in sparse regions that degrade clustering performance. Subspace clustering is a common approach for processing high-dimensional data by finding relevant features for each cluster in the data space. Subspace clustering methods extend traditional clustering to account for the constraints imposed by data streams. Data streams are not only high-dimensional, but also unbounded and evolving. This necessitates the development of subspace clustering algorithms that can handle high dimensionality and adapt to the unique characteristics of data streams. Although many articles have contributed to the literature review on data stream clustering, there is currently no specific review on subspace clustering algorithms in high-dimensional data streams. Therefore, this article aims to systematically review the existing literature on subspace clustering of data streams in high-dimensional streaming environments. The review follows a systematic methodological approach and includes 18 articles for the final analysis. The analysis focused on two research questions related to the general clustering process and dealing with the unbounded and evolving characteristics of data streams. The main findings relate to six elements: clustering process, cluster search, subspace search, synopsis structure, cluster maintenance, and evaluation measures. Most algorithms use a two-phase clustering approach consisting of an initialization stage, a refinement stage, a cluster maintenance stage, and a final clustering stage. The density-based top-down subspace clustering approach is more widely used than the others because it is able to distinguish true clusters and outliers using projected microclusters. Most algorithms implicitly adapt to the evolving nature of the data stream by using a time fading function that is sensitive to outliers. Future work can focus on the clustering framework, parameter optimization, subspace search techniques, memory-efficient synopsis structures, explicit cluster change detection, and intrinsic performance metrics. This article can serve as a guide for researchers interested in high-dimensional subspace clustering methods for data streams.
To improve the effectiveness of dam safety monitoring database systems, the development process of a multi-dimensional conceptual data model was analyzed and a logical design was achieved in multi-dimensional database mode. The optimal data model was confirmed by identifying data objects, defining relations, and reviewing entities. The conversion of relations among entities to foreign keys, and of entities and physical attributes to tables and fields, is interpreted completely. On this basis, a multi-dimensional database reflecting the management and analysis of monitoring data in a dam safety monitoring system has been established, for which fact tables and dimension tables have been designed. Finally, based on the service design and user interface design, the dam safety monitoring system has been developed with Delphi as the development tool. This development project shows that the multi-dimensional database can simplify the development process and minimize hidden dangers in the database structure design. It is superior to other dam safety monitoring system development models and can provide a new research direction for system developers.
The curse of dimensionality refers to the problem of increased sparsity and computational complexity when dealing with high-dimensional data. In recent years, the types and variables of industrial data have increased significantly, making data-driven models more challenging to develop. To address this problem, data augmentation technology has been introduced as an effective tool to solve the sparsity problem of high-dimensional industrial data. This paper systematically explores and discusses the necessity, feasibility, and effectiveness of augmented industrial data-driven modeling in the context of the curse of dimensionality and virtual big data. Then, the process of data augmentation modeling is analyzed, and the concept of data boosting augmentation is proposed. Data boosting augmentation involves designing the reliability weight and actual-virtual weight functions, and developing a double weighted partial least squares model to optimize the three stages of data generation, data fusion, and modeling. This approach significantly improves the interpretability, effectiveness, and practicality of data augmentation in industrial modeling. Finally, the proposed method is verified using practical examples of fault diagnosis systems and virtual measurement systems in industry. The results demonstrate the effectiveness of the proposed approach in improving the accuracy and robustness of data-driven models, making them more suitable for real-world industrial applications.
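As a rough sketch of how such sample weights might enter a PLS model, the code below fits a row-weighted PLS on mixed actual and virtual samples. It is only a crude surrogate under the assumption that weights are folded in by rescaling rows; the paper's double weighted PLS algorithm and its weight functions are not reproduced, and all names and the synthetic data are illustrative.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

def weighted_pls(X_actual, y_actual, X_virtual, y_virtual,
                 reliability_w, virtual_weight=0.5, n_components=3):
    """Fit a PLS model on actual plus virtual (augmented) samples, each sample
    weighted by the product of a reliability weight and an actual-vs-virtual
    weight.  Weights are folded in by rescaling rows (a simple surrogate)."""
    X = np.vstack([X_actual, X_virtual])
    y = np.concatenate([y_actual, y_virtual])
    # actual samples keep weight 1, virtual (generated) samples a smaller weight
    av_w = np.concatenate([np.ones(len(y_actual)),
                           np.full(len(y_virtual), virtual_weight)])
    w = np.asarray(reliability_w) * av_w              # combined sample weights
    sw = np.sqrt(w)[:, None]
    model = PLSRegression(n_components=n_components)
    model.fit(X * sw, y[:, None] * sw)                # row-weighted fit
    return model

# Illustrative use with synthetic actual and virtual data
rng = np.random.default_rng(0)
Xa, Xv = rng.standard_normal((50, 8)), rng.standard_normal((100, 8))
ya, yv = Xa @ rng.standard_normal(8), Xv @ rng.standard_normal(8)
model = weighted_pls(Xa, ya, Xv, yv, reliability_w=np.ones(150))
```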
The question of how to choose a copula model that best fits a given dataset is a predominant limitation of the copula approach, and the present study aims to investigate the techniques of goodness-of-fit tests for multi-dimensional copulas. A goodness-of-fit test based on Rosenblatt's transformation was mathematically expanded from two dimensions to three dimensions, and procedures of a bootstrap version of the test were provided. Through stochastic copula simulation, an empirical application of historical drought data at the Lintong Gauge Station shows that the goodness-of-fit tests perform well, revealing that both trivariate Gaussian and Student t copulas are acceptable for modeling the dependence structures of the observed drought duration, severity, and peak. The goodness-of-fit tests for multi-dimensional copulas can provide further support and help a lot in the potential applications of a wider range of copulas to describe the associations of correlated hydrological variables. However, for the application of copulas with the number of dimensions larger than three, more complicated computational efforts as well as exploration and parameterization of corresponding copulas are required.
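For the Gaussian-copula case, a minimal sketch of the general technique (not the paper's exact test or statistic) is given below: the Rosenblatt transform is obtained from the Cholesky factor of the correlation matrix, a Cramér-von Mises-type statistic compares the transformed sample against the null, and a parametric bootstrap supplies the p-value. Function names, the statistic, and the synthetic example are illustrative choices.

```python
import numpy as np
from scipy import stats

def rosenblatt_gaussian(U, R):
    """Rosenblatt transform of pseudo-observations U (n x d, values in (0,1))
    under a Gaussian copula with correlation matrix R.  Under H0 the columns
    of the result are i.i.d. Uniform(0,1)."""
    L = np.linalg.cholesky(R)
    Z = stats.norm.ppf(U)                       # normal scores
    W = np.linalg.solve(L, Z.T).T               # conditional standardized residuals
    return stats.norm.cdf(W)

def gof_statistic(U, R):
    """Cramer-von Mises-type statistic: map each transformed row to a chi-square
    probability and compare its empirical CDF with the uniform CDF."""
    E = rosenblatt_gaussian(U, R)
    d = U.shape[1]
    S = np.sort(stats.chi2.cdf((stats.norm.ppf(E) ** 2).sum(axis=1), df=d))
    n = len(S)
    i = np.arange(1, n + 1)
    return 1.0 / (12 * n) + np.sum((S - (2 * i - 1) / (2 * n)) ** 2)

def bootstrap_pvalue(U, R, n_boot=200, seed=0):
    """Parametric-bootstrap p-value: simulate from the fitted Gaussian copula,
    re-estimate R, recompute the statistic, and count exceedances."""
    rng = np.random.default_rng(seed)
    t_obs = gof_statistic(U, R)
    n, d = U.shape
    L = np.linalg.cholesky(R)
    exceed = 0
    for _ in range(n_boot):
        Zb = rng.standard_normal((n, d)) @ L.T
        Ub = stats.rankdata(Zb, axis=0) / (n + 1)           # pseudo-observations
        Rb = np.corrcoef(stats.norm.ppf(Ub), rowvar=False)  # refitted correlation
        exceed += gof_statistic(Ub, Rb) >= t_obs
    return (exceed + 1) / (n_boot + 1)

# Illustrative use: pseudo-observations U (n x 3) of drought duration, severity
# and peak would replace the synthetic sample below.
rng = np.random.default_rng(1)
R_true = np.array([[1.0, 0.6, 0.4], [0.6, 1.0, 0.5], [0.4, 0.5, 1.0]])
Z = rng.standard_normal((300, 3)) @ np.linalg.cholesky(R_true).T
U = stats.rankdata(Z, axis=0) / 301
R_hat = np.corrcoef(stats.norm.ppf(U), rowvar=False)
print(bootstrap_pvalue(U, R_hat, n_boot=100))
```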
Multidimensional data query has been gaining much interest in database research communities in recent years, yet many of the existing studies focus mainly on centralized systems. A solution to querying in a Peer-to-Peer (P2P) environment was proposed to achieve both low processing cost, in terms of the number of peers accessed and search messages, and balanced query loads among peers. The system is based on a balanced tree-structured P2P network. By partitioning the query space intelligently, the amount of query forwarding is effectively controlled, and the number of peers involved and search messages are also limited. Dynamic load balancing can be achieved during space partitioning and query resolving. Extensive experiments confirm the effectiveness and scalability of our algorithms on P2P networks.
Viticulturists traditionally have a keen interest in studying the relationship between the biochemistry of grapevines' leaves/petioles and their associated spectral reflectance in order to understand the fruit ripening rate, water status, nutrient levels, and disease risk. In this paper, we use imaging spectroscopy (hyperspectral) reflectance data for the reflective 330 - 2510 nm wavelength region (986 total spectral bands) to assess vineyard nutrient status; this constitutes a high dimensional dataset with a covariance matrix that is ill-conditioned. The identification of the variables (wavelength bands) that contribute useful information for nutrient assessment and prediction plays a pivotal role in multivariate statistical modeling. In recent years, researchers have successfully developed many continuous, nearly unbiased, sparse, and accurate variable selection methods to overcome this problem. This paper compares four regularized regression methods and one functional regression method: Elastic Net, Multi-Step Adaptive Elastic Net, Minimax Concave Penalty, iterative Sure Independence Screening, and Functional Data Analysis for wavelength variable selection. Thereafter, the predictive performance of these regularized sparse models is enhanced using stepwise regression. This comparative study of regression methods using a high-dimensional and highly correlated grapevine hyperspectral dataset revealed that Elastic Net variable selection yields the best predictive ability.
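As a small sketch of the Elastic Net band-selection step (the synthetic data, the cross-validation settings, and all parameter choices below are illustrative assumptions, not the paper's dataset or tuning):

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV
from sklearn.preprocessing import StandardScaler

# Hypothetical hyperspectral matrix X (samples x bands) and nutrient response y
rng = np.random.default_rng(0)
n_samples, n_bands = 120, 986
X = rng.standard_normal((n_samples, n_bands))
y = 0.8 * X[:, 100] - 0.5 * X[:, 500] + 0.1 * rng.standard_normal(n_samples)

# Standardize bands, then fit Elastic Net with cross-validated penalties
Xs = StandardScaler().fit_transform(X)
enet = ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9], cv=5, max_iter=10000)
enet.fit(Xs, y)

selected_bands = np.flatnonzero(enet.coef_)   # candidate wavelength indices
print(len(selected_bands), "bands selected")
```

The nonzero coefficients flag candidate wavelengths, which a stepwise refinement could then prune further, mirroring the two-stage strategy described above.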
In order to realize the visualization of a three-dimensional data field (TDDF) in an instrument, two methods of TDDF visualization and the usual manner of fast graphic and image processing are analyzed. How to use OpenGL techniques and the characteristics of the analyzed data to construct a TDDF, together with the methods of realistic rendering and interactive processing, is described. Then the medium geometric element and a related realistic model are constructed by means of the first algorithm, and models for attaching the third dimension in the three-dimensional data field are presented. An example of TDDF realization for machine measuring is provided. The analysis of the resulting graphics indicates that the three-dimensional graphics built by the developed method feature good realism, fast processing, and strong interactivity.
Improving the imaging quality of cone-beam CT under large cone angle scans has been an important area of CT imaging research. Based on the idea of conjugate rays and compensating for missing data, we propose a three-dimensional (3D) weighting reconstruction algorithm for cone-beam CT. The 3D weighting function is added in the back-projection process to reduce the axial density drop and improve the accuracy of the FDK algorithm. Having a simple structure, the algorithm can be implemented easily without rebinning the native cone-beam data into cone-parallel beam data. The performance of the algorithm is evaluated using two computer simulations and a real industrial component, and the results show that the algorithm achieves better performance in reducing axial intensity drop artifacts and has a wide range of applications.
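Schematically, in generic FDK notation that is only assumed here (the paper's exact formulation and its specific weighting function are not given in the abstract), the modified back-projection multiplies an additional 3-D weight into the standard FDK integral:

```latex
f(x,y,z) \;\approx\; \frac{1}{2}\int_{0}^{2\pi}
\frac{R^{2}}{(R-s)^{2}}\;
w_{3D}(\beta,x,y,z)\;
Q_{\beta}\!\bigl(u(x,y,\beta),\,v(x,y,z,\beta)\bigr)\,\mathrm{d}\beta
```

Here Q_beta denotes the cosine-weighted, ramp-filtered projection at source angle beta, R the source-to-isocenter distance, s the coordinate of the reconstruction point along the source direction, (u, v) the detector coordinates onto which the point projects, and w_3D the added weighting function applied during back-projection.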