An algorithm, Clustering Algorithm Based On Sparse Feature Vector (CABOSFV), was proposed for the high dimensional clustering of binary sparse data. The algorithm compresses the data effectively by using a tool called the 'Sparse Feature Vector', thus reducing the data scale enormously, and it can obtain the clustering result with only one data scan. Both theoretical analysis and empirical tests showed that CABOSFV has low computational complexity. The algorithm finds clusters in large high dimensional datasets efficiently and handles noise effectively.
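As a concrete illustration of the one-scan, summary-based idea, the following sketch keeps a compressed summary (size, feature union, feature intersection) per cluster and merges each incoming binary-sparse object in a single pass. The summary fields and the dissimilarity formula are illustrative assumptions for demonstration only, not the published CABOSFV definitions.

```python
def sparse_dissimilarity(n, union, intersection):
    """Illustrative sparsity-based dissimilarity for a candidate cluster:
    the fraction of non-shared features, scaled by cluster size.
    (Assumed formula for demonstration; not the published SFD.)"""
    if not union:
        return 0.0
    return len(union - intersection) / (n * len(union))

def one_scan_cluster(objects, threshold):
    """Single pass over binary-sparse objects (given as sets of feature
    indices). Each cluster keeps only [size, union, intersection] -- a
    compressed, 'sparse feature vector'-style summary."""
    clusters = []  # each entry: [size, union, intersection]
    for obj in objects:
        placed = False
        for c in clusters:
            n, u, i = c[0] + 1, c[1] | obj, c[2] & obj
            if sparse_dissimilarity(n, u, i) <= threshold:
                c[0], c[1], c[2] = n, u, i  # absorb obj into this cluster
                placed = True
                break
        if not placed:
            clusters.append([1, set(obj), set(obj)])
    return clusters
```

Because each cluster is represented only by its summary, memory stays proportional to the number of clusters rather than the number of objects, which is the compression effect the abstract describes.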
Information analysis of high dimensional data was carried out through the application of a similarity measure. High dimensional data were considered as an atypical structure. Additionally, overlapped and non-overlapped data were introduced, and the similarity measure analysis was illustrated and compared with a conventional similarity measure. As a result, for overlapped data it was possible to express similarity with the conventional measure, while the analysis of non-overlapped data provided a clue to solving the similarity problem for high dimensional data. The high dimensional data analysis was designed with consideration of neighborhood information, and both conservative and strict solutions were proposed. The proposed similarity measure was applied to express financial fraud among multi-dimensional datasets. In an illustrative example, financial fraud similarity with respect to age, gender, qualification and job was presented, and with the proposed similarity measure, high dimensional personal data were evaluated for their similarity to financial fraud. Calculation results show that actual fraud cases have a rather high similarity compared to the average, from a minimum of 0.0609 to a maximum of 0.1667.
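A minimal sketch of attribute-wise similarity between a personal record and a fraud profile over categorical attributes such as age, gender, qualification and job. The simple overlap measure below is a hypothetical stand-in for illustration; it is not the paper's proposed measure.

```python
def overlap_similarity(record, profile):
    """Fraction of shared attributes on which two records agree
    (simple overlap measure; a stand-in for the paper's measure,
    which is not reproduced here)."""
    keys = record.keys() & profile.keys()
    if not keys:
        return 0.0
    return sum(record[k] == profile[k] for k in keys) / len(keys)
```

A record matching a fraud profile on two of three attributes would score 2/3 under this toy measure; the paper's measure additionally incorporates neighborhood information.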
In this paper, the global controllability for a class of high dimensional polynomial systems is investigated, and a constructive algebraic criterion algorithm for their global controllability is obtained. By this criterion algorithm, global controllability can be determined in finitely many arithmetic operations. The algorithm operates on the coefficients of the polynomials only, and the analysis technique is based on the Sturm Theorem in real algebraic geometry and its modern developments. Finally, the authors give some examples to show the application of the results.
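The role of Sturm's Theorem can be illustrated directly: the number of distinct real roots of a polynomial in an interval (a, b] equals the difference in sign changes of its Sturm sequence evaluated at a and at b. A self-contained sketch (floating-point arithmetic, so only suitable for well-conditioned examples):

```python
def poly_eval(p, x):
    """Evaluate a polynomial given as coefficients, highest degree first."""
    r = 0.0
    for c in p:
        r = r * x + c
    return r

def poly_deriv(p):
    n = len(p) - 1
    return [c * (n - i) for i, c in enumerate(p[:-1])]

def poly_rem(a, b):
    """Remainder of polynomial long division a mod b."""
    a = list(a)
    while len(a) >= len(b):
        if abs(a[0]) < 1e-12:
            a.pop(0)
            continue
        f = a[0] / b[0]
        for i in range(len(b)):
            a[i] -= f * b[i]
        a.pop(0)
    while len(a) > 1 and abs(a[0]) < 1e-12:
        a.pop(0)
    return a if a else [0.0]

def sturm_sequence(p):
    """p0 = p, p1 = p', p_{k+1} = -rem(p_{k-1}, p_k), until zero."""
    seq = [p, poly_deriv(p)]
    while True:
        r = [-c for c in poly_rem(seq[-2], seq[-1])]
        if all(abs(c) < 1e-12 for c in r):
            return seq
        seq.append(r)

def sign_changes(vals):
    signs = [v for v in vals if abs(v) > 1e-12]
    return sum(1 for u, v in zip(signs, signs[1:]) if u * v < 0)

def real_roots_between(p, lo, hi):
    """Number of distinct real roots of p in (lo, hi] by Sturm's Theorem."""
    seq = sturm_sequence(p)
    at = lambda x: [poly_eval(q, x) for q in seq]
    return sign_changes(at(lo)) - sign_changes(at(hi))
```

For example, x^3 - 3x + 1 has three real roots, which the sign-change count recovers; this counting step is the kind of coefficient-only arithmetic the criterion algorithm relies on.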
To solve the high-dimensionality issue and improve accuracy in credit risk assessment, a high-dimensionality-trait-driven learning paradigm is proposed for feature extraction and classifier selection. The proposed paradigm consists of three main stages: categorization of high dimensional data, high-dimensionality-trait-driven feature extraction, and high-dimensionality-trait-driven classifier selection. In the first stage, according to the definition of high dimensionality and the relationship between sample size and feature dimensions, the high-dimensionality traits of credit datasets are categorized into two types: 100 < feature dimensions < sample size, and feature dimensions ≥ sample size. In the second stage, some typical feature extraction methods are tested on the two categories of high dimensionality. In the final stage, four types of classifiers are employed to evaluate credit risk under the different high-dimensionality traits. For illustration and verification, credit classification experiments are performed on two publicly available credit risk datasets, and the results show that the proposed paradigm is effective in handling high-dimensional credit classification issues and improves credit classification accuracy relative to the benchmark models listed in this study.
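The first-stage categorization rule stated above can be written down directly; the type labels below are illustrative names, not the paper's terminology.

```python
def dimensionality_trait(n_samples, n_features):
    """Categorize a credit dataset by the high-dimensionality traits stated
    in the abstract: 100 < d < n versus d >= n (labels are illustrative)."""
    if n_features >= n_samples:
        return "type II: d >= n"
    if n_features > 100:
        return "type I: 100 < d < n"
    return "not high-dimensional"
```

The downstream choice of feature extraction method and classifier is then conditioned on this trait.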
Markowitz portfolio theory underestimates the risk associated with the return of a portfolio in the case of high dimensional data. El Karoui mathematically proved this in [1] and suggested improved estimators for unbiased estimation of this risk under specific model assumptions. Norm-constrained portfolios have recently been studied to keep the effective dimension low. In this paper we consider three sets of high dimensional data: the stock market prices for three countries, namely the US, UK and India. We compare the Markowitz efficient frontier to those obtained by unbiasedness corrections and by imposing norm constraints in these real data scenarios. We also study the out-of-sample performance of the different procedures. We find that the 2-norm constrained portfolio has the best overall performance.
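For intuition, here is the classical Markowitz machinery in its simplest closed form: the minimum-variance weights for two assets. This is the textbook special case whose plug-in estimates degrade in high dimensions; it is not El Karoui's corrected estimator or the norm-constrained procedure studied in the paper.

```python
def min_variance_weights_2(sigma1, sigma2, rho):
    """Closed-form minimum-variance weights for two assets:
    w1 = (s2^2 - rho*s1*s2) / (s1^2 + s2^2 - 2*rho*s1*s2), w2 = 1 - w1."""
    cov = rho * sigma1 * sigma2
    w1 = (sigma2 ** 2 - cov) / (sigma1 ** 2 + sigma2 ** 2 - 2 * cov)
    return w1, 1 - w1
```

In high dimensions the covariance matrix must be estimated from data, and it is that estimation error which drives the risk underestimation the paper addresses.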
This paper mainly concerns oblique derivative problems for nonlinear nondivergent elliptic equations of second order with measurable coefficients in a multiply connected domain. Under certain conditions, we derive a priori estimates of solutions. By using these estimates and the fixed-point theorem, we prove the existence of solutions.
With the abundance of exceptionally high dimensional data, feature selection has become an essential element of the data mining process. In this paper, we investigate the problem of efficient feature selection for classification on high dimensional datasets. We present a novel filter-based approach for feature selection that sorts the features by a score, and we then measure the performance of four different data mining classification algorithms on the resulting data. In the proposed approach, we partition the sorted features and search for important features in a forward manner as well as in a reverse manner, starting from the first and the last feature in the sorted list simultaneously. The proposed approach is highly scalable and effective, as it parallelizes over both attributes and tuples simultaneously, allowing us to evaluate many potential features for high dimensional datasets. The newly proposed framework for feature selection is experimentally shown to be very valuable on real and synthetic high dimensional datasets, improving the precision of the selected features. We have also tested it by measuring classification accuracy against various feature selection processes.
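A toy sketch of a filter-style pipeline in this spirit: score features, sort, then scan the sorted list from both ends. The correlation score and the alternating two-ended pick below are illustrative simplifications, not the paper's actual scoring function or search procedure.

```python
def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a) ** 0.5
    vb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (va * vb) if va and vb else 0.0

def rank_features(X, y):
    """X: list of rows. Rank feature indices by |correlation| with y, best
    first (a simple stand-in for the paper's unspecified score)."""
    d = len(X[0])
    cols = [[row[j] for row in X] for j in range(d)]
    return sorted(range(d), key=lambda j: -abs(pearson(cols[j], y)))

def two_ended_select(ranked, keep):
    """Illustrative simplification of scanning the sorted list from both
    ends simultaneously: alternate picks from front and back until `keep`
    features are chosen."""
    out, lo, hi = [], 0, len(ranked) - 1
    while len(out) < keep and lo <= hi:
        out.append(ranked[lo]); lo += 1
        if len(out) < keep and lo <= hi:
            out.append(ranked[hi]); hi -= 1
    return out
```

In the paper the two directional searches run in parallel over partitions of the sorted list, which is where the scalability claim comes from.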
Two of the main challenges in optimal control are solving problems with state-dependent running costs and developing efficient numerical solvers that are computationally tractable in high dimensions. In this paper, we provide analytical solutions to certain optimal control problems whose running cost depends on the state variable and with constraints on the control. We also provide Lax-Oleinik-type representation formulas for the corresponding Hamilton-Jacobi partial differential equations with state-dependent Hamiltonians. Additionally, we present an efficient, grid-free numerical solver based on our representation formulas, which is shown to scale linearly with the state dimension and thus to overcome the curse of dimensionality. Using existing optimization methods and the min-plus technique, we extend our numerical solvers to address more general classes of convex and nonconvex initial costs. We demonstrate the capabilities of our numerical solvers using implementations on a central processing unit (CPU) and a field-programmable gate array (FPGA). In several cases, our FPGA implementation obtains over a 10 times speedup compared to the CPU, which demonstrates the promising performance boosts FPGAs can achieve. Our numerical results show that our solvers have the potential to serve as a building block for solving broader classes of high-dimensional optimal control problems in real time.
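A Lax-Oleinik/Hopf-Lax representation formula can be evaluated grid-free at a single query point by minimizing over candidates, which is why such solvers avoid the curse of dimensionality. A toy 1-D sketch for the Hamiltonian H(p) = |p|^2/2 (a standard special case, not the state-dependent Hamiltonians treated in the paper):

```python
def hopf_lax(g, x, t, ys):
    """Evaluate the Hopf-Lax (Lax-Oleinik) formula
        u(x, t) = min_y [ g(y) + |x - y|^2 / (2 t) ]
    for the HJ equation u_t + |u_x|^2 / 2 = 0 with initial cost g,
    by brute-force minimization over candidate points ys."""
    return min(g(y) + (x - y) ** 2 / (2 * t) for y in ys)
```

For the quadratic initial cost g(y) = y^2/2 the exact solution is u(x, t) = x^2 / (2(1 + t)), which the brute-force evaluation reproduces; in practice the inner minimization is done with an optimization method rather than a sweep.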
The estimation of covariance matrices is very important in many fields, such as statistics. In real applications, data are frequently affected by high dimensions and noise. However, most relevant studies are based on complete data. This paper studies the optimal estimation of high-dimensional covariance matrices based on missing and noisy samples under the norm. First, the model with sub-Gaussian additive noise is presented. The generalized sample covariance is then modified to define a hard thresholding estimator, and the minimax upper bound is derived. After that, the minimax lower bound is derived, and it is concluded that the estimator presented in this article is rate-optimal. Finally, numerical simulation analysis is performed. The results show that for missing samples with sub-Gaussian noise, if the true covariance matrix is sparse, the hard thresholding estimator outperforms the traditional estimation method.
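The core of a hard thresholding covariance estimator is easy to state: zero out small off-diagonal entries of a (generalized) sample covariance, which exploits sparsity of the true matrix. The sketch below shows only this thresholding step; the paper's treatment of missing entries and noise correction is omitted.

```python
def hard_threshold(matrix, tau):
    """Hard-thresholding estimator: keep off-diagonal entries whose
    magnitude exceeds tau, zero out the rest; the diagonal is kept."""
    return [[v if i == j or abs(v) > tau else 0.0
             for j, v in enumerate(row)]
            for i, row in enumerate(matrix)]
```

The threshold tau is typically chosen of order sqrt(log d / n), growing with dimension d and shrinking with sample size n.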
This paper proposes a test procedure for testing the regression coefficients in high dimensional partially linear models based on the F-statistic. In the partially linear model, the authors first estimate the unknown nonlinear component by some nonparametric methods and then generalize the F-statistic to test the regression coefficients under some regularity conditions. During this procedure, the estimation of the nonlinear component brings much challenge to the study of the properties of the generalized F-test. The authors obtain some asymptotic properties of the generalized F-test in rather general cases, including the asymptotic normality and the power of this test with p/n ∈ (0, 1), without the normality assumption. The asymptotic result is general, and by adding some constraint conditions similar conclusions can be obtained in high dimensional linear models. Through simulation studies, the authors demonstrate good finite-sample performance of the proposed test in agreement with the theoretical results. The practical utility of the method is illustrated by a real data example.
Three high dimensional spatial standardization algorithms are used for diffusion tensor image (DTI) registration, and seven kinds of methods are used to evaluate their performances. Firstly, the template used in this paper was obtained by spatial transformation of 16 subjects by means of tensor-based standardization. Then, high dimensional standardization algorithms for diffusion tensor images, including a fractional anisotropy (FA) based diffeomorphic registration algorithm, an FA based elastic registration algorithm and a tensor-based registration algorithm, were performed. Finally, seven kinds of evaluation methods, including normalized standard deviation, dyadic coherence, diffusion cross-correlation, overlap of eigenvalue-eigenvector pairs, Euclidean distance of the diffusion tensor, Euclidean distance of the deviatoric tensor, and the deviatoric of tensors, were used to qualitatively compare and summarize the above standardization algorithms. Experimental results revealed that the high-dimensional tensor-based standardization algorithm performs well and can maintain the consistency of anatomical structures.
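Several of these measures are built on scalar tensor invariants such as fractional anisotropy, which has the standard closed form FA = sqrt(3/2) · ||λ − λ̄|| / ||λ|| over the diffusion tensor eigenvalues:

```python
def fractional_anisotropy(l1, l2, l3):
    """Fractional anisotropy of a diffusion tensor from its eigenvalues:
    FA = sqrt(3/2) * sqrt(sum (l_i - mean)^2) / sqrt(sum l_i^2).
    FA = 0 for isotropic diffusion, FA -> 1 for fully anisotropic."""
    m = (l1 + l2 + l3) / 3
    num = ((l1 - m) ** 2 + (l2 - m) ** 2 + (l3 - m) ** 2) ** 0.5
    den = (l1 ** 2 + l2 ** 2 + l3 ** 2) ** 0.5
    return (1.5 ** 0.5) * num / den if den else 0.0
```

The FA-based registration algorithms above align scalar FA maps, whereas the tensor-based algorithm uses the full tensor at each voxel.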
This paper considers tests for regression coefficients in high dimensional partially linear models. The authors first use the B-spline method to estimate the unknown smooth function so that it can be linearly expressed. Then, the authors propose an empirical likelihood method to test the regression coefficients. The authors derive the asymptotic chi-squared distribution with two degrees of freedom of the proposed test statistics under the null hypothesis. In addition, the method is extended to tests with nuisance parameters. Simulations show that the proposed method has good performance in control of the type-I error rate and in power. The proposed method is also employed to analyze a Skin Cutaneous Melanoma (SKCM) dataset.
The covariance matrix plays an important role in risk management, asset pricing, and portfolio allocation. Covariance matrix estimation becomes challenging when the dimensionality is comparable to or much larger than the sample size. A widely used approach for reducing dimensionality is based on multi-factor models. Although this approach has been well studied and quite successful in many applications, the quality of the estimated covariance matrix is often degraded by a nontrivial amount of missing data in the factor matrix, for both technical and cost reasons. Since the factor matrix is only approximately low rank, or even has full rank, existing matrix completion algorithms are not applicable. We consider a new matrix completion paradigm that uses the factor models directly and apply the alternating direction method of multipliers for the recovery. Numerical experiments show that the nuclear-norm matrix completion approaches are not suitable, but our proposed models and algorithms are promising.
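As a minimal illustration of completing a matrix through a factor structure rather than a generic nuclear-norm penalty, the sketch below fits a rank-1 factorization by alternating least squares on the observed entries only. This is a toy stand-in; the paper's ADMM-based recovery is not reproduced here.

```python
def als_rank1_complete(X, observed, iters=50):
    """Fill missing entries of an (approximately) rank-1 matrix by
    alternating least squares on X ~ u v^T, fitting observed entries only.
    X: list of rows; observed: same-shape boolean mask."""
    m, n = len(X), len(X[0])
    u, v = [1.0] * m, [1.0] * n
    for _ in range(iters):
        for i in range(m):  # update u with v fixed (per-row least squares)
            num = sum(X[i][j] * v[j] for j in range(n) if observed[i][j])
            den = sum(v[j] ** 2 for j in range(n) if observed[i][j])
            if den:
                u[i] = num / den
        for j in range(n):  # update v with u fixed (per-column least squares)
            num = sum(X[i][j] * u[i] for i in range(m) if observed[i][j])
            den = sum(u[i] ** 2 for i in range(m) if observed[i][j])
            if den:
                v[j] = num / den
    return [[u[i] * v[j] for j in range(n)] for i in range(m)]
```

Generalizing u and v to multi-column factors gives the usual factor-model completion; the ADMM machinery in the paper handles the constraints that make the factor matrix only approximately low rank.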
This paper aims to develop a new robust U-type test for high dimensional regression coefficients using an estimated U-statistic of order two and refitted cross-validation error variance estimation. It is proved that the limiting null distribution of the proposed test is normal under two kinds of ordinary models. We further study the local power of the proposed test and compare it with other competitive tests for high dimensional data. The refitted cross-validation approach is utilized to reduce the bias of the sample variance in the estimation of the test statistic. Our theoretical results indicate that the proposed test can achieve an even more substantial power gain than the test by Zhong and Chen (2011) when testing a hypothesis with outlying observations and heavy-tailed distributions. We assess the finite-sample performance of the proposed test by examining its size and power via Monte Carlo studies. We also illustrate the application of the proposed test by an empirical analysis of a real data example.
We propose a two-step variable selection procedure for censored quantile regression with high dimensional predictors. To account for censored data in the high dimensional case, we employ effective dimension reduction and the informative subset idea. Under some regularity conditions, we show that our procedure enjoys model selection consistency. A simulation study and real data analysis are conducted to evaluate the finite sample performance of the proposed approach.
The era of big data brings opportunities and challenges to developing new statistical methods and models for evaluating social programs, economic policies, and interventions. This paper provides a comprehensive review of some recent advances in statistical methodologies and models for evaluating programs with high-dimensional data. In particular, four kinds of methods for making valid statistical inferences for treatment effects in high dimensions are addressed. The first is the so-called doubly robust type estimation, which models the outcome regression and propensity score functions simultaneously. The second is the covariate balance method for constructing treatment effect estimators. The third is the sufficient dimension reduction approach to causal inference. The last is machine learning procedures used directly or indirectly to make statistical inferences about treatment effects. Some of these methods and models are closely related to the de-biased Lasso type methods for high dimensional regression models in the statistical literature. Finally, some future research topics are also discussed.
We deal with the boundedness of solutions to a class of fully parabolic quasilinear repulsion chemotaxis systems
u_t = ∇·(ϕ(u)∇u) + ∇·(ψ(u)∇v), (x,t) ∈ Ω×(0,T),
v_t = Δv − v + u, (x,t) ∈ Ω×(0,T),
under homogeneous Neumann boundary conditions in a smooth bounded domain Ω ⊂ R^N (N ≥ 3), where 0 < ψ(u) ≤ K(u+1)^α and K₁(s+1)^m ≤ ϕ(s) ≤ K₂(s+1)^m with α, K, K₁, K₂ > 0 and m ∈ R. It is shown that if α − m < 4/(N+2), then for any sufficiently smooth initial data, the classical solutions to the system are uniformly-in-time bounded. This extends the known result for the corresponding model with linear diffusion.
In this paper, we consider the Cauchy problem for systems of quasi-linear wave equations with multiple propagation speeds in spatial dimensions n ≥ 4. The problem when the nonlinearities depend on both the unknown functions and their derivatives is studied. Based on some Klainerman-Sideris type weighted estimates and space-time L2 estimates, results on the almost global existence of small amplitude solutions for space dimension n = 4 and global existence for n ≥ 5 are presented.
Clustering high dimensional data is challenging, as data dimensionality increases the distance between data points, resulting in sparse regions that degrade clustering performance. Subspace clustering is a common approach for processing high-dimensional data by finding relevant features for each cluster in the data space. Subspace clustering methods extend traditional clustering to account for the constraints imposed by data streams. Data streams are not only high-dimensional, but also unbounded and evolving. This necessitates the development of subspace clustering algorithms that can handle high dimensionality and adapt to the unique characteristics of data streams. Although many articles have contributed to the literature review on data stream clustering, there is currently no specific review of subspace clustering algorithms for high-dimensional data streams. Therefore, this article aims to systematically review the existing literature on subspace clustering of data streams in high-dimensional streaming environments. The review follows a systematic methodological approach and includes 18 articles in the final analysis. The analysis focused on two research questions related to the general clustering process and to dealing with the unbounded and evolving characteristics of data streams. The main findings relate to six elements: the clustering process, cluster search, subspace search, synopsis structure, cluster maintenance, and evaluation measures. Most algorithms use a two-phase clustering approach consisting of an initialization stage, a refinement stage, a cluster maintenance stage, and a final clustering stage. The density-based top-down subspace clustering approach is more widely used than the others because it is able to distinguish true clusters from outliers using projected microclusters. Most algorithms implicitly adapt to the evolving nature of the data stream by using a time fading function that is sensitive to outliers. Future work can focus on the clustering framework, parameter optimization, subspace search techniques, memory-efficient synopsis structures, explicit cluster change detection, and intrinsic performance metrics. This article can serve as a guide for researchers interested in high-dimensional subspace clustering methods for data streams.
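The time fading function mentioned above is typically the exponential decay f(age) = 2^(−λ·age), which down-weights old points so the clustering tracks the evolving stream:

```python
def fading_weight(age, lam):
    """Exponential time-fading function f(age) = 2^(-lambda * age), commonly
    used in stream clustering to let old points decay; lambda controls how
    fast history is forgotten (half-life = 1 / lambda time units)."""
    return 2.0 ** (-lam * age)
```

Microcluster summaries are maintained as weighted sums under this decay, so a cluster that stops receiving points gradually fades below a density threshold and is dropped.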
Dynamic transient responses of a rotating twisted plate under air-blast loading and step loading, respectively, considering geometric nonlinear relationships, are investigated using classical shallow shell theory. By applying the energy principle, a novel high dimensional nonlinear dynamic system of the rotating cantilever twisted plate is derived for the first time. The use of variable mode functions built from polynomial functions according to the twist angle and geometry of the plate makes the description of the dynamic system more accurate than that using the classic cantilever beam functions and the free-free beam functions. Comparisons are carried out between the present results and other literature to validate the present model, formulation and computational procedure. The equations of motion describing the transient high dimensional nonlinear dynamic response are reduced to a four-degree-of-freedom dynamic system expressed in terms of the out-of-plane displacement. The effects of twist angle, stagger angle, rotation speed, load intensity and viscous damping on the nonlinear dynamic transient responses of the twisted plate are investigated. It is important to note that although a homogeneous and isotropic material is applied here, the results might be helpful for laminated composite and functionally graded materials as long as the equivalent material parameters are obtained.
Funding: Project (RDF 11-02-03) supported by the Research Development Fund of XJTLU, China.
Funding: Supported by the Natural Science Foundation of China under Grant Nos. 60804008, 61174048 and 11071263, the Fundamental Research Funds for the Central Universities, and the Guangdong Province Key Laboratory of Computational Science at Sun Yat-Sen University.
Funding: This work is partially supported by grants from the Key Program of the National Natural Science Foundation of China (NSFC Nos. 71631005 and 71731009) and the Major Program of the National Social Science Foundation of China (No. 19ZDA103).
Funding: Supported by the DOE-MMICS SEA-CROGS DE-SC0023191 and the AFOSR MURI FA9550-20-1-0358; also supported by the SMART Scholarship, which is funded by the USD/R&E (The Under Secretary of Defense for Research and Engineering), National Defense Education Program (NDEP)/BA-1, Basic Research.
Abstract: The estimation of covariance matrices is very important in many fields, such as statistics. In real applications, data are frequently affected by high dimensionality and noise. However, most relevant studies are based on complete data. This paper studies the optimal estimation of high-dimensional covariance matrices from missing and noisy samples under the norm. First, the model with sub-Gaussian additive noise is presented. The generalized sample covariance is then modified to define a hard thresholding estimator, and the minimax upper bound is derived. After that, the minimax lower bound is derived, and it is concluded that the proposed estimator is rate-optimal. Finally, numerical simulation analysis is performed. The results show that for missing samples with sub-Gaussian noise, if the true covariance matrix is sparse, the hard thresholding estimator outperforms the traditional estimation method.
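To make the hard thresholding idea concrete, here is a minimal complete-data sketch; the paper's estimator additionally corrects the generalized sample covariance for missingness and sub-Gaussian noise, which is omitted here. Entries of the sample covariance whose magnitude falls below a threshold τ are set to zero, and the diagonal is kept.

```python
import numpy as np

def hard_threshold_cov(X, tau):
    # Hard-thresholding covariance estimator (complete-data sketch):
    # zero out small off-diagonal entries of the sample covariance,
    # keeping the diagonal (variances) intact.
    S = np.cov(X, rowvar=False)
    T = np.where(np.abs(S) >= tau, S, 0.0)
    np.fill_diagonal(T, np.diag(S))
    return T
```

Thresholding exploits sparsity of the true covariance matrix: entries dominated by sampling noise are suppressed rather than accumulated across dimensions.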
Funding: supported by the Natural Science Foundation of China under Grant Nos. 11231010, 11471223, 11501586, BCMIIS, and the Key Project of Beijing Municipal Educational Commission under Grant No. KZ201410028030
Abstract: This paper proposes a procedure for testing the regression coefficients in high dimensional partially linear models based on the F-statistic. In the partially linear model, the authors first estimate the unknown nonlinear component by nonparametric methods and then generalize the F-statistic to test the regression coefficients under some regularity conditions. In this procedure, the estimation of the nonlinear component makes it considerably more challenging to explore the properties of the generalized F-test. The authors obtain asymptotic properties of the generalized F-test in rather general cases, including its asymptotic normality and power with p/n ∈ (0, 1), without a normality assumption. The asymptotic result is general, and by adding some constraint conditions similar conclusions can be obtained for high dimensional linear models. Through simulation studies, the authors demonstrate good finite-sample performance of the proposed test, in agreement with the theoretical results. The practical utility of the method is illustrated by a real data example.
Funding: supported by the National Key Research and Development Program of China (2016YFC0100300), the National Natural Science Foundation of China (61402371, 61771369), the Natural Science Basic Research Plan in Shaanxi Province of China (2017JM6008), and the Fundamental Research Funds for the Central Universities of China (3102017zy032, 3102018zy020)
Abstract: Three high dimensional spatial standardization algorithms are used for diffusion tensor image (DTI) registration, and seven methods are used to evaluate their performance. First, the template used in this paper was obtained by spatial transformation of 16 subjects by means of tensor-based standardization. Then, high dimensional standardization algorithms for diffusion tensor images were performed, including a fractional anisotropy (FA) based diffeomorphic registration algorithm, an FA based elastic registration algorithm, and a tensor-based registration algorithm. Finally, seven evaluation methods were used to qualitatively compare and summarize the above standardization algorithms: normalized standard deviation, dyadic coherence, diffusion cross-correlation, overlap of eigenvalue-eigenvector pairs, Euclidean distance of the diffusion tensor, Euclidean distance of the deviatoric tensor, and deviatoric of tensors. Experimental results revealed that the high-dimensional tensor-based standardization algorithms perform well and can maintain the consistency of anatomical structures.
Funding: supported by the University of Chinese Academy of Sciences under Grant No. Y95401TXX2, Beijing Natural Science Foundation under Grant No. Z190004, and the Key Program of Joint Funds of the National Natural Science Foundation of China under Grant No. U19B2040.
Abstract: This paper considers tests for regression coefficients in high dimensional partially linear models. The authors first use the B-spline method to estimate the unknown smooth function so that it can be expressed linearly. Then, the authors propose an empirical likelihood method to test the regression coefficients. The authors derive the asymptotic chi-squared distribution, with two degrees of freedom, of the proposed test statistics under the null hypothesis. In addition, the method is extended to tests with nuisance parameters. Simulations show that the proposed method performs well in controlling the type-I error rate and in power. The proposed method is also employed to analyze data on Skin Cutaneous Melanoma (SKCM).
Funding: supported by the National Natural Science Foundation of China (Grant Nos. 10971122, 11101274 and 11322109), Scientific and Technological Projects of Shandong Province (Grant No. 2009GG10001012), and the Excellent Young Scientist Foundation of Shandong Province (Grant No. BS2012SF025)
Abstract: The covariance matrix plays an important role in risk management, asset pricing, and portfolio allocation. Covariance matrix estimation becomes challenging when the dimensionality is comparable to or much larger than the sample size. A widely used approach to reducing dimensionality is based on multi-factor models. Although this approach has been well studied and quite successful in many applications, the quality of the estimated covariance matrix is often degraded by a nontrivial amount of missing data in the factor matrix, for both technical and cost reasons. Since the factor matrix is only approximately low rank, or even has full rank, existing matrix completion algorithms are not applicable. We consider a new matrix completion paradigm that uses the factor models directly, and we apply the alternating direction method of multipliers for the recovery. Numerical experiments show that nuclear-norm matrix completion approaches are not suitable, but our proposed models and algorithms are promising.
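The abstract does not detail the ADMM formulation, but the general idea of completing a factor matrix through a low-rank model can be illustrated with a generic alternating-least-squares stand-in, which is not the paper's algorithm: fit a rank-r factorization to the observed entries and fill the missing ones from the fit. The function name and parameters are hypothetical.

```python
import numpy as np

def als_complete(F, mask, r, iters=100, seed=0):
    # Fill missing entries of F (where mask is False) from a rank-r fit.
    # Generic alternating least squares on the observed entries only;
    # a stand-in for the factor-model/ADMM recovery in the abstract.
    rng = np.random.default_rng(seed)
    n, p = F.shape
    U = rng.normal(size=(n, r))
    V = rng.normal(size=(p, r))
    for _ in range(iters):
        for i in range(n):                      # refit each row factor
            idx = mask[i]
            if idx.any():
                U[i] = np.linalg.lstsq(V[idx], F[i, idx], rcond=None)[0]
        for j in range(p):                      # refit each column factor
            idx = mask[:, j]
            if idx.any():
                V[j] = np.linalg.lstsq(U[idx], F[idx, j], rcond=None)[0]
    return np.where(mask, F, U @ V.T)           # keep observed, fill missing
```

Unlike nuclear-norm minimization, this explicitly fixes the factor rank r, which is closer in spirit to using the factor model directly.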
Funding: supported by the National Natural Science Foundation of China (Grant Nos. 11071022, 11231010 and 11471223), Beijing Center for Mathematics and Information Interdisciplinary Science, and the Key Project of Beijing Municipal Educational Commission (Grant No. KZ201410028030)
Abstract: This paper aims to develop a new robust U-type test for high dimensional regression coefficients using an estimated U-statistic of order two and refitted cross-validation error variance estimation. It is proved that the limiting null distribution of the proposed test is normal under two kinds of ordinary models. We further study the local power of the proposed test and compare it with other competitive tests for high dimensional data. The refitted cross-validation approach is utilized to reduce the bias of the sample variance in the estimation of the test statistic. Our theoretical results indicate that the proposed test can achieve an even more substantial power gain than the test by Zhong and Chen (2011) when testing a hypothesis with outlying observations and heavy tailed distributions. We assess the finite-sample performance of the proposed test by examining its size and power via Monte Carlo studies. We also illustrate the application of the proposed test by an empirical analysis of a real data example.
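The refitted cross-validation (RCV) idea for error variance estimation can be sketched as follows: split the sample in half, screen variables on one half, refit by ordinary least squares on the other half using only the selected variables, and average the two residual variance estimates. The marginal-correlation screen below is a hypothetical placeholder for whatever selector is used in practice.

```python
import numpy as np

def rcv_variance(X, y, k, seed=0):
    # Refitted cross-validation estimate of the error variance:
    # screen k features on one half, refit OLS on the other half,
    # then swap the roles and average the two estimates.
    rng = np.random.default_rng(seed)
    halves = np.array_split(rng.permutation(len(y)), 2)
    estimates = []
    for a, b in [(0, 1), (1, 0)]:
        Ia, Ib = halves[a], halves[b]
        scores = np.abs([np.corrcoef(X[Ia, j], y[Ia])[0, 1]
                         for j in range(X.shape[1])])
        selected = np.argsort(scores)[-k:]       # top-k screened features
        Xb = X[np.ix_(Ib, selected)]
        beta = np.linalg.lstsq(Xb, y[Ib], rcond=None)[0]
        resid = y[Ib] - Xb @ beta
        estimates.append(resid @ resid / (len(Ib) - k))
    return float(np.mean(estimates))
```

Because selection and refitting use disjoint halves, the spurious fit from the selection step does not bias the residual variance downward, which is the bias reduction the abstract refers to.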
Funding: supported by the National Natural Science Foundation of China (Grant Nos. 11401383, 11301391 and 11271080)
Abstract: We propose a two-step variable selection procedure for censored quantile regression with high dimensional predictors. To account for censoring in the high dimensional case, we employ effective dimension reduction and the informative subset idea. Under some regularity conditions, we show that our procedure enjoys model selection consistency. A simulation study and real data analysis are conducted to evaluate the finite sample performance of the proposed approach.
Funding: supported by the National Natural Science Foundation of China (71631004, 72033008), the National Science Foundation for Distinguished Young Scholars (71625001), and the Science Foundation of the Ministry of Education of China (19YJA910003).
Abstract: The era of big data brings opportunities and challenges for developing new statistical methods and models to evaluate social programs, economic policies, and interventions. This paper provides a comprehensive review of some recent advances in statistical methodologies and models for evaluating programs with high-dimensional data. In particular, four kinds of methods for making valid statistical inferences about treatment effects in high dimensions are addressed. The first is the so-called doubly robust estimation, which models the outcome regression and propensity score functions simultaneously. The second is the covariate balance method for constructing treatment effect estimators. The third is the sufficient dimension reduction approach for causal inference. The last is machine learning procedures used, directly or indirectly, to make statistical inferences about treatment effects. Some of these methods and models are closely related to the de-biased Lasso type methods for high dimensional regression models in the statistical literature. Finally, some future research topics are also discussed.
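The first (doubly robust) approach can be illustrated, in a low-dimensional setting without the de-biased Lasso machinery the paper reviews, by the classical augmented inverse-probability-weighted (AIPW) estimator of the average treatment effect, which combines outcome regressions with a propensity score model:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

def aipw_ate(X, T, Y):
    # Doubly robust (AIPW) estimate of the average treatment effect:
    # consistent if either the propensity model or the outcome
    # regressions are correctly specified.
    ps = LogisticRegression(max_iter=1000).fit(X, T).predict_proba(X)[:, 1]
    mu1 = LinearRegression().fit(X[T == 1], Y[T == 1]).predict(X)
    mu0 = LinearRegression().fit(X[T == 0], Y[T == 0]).predict(X)
    return float(np.mean(mu1 - mu0
                         + T * (Y - mu1) / ps
                         - (1 - T) * (Y - mu0) / (1 - ps)))
```

In the high-dimensional versions surveyed by the paper, the two plug-in models are replaced by regularized (e.g. Lasso-type) fits, with additional de-biasing to restore valid inference.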
Funding: supported by the National Natural Science Foundation of China (Grant Nos. 11601140, 11401082, 11701260) and a program funded by the Education Department of Liaoning Province (Grant No. LN2019Q15).
Abstract: We deal with the boundedness of solutions to a class of fully parabolic quasilinear repulsion chemotaxis systems u_t = ∇·(ϕ(u)∇u) + ∇·(ψ(u)∇v), v_t = Δv − v + u, for (x,t) ∈ Ω×(0,T), under homogeneous Neumann boundary conditions in a smooth bounded domain Ω ⊂ R^N (N ≥ 3), where 0 < ψ(u) ≤ K(u+1)^α and K1(s+1)^m ≤ ϕ(s) ≤ K2(s+1)^m with α, K, K1, K2 > 0 and m ∈ R. It is shown that if α − m < 4/(N+2), then for any sufficiently smooth initial data, the classical solutions to the system are uniformly-in-time bounded. This extends the known result for the corresponding model with linear diffusion.
Funding: the first author is supported by the National Natural Science Foundation of China (Grant No. 10826069) and the China Postdoctoral Foundation (Grant No. 20090450902); the second author is supported by the National Natural Science Foundation of China (Grant Nos. 10471156 and 10531040)
Abstract: In this paper, we consider the Cauchy problem for systems of quasi-linear wave equations with multiple propagation speeds in spatial dimensions n ≥ 4. The problem in which the nonlinearities depend on both the unknown functions and their derivatives is studied. Based on some Klainerman-Sideris type weighted estimates and space-time L2 estimates, almost global existence of small amplitude solutions for space dimension n = 4 and global existence for n ≥ 5 are established.
Abstract: Clustering high dimensional data is challenging, as increasing dimensionality increases the distance between data points, resulting in sparse regions that degrade clustering performance. Subspace clustering is a common approach to processing high-dimensional data that finds the relevant features for each cluster in the data space. Subspace clustering methods extend traditional clustering to account for the constraints imposed by data streams: data streams are not only high-dimensional, but also unbounded and evolving. This necessitates the development of subspace clustering algorithms that can handle high dimensionality and adapt to the unique characteristics of data streams. Although many articles have contributed to the literature on data stream clustering, there is currently no specific review of subspace clustering algorithms for high-dimensional data streams. Therefore, this article aims to systematically review the existing literature on subspace clustering of data streams in high-dimensional streaming environments. The review follows a systematic methodological approach and includes 18 articles in the final analysis. The analysis focused on two research questions, related to the general clustering process and to dealing with the unbounded and evolving characteristics of data streams. The main findings relate to six elements: clustering process, cluster search, subspace search, synopsis structure, cluster maintenance, and evaluation measures. Most algorithms use a two-phase clustering approach consisting of an initialization stage, a refinement stage, a cluster maintenance stage, and a final clustering stage. The density-based top-down subspace clustering approach is more widely used than the others because it is able to distinguish true clusters from outliers using projected microclusters. Most algorithms adapt implicitly to the evolving nature of the data stream by using a time-fading function that is sensitive to outliers. Future work can focus on the clustering framework, parameter optimization, subspace search techniques, memory-efficient synopsis structures, explicit cluster change detection, and intrinsic performance metrics. This article can serve as a guide for researchers interested in high-dimensional subspace clustering methods for data streams.
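The time-fading function used by many of the reviewed stream clustering algorithms is commonly taken to be exponential, f(Δt) = 2^(−λΔt), so that recent points carry weight near 1 and stale points decay toward 0. A minimal sketch, where λ is a user-chosen decay rate (the specific form varies by algorithm):

```python
def fade_weight(t_now, t_point, lam=0.01):
    # Exponential time-fading weight 2^(-lam * age): a point seen just now
    # has weight 1.0; a point of age 1/lam has weight 0.5.
    return 2.0 ** (-lam * (t_now - t_point))
```

Microcluster statistics are multiplied by this weight as time advances, which is how the synopsis structures implicitly forget outdated clusters without explicit change detection.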
Funding: support from the National Natural Science Foundation of China through Grant Nos. 11872127, 11832002 and 11732005, the Fundamental Research Program of Shenzhen Municipality No. JCYJ20160608153749600, the Project of High-level Innovative Team Building Plan for Beijing Municipal Colleges and Universities No. IDHT20180513, and the project of the Qin Xin Talents Cultivation Program, Beijing Information Science & Technology University, QXTCP A201901.
Abstract: The dynamic transient responses of a rotating twisted plate under air-blast loading and step loading, respectively, are investigated using classical shallow shell theory with geometric nonlinear relationships taken into account. By applying the energy principle, a novel high dimensional nonlinear dynamic system for the rotating cantilever twisted plate is derived for the first time. The use of variable mode functions, built from polynomial functions according to the twist angle and geometry of the plate, describes the dynamic system more accurately than the classic cantilever beam functions and free-free beam functions. Comparisons are carried out between the present results and other literature to validate the present model, formulation, and computer implementation. The equations of motion describing the transient high dimensional nonlinear dynamic response are reduced to a four degree-of-freedom dynamic system expressed in terms of the out-of-plane displacement. The effects of twist angle, stagger angle, rotation speed, load intensity, and viscous damping on the nonlinear dynamic transient responses of the twisted plate are investigated. Although a homogeneous and isotropic material is applied here, the approach may also be helpful for laminated composite and functionally graded materials, as long as the equivalent material parameters are used.