An algorithm, the Clustering Algorithm Based On Sparse Feature Vector (CABOSFV), was proposed for high dimensional clustering of binary sparse data. The algorithm compresses the data effectively by using a tool called the 'Sparse Feature Vector', thus reducing the data scale enormously, and it obtains the clustering result with only one data scan. Both theoretical analysis and empirical tests showed that CABOSFV has low computational complexity. The algorithm finds clusters in high dimensional large datasets efficiently and handles noise effectively.
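A minimal sketch of the CABOSFV idea, under assumptions: a set of binary objects is summarized by a Sparse Feature Vector (object count, features shared by all, features shared by only some), and a sparse feature dissimilarity decides membership in one scan. The function names and the exact dissimilarity formula here are illustrative guesses, not the paper's definitions, and for clarity the summary is recomputed rather than updated incrementally as the original algorithm would.

```python
def sfv(objects):
    """Summarize binary objects (as sets of present features):
    count n, features shared by all, features present in only some."""
    n = len(objects)
    union = set().union(*objects)
    shared = set.intersection(*objects)
    return n, shared, union - shared

def sparse_dissimilarity(objects):
    # Assumed form: non-shared features relative to shared ones.
    n, shared, partial = sfv(objects)
    if not shared:
        return float("inf")
    return len(partial) / (n * len(shared))

def cabosfv(objects, threshold):
    """One-pass clustering: add an object to the first cluster whose
    dissimilarity stays under the threshold, else open a new cluster."""
    clusters = []
    for obj in objects:
        for cluster in clusters:
            if sparse_dissimilarity(cluster + [obj]) <= threshold:
                cluster.append(obj)
                break
        else:
            clusters.append([obj])
    return clusters

data = [{1, 2, 3}, {1, 2, 4}, {7, 8}, {7, 8, 9}]
result = cabosfv(data, threshold=0.5)   # two clusters of two objects
```

Because each object is visited once and clusters are stored only as summaries in the real algorithm, a single scan of the data suffices.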
In this paper, the global controllability for a class of high dimensional polynomial systems is investigated and a constructive algebraic criterion algorithm for their global controllability is obtained. By the criterion algorithm, global controllability can be determined in finitely many arithmetic operations. The algorithm is imposed on the coefficients of the polynomials only, and the analysis technique is based on the Sturm Theorem in real algebraic geometry and its modern developments. Finally, the authors give some examples to show the application of the results.
To address the high-dimensionality issue and improve accuracy in credit risk assessment, a high-dimensionality-trait-driven learning paradigm is proposed for feature extraction and classifier selection. The proposed paradigm consists of three main stages: categorization of high dimensional data, high-dimensionality-trait-driven feature extraction, and high-dimensionality-trait-driven classifier selection. In the first stage, according to the definition of high dimensionality and the relationship between sample size and feature dimensions, the high-dimensionality traits of credit datasets are categorized into two types: 100 < feature dimensions < sample size, and feature dimensions ≥ sample size. In the second stage, some typical feature extraction methods are tested on the two categories of high dimensionality. In the final stage, four types of classifiers are applied to evaluate credit risk under the different high-dimensionality traits. For illustration and verification, credit classification experiments are performed on two publicly available credit risk datasets. The results show that the proposed paradigm is effective in handling high-dimensional credit classification and improves classification accuracy relative to the benchmark models listed in this study.
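The first stage's categorization rule is concrete enough to sketch. The code below implements the two trait categories as stated, then applies one assumed choice of extraction step (an SVD-based principal component projection); the paradigm itself does not prescribe PCA, so that part is purely illustrative.

```python
import numpy as np

def dimensionality_trait(n_samples, n_features):
    """Categorize a dataset by the paper's two high-dimensionality traits."""
    if n_features >= n_samples:
        return "features >= samples"
    if n_features > 100:
        return "100 < features < samples"
    return "not high-dimensional"

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 200))               # 50 samples, 200 features
trait = dimensionality_trait(*X.shape)       # falls in "features >= samples"

# One assumed extraction step for this trait: project onto the
# top principal components obtained from the thin SVD.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
X_reduced = Xc @ Vt[:10].T                   # 50 x 10 compressed features
```

A downstream classifier would then be trained on `X_reduced` instead of the raw 200-dimensional data.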
Markowitz portfolio theory underestimates the risk associated with the return of a portfolio in the case of high dimensional data. El Karoui proved this mathematically in [1] and suggested improved estimators for unbiased estimation of this risk under specific model assumptions. Norm-constrained portfolios have recently been studied as a way to keep the effective dimension low. In this paper we consider three sets of high dimensional data: the stock market prices for three countries, namely the US, the UK, and India. We compare the Markowitz efficient frontier to those obtained by unbiasedness corrections and by imposing norm constraints in these real data scenarios. We also study the out-of-sample performance of the different procedures. We find that the 2-norm constrained portfolio has the best overall performance.
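To illustrate the kind of 2-norm regularization discussed (not the paper's exact estimator), the sketch below adds a ridge-style penalty to the minimum-variance problem. With the fully-invested constraint, the penalized solution keeps the standard closed form w ∝ (Σ + λI)⁻¹1, which shrinks weights toward equal weighting and lowers their 2-norm.

```python
import numpy as np

def min_variance_weights(returns, ridge=0.0):
    """Minimize w'Σw + ridge*||w||² subject to sum(w) = 1."""
    cov = np.cov(returns, rowvar=False)
    k = cov.shape[0]
    w = np.linalg.solve(cov + ridge * np.eye(k), np.ones(k))
    return w / w.sum()

rng = np.random.default_rng(1)
rets = rng.normal(0.001, 0.02, size=(250, 40))   # 250 days, 40 assets
w_plain = min_variance_weights(rets)             # unconstrained weights
w_ridge = min_variance_weights(rets, ridge=0.1)  # 2-norm penalized weights
```

As the ridge parameter grows, the weights approach 1/k each, the minimum-norm fully-invested portfolio, which is why the effective dimension stays low.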
Information analysis of high dimensional data was carried out through the application of a similarity measure. High dimensional data were treated as an atypical structure. Overlapped and non-overlapped data were introduced, and the similarity measure analysis was illustrated and compared with a conventional similarity measure. As a result, overlapped data could be compared with the conventional similarity measure, while the non-overlapped data analysis provided the clue to solving the similarity of high dimensional data. The high dimensional analysis was designed with consideration of neighborhood information, and both conservative and strict solutions were proposed. The proposed similarity measure was applied to express financial fraud over multi-dimensional datasets. In an illustrative example, financial fraud similarity with respect to age, gender, qualification, and job was presented, and with the proposed similarity measure, high dimensional personal data were evaluated for their similarity to financial fraud. The calculation results show that actual fraud cases have a rather high similarity measure compared with the average, ranging from a minimum of 0.0609 to a maximum of 0.1667.
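The abstract does not give the measure's closed form, so as a stand-in the sketch below scores how similar a personal record is to a fraud profile with a simple normalized attribute-overlap measure on the four attributes mentioned (age, gender, qualification, job). Both the coding of the attributes and the measure itself are illustrative assumptions.

```python
def attribute_similarity(record, profile):
    """Fraction of coded attributes that match the profile."""
    matches = sum(1 for a, b in zip(record, profile) if a == b)
    return matches / len(profile)

# Hypothetical codings, not from the paper.
fraud_profile = ("30-40", "M", "graduate", "clerk")
person = ("30-40", "F", "graduate", "manager")

sim = attribute_similarity(person, fraud_profile)   # 2 of 4 attributes match
```

The paper's measure additionally accounts for neighborhood information and non-overlapped dimensions, which a flat match count like this cannot capture.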
This paper mainly concerns oblique derivative problems for nonlinear nondivergent elliptic equations of second order with measurable coefficients in a multiply connected domain. Under certain conditions, we derive a priori estimates of solutions. By using these estimates and the fixed-point theorem, we prove the existence of solutions.
With the abundance of exceptionally high dimensional data, feature selection has become an essential element of the data mining process. In this paper, we investigate the problem of efficient feature selection for classification on high dimensional datasets. We present a novel filter-based approach that sorts the features by a score, and we measure the performance of four different data mining classification algorithms on the resulting data. In the proposed approach, we partition the sorted features and search for important features in both forward and reverse order, starting from the first and last features of the sorted list simultaneously. The approach is highly scalable and effective, as it parallelizes over both attributes and tuples simultaneously, allowing us to evaluate many potential features for high dimensional datasets. The framework is experimentally shown to be very valuable on real and synthetic high dimensional datasets, improving the precision of the selected features. We have also tested it by measuring classification accuracy against various feature selection processes.
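A minimal sketch of the described scheme: score the features, sort them, then scan the sorted list from both ends at once, keeping features whose score clears a cutoff. The variance score used here is an assumption for illustration; any filter score fits the same two-pointer structure, and the two scan directions are the part that could run in parallel.

```python
import numpy as np

def score_features(X):
    return X.var(axis=0)            # assumed filter score: column variance

def bidirectional_select(X, cutoff):
    scores = score_features(X)
    order = np.argsort(scores)[::-1]          # best-scoring feature first
    selected = []
    lo, hi = 0, len(order) - 1
    while lo <= hi:                           # forward and reverse scans meet
        ends = {order[lo], order[hi]} if lo != hi else {order[lo]}
        for idx in ends:
            if scores[idx] >= cutoff:
                selected.append(idx)
        lo += 1
        hi -= 1
    return sorted(selected)

rng = np.random.default_rng(2)
X = np.hstack([rng.normal(0, 3.0, (100, 5)),    # informative: high variance
               rng.normal(0, 0.1, (100, 5))])   # noise: low variance
chosen = bidirectional_select(X, cutoff=1.0)    # keeps the first 5 columns
```

In practice the kept columns would then be handed to the classifiers being compared.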
The estimation of covariance matrices is very important in many fields, such as statistics. In real applications, data are frequently affected by high dimensionality and noise, yet most relevant studies assume complete data. This paper studies the optimal estimation of high-dimensional covariance matrices from missing and noisy samples under the norm. First, a model with sub-Gaussian additive noise is presented. The generalized sample covariance is then modified to define a hard thresholding estimator, and the minimax upper bound is derived. After that, the minimax lower bound is derived, and it is concluded that the proposed estimator is rate-optimal. Finally, numerical simulations are performed. The results show that for missing samples with sub-Gaussian noise, if the true covariance matrix is sparse, the hard thresholding estimator outperforms the traditional estimation method.
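The structure of the estimator family can be sketched even though the paper's exact definitions and rates are not reproduced here: form a generalized sample covariance that divides each entry by its number of jointly observed pairs (a pairwise-complete, second-moment form, assuming mean-zero data), then hard-threshold the small off-diagonal entries.

```python
import numpy as np

def missing_adjusted_cov(X, mask):
    """X: data with zeros where missing; mask: 1.0 = observed.
    Pairwise-complete second-moment estimate (assumes mean-zero data)."""
    pair_counts = mask.T @ mask                   # jointly observed pairs
    return (X.T @ X) / np.maximum(pair_counts, 1)

def hard_threshold(S, tau):
    T = np.where(np.abs(S) >= tau, S, 0.0)        # kill small entries
    np.fill_diagonal(T, np.diag(S))               # keep the variances
    return T

rng = np.random.default_rng(3)
Z = rng.normal(size=(500, 8))                     # true covariance: identity
mask = (rng.random(Z.shape) > 0.2).astype(float)  # ~20% entries missing
S = missing_adjusted_cov(Z * mask, mask)
S_hat = hard_threshold(S, tau=0.15)
```

With a sparse truth like the identity, thresholding zeroes out the sampling noise in the off-diagonal entries while the missingness correction keeps the diagonal unbiased.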
Clustering high dimensional data is challenging: as dimensionality increases, so does the distance between data points, producing sparse regions that degrade clustering performance. Subspace clustering is a common approach for processing high-dimensional data that finds the relevant features for each cluster in the data space. Subspace clustering methods extend traditional clustering to account for the constraints imposed by data streams, which are not only high-dimensional but also unbounded and evolving. This necessitates subspace clustering algorithms that can handle high dimensionality and adapt to the unique characteristics of data streams. Although many articles have reviewed data stream clustering, there is currently no review specific to subspace clustering algorithms for high-dimensional data streams. This article therefore systematically reviews the existing literature on subspace clustering of data streams in high-dimensional streaming environments. The review follows a systematic methodological approach and includes 18 articles in the final analysis, focused on two research questions concerning the general clustering process and the handling of the unbounded and evolving characteristics of data streams. The main findings relate to six elements: the clustering process, cluster search, subspace search, synopsis structure, cluster maintenance, and evaluation measures. Most algorithms use a two-phase clustering approach consisting of an initialization stage, a refinement stage, a cluster maintenance stage, and a final clustering stage. The density-based top-down subspace clustering approach is the most widely used because it can distinguish true clusters from outliers using projected microclusters. Most algorithms adapt implicitly to the evolving nature of the data stream by using a time fading function that is sensitive to outliers. Future work can focus on the clustering framework, parameter optimization, subspace search techniques, memory-efficient synopsis structures, explicit cluster change detection, and intrinsic performance metrics. This article can serve as a guide for researchers interested in high-dimensional subspace clustering methods for data streams.
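The time-fading function the review highlights is simple enough to show directly: a microcluster's weight decays exponentially with the time since it was last updated, commonly as 2^(−λ·Δt), so stale structure fades out of the synopsis. The decay rate λ and the pruning threshold below are illustrative choices, not values from any surveyed algorithm.

```python
def fading_weight(t_now, t_seen, lam=0.25):
    """Exponential time-fading weight 2^(-lam * elapsed time)."""
    return 2.0 ** (-lam * (t_now - t_seen))

# A microcluster last updated 8 time units ago at lam=0.25
# retains a quarter of its weight.
w = fading_weight(t_now=10.0, t_seen=2.0)
stale = w < 0.5          # would be pruned under a 0.5 weight threshold
```

Because the weight is recomputable from a single timestamp, the synopsis stays memory-efficient: no per-point history is needed to age the clusters.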
In ultra-high-dimensional data, it is common for the response variable to be multi-class. This paper therefore proposes a model-free variable screening method for multi-class responses, introducing the Jensen-Shannon divergence to measure the importance of covariates. The idea is to calculate the Jensen-Shannon divergence between the conditional probability distribution of a covariate given the response variable and the unconditional probability distribution of the covariate, and then use the probabilities of the response classes as weights to calculate a weighted Jensen-Shannon divergence; a larger weighted Jensen-Shannon divergence means the covariate is more important. We also investigate an adapted version of the method for the case where the number of categories varies across covariates, which adjusts the weighted Jensen-Shannon divergence by a logarithmic factor of the number of categories. Both theoretical analysis and simulation experiments demonstrate that the proposed methods have the sure screening and ranking consistency properties. Finally, results from simulations and real-data experiments show that, for feature screening, the proposed methods are robust in performance and faster in computation than an existing method.
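The screening utility described can be written down directly: for each covariate, compute the Jensen-Shannon divergence between its conditional distribution within each response class and its marginal distribution, then average with the class probabilities as weights. The sketch below implements that definition for integer-coded covariates; the normalizations and the log-category adjustment of the adapted version are omitted.

```python
import numpy as np

def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions."""
    m = 0.5 * (p + q)
    def kl(a, b):
        nz = a > 0
        return np.sum(a[nz] * np.log(a[nz] / b[nz]))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def weighted_js(x, y):
    """Weighted JS divergence between P(X | Y=r) and P(X), weights P(Y=r)."""
    classes, class_counts = np.unique(y, return_counts=True)
    values = np.unique(x)
    marginal = np.array([(x == v).mean() for v in values])
    total = 0.0
    for r, n_r in zip(classes, class_counts):
        cond = np.array([(x[y == r] == v).mean() for v in values])
        total += (n_r / len(y)) * js_divergence(cond, marginal)
    return total

rng = np.random.default_rng(4)
y = rng.integers(0, 2, 1000)
x_informative = (y ^ (rng.random(1000) < 0.1)).astype(int)  # tracks y
x_noise = rng.integers(0, 2, 1000)                          # independent of y
score_informative = weighted_js(x_informative, y)
score_noise = weighted_js(x_noise, y)
```

Ranking covariates by this score and keeping the top ones is the screening step; the informative covariate scores far above the independent one.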
Three high dimensional spatial standardization algorithms are used for diffusion tensor image (DTI) registration, and seven methods are used to evaluate their performance. First, the template used in this paper was obtained by spatial transformation of 16 subjects by means of tensor-based standardization. Then, high dimensional standardization algorithms for diffusion tensor images, including a fractional anisotropy (FA) based diffeomorphic registration algorithm, an FA based elastic registration algorithm, and a tensor-based registration algorithm, were performed. Finally, seven evaluation methods, including normalized standard deviation, dyadic coherence, diffusion cross-correlation, overlap of eigenvalue-eigenvector pairs, Euclidean distance of the diffusion tensor, and Euclidean distance of the deviatoric tensor and the deviatoric of tensors, were used to qualitatively compare and summarize the above standardization algorithms. Experimental results revealed that the high-dimensional tensor-based standardization algorithm performs well and can maintain the consistency of anatomical structures.
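Two of the three registration algorithms are driven by fractional anisotropy (FA), a standard scalar map computed from the diffusion tensor's eigenvalues as FA = sqrt(3/2)·‖λ − λ̄‖/‖λ‖. The sketch below evaluates that formula at its two extremes; it is background on the FA map, not part of the paper's pipeline.

```python
import numpy as np

def fractional_anisotropy(eigvals):
    """FA = sqrt(3/2) * ||lambda - mean|| / ||lambda|| for the
    three eigenvalues of a diffusion tensor."""
    lam = np.asarray(eigvals, dtype=float)
    num = np.sqrt(((lam - lam.mean()) ** 2).sum())
    den = np.sqrt((lam ** 2).sum())
    return np.sqrt(1.5) * num / den if den > 0 else 0.0

fa_isotropic = fractional_anisotropy([1.0, 1.0, 1.0])  # free diffusion
fa_linear = fractional_anisotropy([1.0, 0.0, 0.0])     # single-fiber extreme
```

Isotropic diffusion gives FA = 0 and perfectly linear diffusion gives FA = 1, which is why FA serves as a registration-driving contrast in white matter.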
This paper considers tests for regression coefficients in high dimensional partially linear models. The authors first use the B-spline method to estimate the unknown smooth function so that it can be expressed linearly. Then, they propose an empirical likelihood method to test the regression coefficients, and derive the asymptotic chi-squared distribution, with two degrees of freedom, of the proposed test statistic under the null hypothesis. In addition, the method is extended to tests with nuisance parameters. Simulations show that the proposed method performs well in controlling the type-I error rate and in power. The method is also employed to analyze a Skin Cutaneous Melanoma (SKCM) dataset.
The era of big data brings both opportunities and challenges for developing new statistical methods and models to evaluate social programs, economic policies, and other interventions. This paper provides a comprehensive review of recent advances in statistical methodologies and models for evaluating programs with high-dimensional data. In particular, four kinds of methods for making valid statistical inferences about treatment effects in high dimensions are addressed. The first is doubly robust estimation, which models the outcome regression and the propensity score function simultaneously. The second is the covariate balance method for constructing treatment effect estimators. The third is the sufficient dimension reduction approach for causal inference. The last is the use of machine learning procedures, directly or indirectly, to make statistical inferences about treatment effects. Some of these methods and models are closely related to de-biased Lasso type methods for high-dimensional regression in the statistical literature. Finally, some future research topics are discussed.
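The doubly robust idea the review describes has a standard concrete form, the augmented inverse-probability-weighted (AIPW) estimator, which combines an outcome regression m_t(x) with a propensity score e(x) and stays consistent if either nuisance model is correct. The sketch below evaluates it with oracle nuisance functions on simulated data; in practice the nuisances would be fitted, possibly by the machine learning procedures the review also covers.

```python
import numpy as np

def aipw_ate(y, t, m1, m0, e):
    """Augmented IPW estimate of the average treatment effect."""
    return np.mean(m1 - m0
                   + t * (y - m1) / e
                   - (1 - t) * (y - m0) / (1 - e))

rng = np.random.default_rng(5)
n = 5000
x = rng.normal(size=n)
e_true = 1.0 / (1.0 + np.exp(-x))          # propensity depends on x
t = (rng.random(n) < e_true).astype(float) # confounded treatment assignment
y = 2.0 * t + x + rng.normal(size=n)       # true treatment effect = 2

# With oracle nuisance functions the estimate sits near the truth.
tau = aipw_ate(y, t, m1=2.0 + x, m0=x, e=e_true)
```

The augmentation terms correct the plug-in outcome-regression estimate by the inverse-probability-weighted residuals, which is exactly where the double robustness comes from.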
We deal with the boundedness of solutions to a class of fully parabolic quasilinear repulsion chemotaxis systems

u_t = ∇·(ϕ(u)∇u) + ∇·(ψ(u)∇v),  (x,t) ∈ Ω×(0,T),
v_t = Δv − v + u,  (x,t) ∈ Ω×(0,T),

under homogeneous Neumann boundary conditions in a smooth bounded domain Ω ⊂ R^N (N ≥ 3), where 0 < ψ(s) ≤ K(s+1)^α and K₁(s+1)^m ≤ ϕ(s) ≤ K₂(s+1)^m with α, K, K₁, K₂ > 0 and m ∈ R. It is shown that if α − m < 4/(N+2), then for any sufficiently smooth initial data, the classical solutions to the system are uniformly bounded in time. This extends the known result for the corresponding model with linear diffusion.
In this paper, we consider the Cauchy problem for systems of quasi-linear wave equations with multiple propagation speeds in spatial dimensions n ≥ 4, in the case where the nonlinearities depend on both the unknown functions and their derivatives. Based on Klainerman-Sideris type weighted estimates and space-time L² estimates, we establish almost global existence of small amplitude solutions for space dimension n = 4 and global existence for n ≥ 5.
The dynamic transient responses of a rotating twisted plate under air-blast loading and step loading, respectively, accounting for geometric nonlinearity, are investigated using classical shallow shell theory. By applying the energy principle, a novel high dimensional nonlinear dynamic system for the rotating cantilever twisted plate is derived for the first time. Using variable mode functions, built from polynomial functions according to the twist angle and geometry of the plate, describes the dynamic system more accurately than the classic cantilever beam functions and free-free beam functions. Comparisons between the present results and other literature validate the model, the formulation, and the computational procedure. The equations of motion describing the transient high dimensional nonlinear dynamic response are reduced to a four-degree-of-freedom dynamic system expressed in terms of the out-of-plane displacement. The effects of twist angle, stagger angle, rotation speed, load intensity, and viscous damping on the nonlinear dynamic transient responses of the twisted plate are investigated. Although a homogeneous and isotropic material is used here, the approach may also be helpful for laminated composites and functionally graded materials, as long as the equivalent material parameters are obtained.
The multi-mode approximation is presented to compute the interior wave function of the Schrödinger equation. This idea is necessary for handling multi-barrier and high dimensional resonant tunneling problems in which multiple eigenvalues are considered. The accuracy and efficiency of the algorithm are demonstrated via several numerical examples.
The query space of a similarity query is usually narrowed down by pruning inactive query subspaces, which contain no query results, and keeping active query subspaces, which may contain objects corresponding to the request. However, some active query subspaces contain no query results at all; these are called false active query subspaces. Clearly, query processing performance degrades in the presence of false active query subspaces. Our experiments show that this problem becomes serious when the data are high dimensional, and that the number of accesses to false active subspaces increases with the dimensionality. To solve this problem, this paper proposes a space mapping approach to reducing such unnecessary accesses. A given query space can be refined by filtering within its mapped space; to this end, a mapping strategy called maxgap is proposed to improve the efficiency of the refinement processing. Based on the mapping strategy, an index structure called the MS-tree and its query processing algorithms are presented. Finally, the performance of the MS-tree is compared with that of other competitors in terms of range queries on a real data set.
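The abstract names the maxgap strategy without giving its details. As a neutral illustration only, the helper below finds the widest gap in a sorted one-dimensional projection of the data, the kind of statistic a gap-based mapping could exploit to separate dense regions from the empty regions that give rise to false active subspaces; it should not be read as the paper's actual definition.

```python
def max_gap(values):
    """Return (width, left, right) of the widest gap between
    consecutive values after sorting."""
    s = sorted(values)
    gaps = [(b - a, a, b) for a, b in zip(s, s[1:])]
    return max(gaps)

# Two dense runs separated by an empty interval (0.25, 0.9):
width, left, right = max_gap([0.1, 0.2, 0.25, 0.9, 1.0])
```

A range query falling entirely inside such a gap touches no data, so an index aware of the gap can reject it without any disk accesses.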
In this paper a high-dimension multiparty quantum secret sharing scheme is proposed using Einstein-Podolsky-Rosen pairs and local unitary operators. The scheme has the advantage of not only having higher capacity but also saving storage space. A security analysis is also given.
Feature screening plays an important role in ultrahigh dimensional data analysis. This paper is concerned with conditional feature screening when one is interested in detecting the association between the response and ultrahigh dimensional predictors (e.g., genetic markers) given a low-dimensional exposure variable (such as clinical or environmental variables). To this end, we first propose a new index to measure conditional independence, and further develop a conditional screening procedure based on it. We systematically study the theoretical properties of the proposed procedure and establish its sure screening and ranking consistency properties under some very mild conditions. The proposed procedure enjoys several appealing properties: (a) it is model-free, in that its implementation does not require specifying a model structure; (b) it is robust to heavy-tailed distributions or outliers in both the response and the predictors; and (c) it handles both unconditional and conditional feature screening in a unified way. We study the finite sample performance of the procedure by Monte Carlo simulations and further illustrate the method through two real data examples.
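The paper's specific index is not reproduced in the abstract, so the sketch below shows only the generic shape of a robust conditional screen: regress the low-dimensional exposure out of the response, then rank predictors by Spearman rank correlation with the residual, which tolerates heavy tails and outliers. The residualize-then-rank recipe is an illustrative stand-in, not the proposed index.

```python
import numpy as np

def rank_transform(v):
    """Map values to their ranks 0..n-1 (basis of Spearman correlation)."""
    return np.argsort(np.argsort(v)).astype(float)

def conditional_screen(X, y, z, top_k):
    # Residualize y on the exposure z by least squares.
    Z = np.column_stack([np.ones_like(z), z])
    resid = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    r = rank_transform(resid)
    scores = [abs(np.corrcoef(rank_transform(X[:, j]), r)[0, 1])
              for j in range(X.shape[1])]
    return np.argsort(scores)[::-1][:top_k]       # best predictors first

rng = np.random.default_rng(6)
n, p = 400, 50
z = rng.normal(size=n)                            # exposure variable
X = rng.normal(size=(n, p))                       # predictors
y = 3.0 * X[:, 7] + 2.0 * z + rng.standard_t(df=2, size=n)  # heavy tails
top = conditional_screen(X, y, z, top_k=5)        # predictor 7 should surface
```

Keeping only the top-ranked predictors reduces the ultrahigh-dimensional problem to one that downstream models can handle.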
Funding (for the global controllability study above): supported by the Natural Science Foundation of China under Grant Nos. 60804008, 61174048 and 11071263, the Fundamental Research Funds for the Central Universities, and the Guangdong Province Key Laboratory of Computational Science at Sun Yat-Sen University.
Funding (for the credit risk assessment study above): partially supported by grants from the Key Program of the National Natural Science Foundation of China (NSFC Nos. 71631005 and 71731009) and the Major Program of the National Social Science Foundation of China (No. 19ZDA103).
Funding (for the similarity measure study above): Project RDF 11-02-03, supported by the Research Development Fund of XJTLU, China.
文摘Information analysis of high dimensional data was carried out through similarity measure application. High dimensional data were considered as the a typical structure. Additionally, overlapped and non-overlapped data were introduced, and similarity measure analysis was also illustrated and compared with conventional similarity measure. As a result, overlapped data comparison was possible to present similarity with conventional similarity measure. Non-overlapped data similarity analysis provided the clue to solve the similarity of high dimensional data. Considering high dimensional data analysis was designed with consideration of neighborhoods information. Conservative and strict solutions were proposed. Proposed similarity measure was applied to express financial fraud among multi dimensional datasets. In illustrative example, financial fraud similarity with respect to age, gender, qualification and job was presented. And with the proposed similarity measure, high dimensional personal data were calculated to evaluate how similar to the financial fraud. Calculation results show that the actual fraud has rather high similarity measure compared to the average, from minimal 0.0609 to maximal 0.1667.
文摘This paper mainly concerns oblique derivative problems for nonlinear nondivergent elliptic equations of second order with measurable coefficients in a multiply connected domain. Under certain condition, we derive a priori estimates of solutions. By using these estimates and the fixed-point theorem, we prove the existence of solutions.
文摘With the abundance of exceptionally High Dimensional data, feature selection has become an essential element in the Data Mining process. In this paper, we investigate the problem of efficient feature selection for classification on High Dimensional datasets. We present a novel filter based approach for feature selection that sorts out the features based on a score and then we measure the performance of four different Data Mining classification algorithms on the resulting data. In the proposed approach, we partition the sorted feature and search the important feature in forward manner as well as in reversed manner, while starting from first and last feature simultaneously in the sorted list. The proposed approach is highly scalable and effective as it parallelizes over both attribute and tuples simultaneously allowing us to evaluate many of potential features for High Dimensional datasets. The newly proposed framework for feature selection is experimentally shown to be very valuable with real and synthetic High Dimensional datasets which improve the precision of selected features. We have also tested it to measure classification accuracy against various feature selection process.
文摘The estimation of covariance matrices is very important in many fields, such as statistics. In real applications, data are frequently influenced by high dimensions and noise. However, most relevant studies are based on complete data. This paper studies the optimal estimation of high-dimensional covariance matrices based on missing and noisy sample under the norm. First, the model with sub-Gaussian additive noise is presented. The generalized sample covariance is then modified to define a hard thresholding estimator , and the minimax upper bound is derived. After that, the minimax lower bound is derived, and it is concluded that the estimator presented in this article is rate-optimal. Finally, numerical simulation analysis is performed. The result shows that for missing samples with sub-Gaussian noise, if the true covariance matrix is sparse, the hard thresholding estimator outperforms the traditional estimate method.
Abstract: Clustering high-dimensional data is challenging because increasing dimensionality inflates the distances between data points, producing sparse regions that degrade clustering performance. Subspace clustering is a common approach for processing high-dimensional data by finding the relevant features for each cluster in the data space. Subspace clustering methods extend traditional clustering to account for the constraints imposed by data streams. Data streams are not only high-dimensional, but also unbounded and evolving. This necessitates the development of subspace clustering algorithms that can handle high dimensionality and adapt to the unique characteristics of data streams. Although many articles have contributed to the literature on data stream clustering, there is currently no review specifically devoted to subspace clustering algorithms for high-dimensional data streams. Therefore, this article systematically reviews the existing literature on subspace clustering of data streams in high-dimensional streaming environments. The review follows a systematic methodological approach and includes 18 articles in the final analysis. The analysis focused on two research questions related to the general clustering process and to dealing with the unbounded and evolving characteristics of data streams. The main findings relate to six elements: clustering process, cluster search, subspace search, synopsis structure, cluster maintenance, and evaluation measures. Most algorithms use a two-phase clustering approach consisting of an initialization stage, a refinement stage, a cluster maintenance stage, and a final clustering stage. The density-based top-down subspace clustering approach is more widely used than the others because it can distinguish true clusters from outliers using projected microclusters. Most algorithms implicitly adapt to the evolving nature of the data stream by using a time-fading function that is sensitive to outliers. Future work can focus on the clustering framework, parameter optimization, subspace search techniques, memory-efficient synopsis structures, explicit cluster change detection, and intrinsic performance metrics. This article can serve as a guide for researchers interested in high-dimensional subspace clustering methods for data streams.
Abstract: In ultra-high-dimensional data, it is common for the response variable to be multi-class. This paper therefore proposes a model-free variable screening method for multi-class response variables, introducing the Jensen-Shannon divergence to measure the importance of covariates. The idea is to calculate the Jensen-Shannon divergence between the conditional probability distribution of a covariate given the response variable and the unconditional probability distribution of the covariate, and then use the probabilities of the response categories as weights to compute a weighted Jensen-Shannon divergence; a larger weighted Jensen-Shannon divergence means that the covariate is more important. Additionally, an adapted version of the method is investigated, which measures the relationship between a covariate and the response using the weighted Jensen-Shannon divergence adjusted by a logarithmic factor of the number of categories when the number of categories varies across covariates. Through both theoretical analysis and simulation experiments, it is demonstrated that the proposed methods have the sure screening and ranking consistency properties. Finally, results from simulation and real-data experiments show that, in feature screening, the proposed methods are robust in performance and faster in computation than an existing method.
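The weighted divergence described above can be sketched for discrete covariates. This is an illustrative reading of the idea, not the paper's exact statistic: for each response category r, compare the conditional distribution of the covariate given Y = r with its marginal distribution, then average the Jensen-Shannon divergences with weights P(Y = r).

```python
import numpy as np

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two discrete distributions."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    m = 0.5 * (p + q)
    kl = lambda a, b: float(np.sum(a * np.log((a + eps) / (b + eps))))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def weighted_js_score(x, y):
    """Screening score: P(Y=r)-weighted JS divergence between the
    conditional distribution of a discrete covariate given Y=r and
    its marginal distribution (a sketch of the idea, not the paper's
    exact statistic). Larger score = more important covariate."""
    x_vals = np.unique(x)
    marg = np.array([(x == v).mean() for v in x_vals])
    score = 0.0
    for r in np.unique(y):
        w = (y == r).mean()
        n_r = (y == r).sum()
        cond = np.array([((x == v) & (y == r)).sum() / n_r for v in x_vals])
        score += w * js_divergence(cond, marg)
    return score

rng = np.random.default_rng(1)
y = rng.integers(0, 3, 2000)                          # 3-class response
x_informative = (y + rng.integers(0, 2, 2000)) % 3    # depends on y
x_noise = rng.integers(0, 3, 2000)                    # independent of y
# The informative covariate should receive a clearly larger score.
```

Ranking all covariates by this score and keeping the top ones is the screening step; the adjusted variant described in the abstract would additionally divide by a logarithmic factor of the covariate's number of categories.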
Funding: Supported by the National Key Research and Development Program of China (2016YFC0100300), the National Natural Science Foundation of China (61402371, 61771369), the Natural Science Basic Research Plan in Shaanxi Province of China (2017JM6008), and the Fundamental Research Funds for the Central Universities of China (3102017zy032, 3102018zy020).
Abstract: Three high-dimensional spatial standardization algorithms are applied to diffusion tensor image (DTI) registration, and seven methods are used to evaluate their performance. First, the template used in this paper was obtained by spatial transformation of 16 subjects by means of tensor-based standardization. Then, high-dimensional standardization algorithms for diffusion tensor images were performed, including a fractional anisotropy (FA) based diffeomorphic registration algorithm, an FA-based elastic registration algorithm, and a tensor-based registration algorithm. Finally, seven evaluation methods, including normalized standard deviation, dyadic coherence, diffusion cross-correlation, overlap of eigenvalue-eigenvector pairs, Euclidean distance of the diffusion tensor, and Euclidean distance of the deviatoric tensor and deviatoric of tensors, were used to qualitatively compare and summarize the above standardization algorithms. Experimental results revealed that the high-dimensional tensor-based standardization algorithms perform well and can maintain the consistency of anatomical structures.
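The FA-based registration algorithms mentioned above operate on the fractional anisotropy scalar map derived from each diffusion tensor. The standard FA formula (not specific to this paper) can be sketched as follows:

```python
import numpy as np

def fractional_anisotropy(D):
    """Fractional anisotropy (FA) of a 3x3 symmetric diffusion tensor.

    FA = sqrt(3/2) * ||lambda - mean(lambda)|| / ||lambda||,
    where lambda are the tensor's eigenvalues. FA is 0 for isotropic
    diffusion and approaches 1 for strongly anisotropic diffusion.
    """
    lam = np.linalg.eigvalsh(D)               # eigenvalues of symmetric D
    dev = lam - lam.mean()                    # deviation from mean diffusivity
    den = np.sqrt((lam ** 2).sum())
    return float(np.sqrt(1.5) * np.sqrt((dev ** 2).sum()) / den) if den > 0 else 0.0

iso = np.diag([1.0, 1.0, 1.0])     # isotropic diffusion  -> FA = 0
aniso = np.diag([1.0, 0.1, 0.1])   # cigar-shaped tensor  -> FA close to 1
```

Computing this map voxel-wise turns a tensor field into a scalar image, which is what FA-based diffeomorphic and elastic registration methods then align.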
Funding: Supported by the University of Chinese Academy of Sciences under Grant No. Y95401TXX2, the Beijing Natural Science Foundation under Grant No. Z190004, and the Key Program of Joint Funds of the National Natural Science Foundation of China under Grant No. U19B2040.
Abstract: This paper considers tests for regression coefficients in high-dimensional partially linear models. The authors first use the B-spline method to estimate the unknown smooth function so that it can be expressed linearly. Then, the authors propose an empirical likelihood method to test the regression coefficients. The authors derive the asymptotic chi-squared distribution with two degrees of freedom of the proposed test statistics under the null hypothesis. In addition, the method is extended to tests with nuisance parameters. Simulations show that the proposed method performs well in controlling the type-I error rate and in power. The proposed method is also employed to analyze a dataset on Skin Cutaneous Melanoma (SKCM).
Funding: Supported by the National Natural Science Foundation of China (71631004, 72033008), the National Science Foundation for Distinguished Young Scholars (71625001), and the Science Foundation of the Ministry of Education of China (19YJA910003).
Abstract: The era of big data brings opportunities and challenges for developing new statistical methods and models to evaluate social programs, economic policies, and interventions. This paper provides a comprehensive review of recent advances in statistical methodologies and models for evaluating programs with high-dimensional data. In particular, four kinds of methods for making valid statistical inferences about treatment effects in high dimensions are addressed. The first is the so-called doubly robust type of estimation, which models the outcome regression and propensity score functions simultaneously. The second is the covariate balance method for constructing treatment effect estimators. The third is the sufficient dimension reduction approach for causal inference. The last is the use of machine learning procedures, directly or indirectly, to make statistical inferences about treatment effects. Some of these methods and models are closely related to the de-biased Lasso type methods for high-dimensional regression models in the statistical literature. Finally, some future research topics are also discussed.
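The first class of methods above, doubly robust estimation, is commonly implemented via the augmented inverse-probability-weighted (AIPW) estimator of the average treatment effect. The sketch below uses a deliberately simple setting (randomized treatment, OLS outcome models) rather than the high-dimensional machinery the review covers; the estimator form itself is the standard AIPW one.

```python
import numpy as np

def aipw_ate(y, t, mu1, mu0, e):
    """Augmented inverse-probability-weighted (doubly robust) ATE estimator.

    Consistent if either the outcome regressions (mu1, mu0) or the
    propensity score e is correctly specified.
    """
    return float(np.mean(mu1 - mu0
                         + t * (y - mu1) / e
                         - (1 - t) * (y - mu0) / (1 - e)))

rng = np.random.default_rng(2)
n, p = 5000, 5
X = rng.normal(size=(n, p))
t = rng.binomial(1, 0.5, n)                          # randomized treatment
y = X @ np.ones(p) + 2.0 * t + rng.normal(size=n)    # true ATE = 2

def ols_predict(Xf, yf, Xa):
    """Fit OLS with intercept on (Xf, yf), predict at Xa (a simple sketch)."""
    Z = np.column_stack([np.ones(len(Xf)), Xf])
    beta, *_ = np.linalg.lstsq(Z, yf, rcond=None)
    return np.column_stack([np.ones(len(Xa)), Xa]) @ beta

mu1 = ols_predict(X[t == 1], y[t == 1], X)   # outcome model under treatment
mu0 = ols_predict(X[t == 0], y[t == 0], X)   # outcome model under control
ate = aipw_ate(y, t, mu1, mu0, e=t.mean())
# ate should be close to the true effect of 2.
```

In the high-dimensional settings the review discusses, the OLS fits would be replaced by regularized (e.g., Lasso-type) or machine learning estimators, with cross-fitting used to retain valid inference.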
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 11601140, 11401082, 11701260) and a program funded by the Education Department of Liaoning Province (Grant No. LN2019Q15).
Abstract: We deal with the boundedness of solutions to a class of fully parabolic quasilinear repulsion chemotaxis systems

u_t = ∇·(ϕ(u)∇u) + ∇·(ψ(u)∇v), (x,t) ∈ Ω×(0,T),
v_t = Δv − v + u, (x,t) ∈ Ω×(0,T),

under homogeneous Neumann boundary conditions in a smooth bounded domain Ω ⊂ R^N (N ≥ 3), where 0 < ψ(u) ≤ K(u+1)^α and K1(s+1)^m ≤ ϕ(s) ≤ K2(s+1)^m with α, K, K1, K2 > 0 and m ∈ R. It is shown that if α − m < 4/(N+2), then for any sufficiently smooth initial data, the classical solutions to the system are uniformly-in-time bounded. This extends the known result for the corresponding model with linear diffusion.
Funding: The first author is supported by the National Natural Science Foundation of China (Grant No. 10826069) and the China Postdoctoral Foundation (Grant No. 20090450902); the second author is supported by the National Natural Science Foundation of China (Grant Nos. 10471156 and 10531040).
Abstract: In this paper, we consider the Cauchy problem for systems of quasi-linear wave equations with multiple propagation speeds in spatial dimensions n ≥ 4. The case where the nonlinearities depend on both the unknown functions and their derivatives is studied. Based on some Klainerman-Sideris type weighted estimates and space-time L2 estimates, almost global existence of small-amplitude solutions for space dimension n = 4 and global existence for n ≥ 5 are established.
基金support of National Natural Science Foundation of China through grant Nos.11872127,11832002 and 11732005,Fundamental Research Program of Shenzhen Municipality No.JCYJ20160608153749600 and the Project of Highlevel Innovative Team Building Plan for Beijing Municipal Colleges and Universities No.IDHT20180513 and the project of Qin Xin Talents Cultivation Program,Beijing Information Science&Technology University QXTCP A201901.
Abstract: Dynamic transient responses of a rotating twisted plate under air-blast loading and step loading, respectively, are investigated using classical shallow shell theory with geometrically nonlinear relationships taken into account. By applying the energy principle, a novel high-dimensional nonlinear dynamic system of the rotating cantilever twisted plate is derived for the first time. The use of variable mode functions built from polynomial functions according to the twist angle and geometry of the plate describes the dynamic system more accurately than the classic cantilever beam functions and free-free beam functions. Comparisons between the present results and other literature are carried out to validate the present model, formulation, and computational procedure. The equations of motion describing the transient high-dimensional nonlinear dynamic response are reduced to a four-degree-of-freedom dynamic system expressed in terms of the out-of-plane displacement. The effects of twist angle, stagger angle, rotation speed, load intensity, and viscous damping on the nonlinear dynamic transient responses of the twisted plate are investigated. It is worth noting that although a homogeneous and isotropic material is used here, the approach may also be helpful for laminated composite and functionally graded materials as long as the equivalent material parameters are obtained.
Funding: Supported by the Conseil regional Midi Pyrenees (http://www.midipyrenees.fr/) project entitled "Methodes Numeriques Multi-echelles pour le transport quantique" and by the ANR Project No. BLAN07-2212988 entitled "QUATRAIN", with support from NSFC Projects 11071139 and 10971115.
Abstract: The multi-mode approximation is presented to compute the interior wave function of the Schrödinger equation. This idea is necessary to handle multi-barrier and high-dimensional resonant tunneling problems where multiple eigenvalues are considered. The accuracy and efficiency of this algorithm are demonstrated via several numerical examples.
Funding: Supported by the National Basic Research Program of China (Grant No. 2006CB303103), the National Natural Science Foundation of China (Grant Nos. 60873011, 60802026, 60773219, 60773021), and the High Technology Program (Grant No. 2007AA01Z192).
Abstract: The query space of a similarity query is usually narrowed down by pruning inactive query subspaces, which contain no query results, and keeping active query subspaces, which may contain objects corresponding to the request. However, some active query subspaces may contain no query results at all; these are called false active query subspaces. The performance of query processing clearly degrades in the presence of false active query subspaces. Our experiments show that this problem becomes serious when the data are high-dimensional, and that the number of accesses to false active subspaces increases with the dimensionality. To solve this problem, this paper proposes a space mapping approach to reducing such unnecessary accesses. A given query space can be refined by filtering within its mapped space. To do so, a mapping strategy called maxgap is proposed to improve the efficiency of the refinement processing. Based on this mapping strategy, an index structure called the MS-tree and query processing algorithms are presented in this paper. Finally, the performance of the MS-tree is compared with that of other competitors in terms of range queries on a real dataset.
Funding: Project supported by the National Fundamental Research Program (Grant No. 001CB309308), the China National Natural Science Foundation (Grant Nos. 60433050, 10325521, 10447106), the Hang-Tian Science Fund, the SRFDP program of the Education Ministry of China, and the Beijing Education Committee (Grant No. XK100270454).
Abstract: In this paper, a high-dimensional multiparty quantum secret sharing scheme is proposed using Einstein-Podolsky-Rosen pairs and local unitary operators. This scheme has the advantage of not only higher capacity but also saving storage space. A security analysis is also given.
Funding: Supported by the National Science Foundation of USA (Grant No. P50 DA039838), the Program of the China Scholarship Council (Grant No. 201506040130), the National Natural Science Foundation of China (Grant No. 11401497), the Scientific Research Foundation for the Returned Overseas Chinese Scholars, State Education Ministry, the National Key Basic Research Development Program of China (Grant No. 2010CB950703), the Fundamental Research Funds for the Central Universities, the National Institute on Drug Abuse, National Institutes of Health (Grant Nos. P50 DA036107 and P50 DA039838), and the National Science Foundation of USA (Grant No. DMS 1512422).
Abstract: Feature screening plays an important role in ultrahigh-dimensional data analysis. This paper is concerned with conditional feature screening when one is interested in detecting the association between the response and ultrahigh-dimensional predictors (e.g., genetic markers) given a low-dimensional exposure variable (such as clinical or environmental variables). To this end, we first propose a new index to measure conditional independence, and further develop a conditional screening procedure based on the newly proposed index. We systematically study the theoretical properties of the proposed procedure and establish the sure screening and ranking consistency properties under some very mild conditions. The newly proposed screening procedure enjoys several appealing properties: (a) it is model-free, in that its implementation does not require a specification of the model structure; (b) it is robust to heavy-tailed distributions or outliers in both the response and the predictors; and (c) it can deal with both feature screening and conditional screening in a unified way. We study the finite-sample performance of the proposed procedure by Monte Carlo simulations and further illustrate the proposed method through two real data examples.