The estimation of covariance matrices is important in many fields, such as statistics. In real applications, data are frequently affected by high dimensionality and noise, yet most existing studies assume complete data. This paper studies the optimal estimation of high-dimensional covariance matrices from missing and noisy samples under the norm. First, a model with sub-Gaussian additive noise is presented. The generalized sample covariance is then modified to define a hard thresholding estimator, and the minimax upper bound is derived. After that, the minimax lower bound is derived, and it is concluded that the proposed estimator is rate-optimal. Finally, a numerical simulation analysis is performed. The results show that for missing samples with sub-Gaussian noise, if the true covariance matrix is sparse, the hard thresholding estimator outperforms the traditional estimation method.
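As a rough illustration of the estimator described above, the sketch below forms a pairwise-complete (generalized) sample covariance from partially observed data and then hard-thresholds its off-diagonal entries. The construction, the threshold choice, and all names are illustrative assumptions, not the paper's exact estimator.

```python
import numpy as np

def hard_threshold_cov(X, mask, tau):
    """Pairwise-complete sample covariance of partially observed data,
    followed by entrywise hard thresholding of the off-diagonal entries.

    X    : (n, p) data matrix; unobserved entries may hold any value.
    mask : (n, p) boolean array, True where X is observed.
    tau  : hard threshold level.
    """
    col_sums = np.where(mask, X, 0.0).sum(axis=0)
    col_cnts = np.maximum(mask.sum(axis=0), 1)
    means = col_sums / col_cnts                  # column means over observed entries
    Xc = np.where(mask, X - means, 0.0)          # centered, zeros where missing
    pair_counts = mask.T.astype(float) @ mask.astype(float)  # N_jk = #{i: both observed}
    S = (Xc.T @ Xc) / np.maximum(pair_counts, 1.0)           # generalized sample covariance
    T = np.where(np.abs(S) >= tau, S, 0.0)       # keep only large off-diagonal entries
    np.fill_diagonal(T, np.diag(S))              # diagonal is left unthresholded
    return T

# Toy usage: noisy data with ~20% of entries missing completely at random.
rng = np.random.default_rng(0)
n, p = 500, 40
X = rng.standard_normal((n, p)) + 0.1 * rng.standard_normal((n, p))
mask = rng.random((n, p)) < 0.8
tau = 2.0 * np.sqrt(np.log(p) / n)               # rate-inspired, illustrative choice
Sigma_hat = hard_threshold_cov(X, mask, tau)
```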
Speech emotion recognition (SER) uses acoustic analysis to find features for emotion recognition and examines variations in voice that are caused by emotions. The number of features acquired with acoustic analysis is extremely high, so we introduce a hybrid filter-wrapper feature selection algorithm based on an improved equilibrium optimizer for constructing an emotion recognition system. The proposed algorithm implements multi-objective emotion recognition with the minimum number of selected features and maximum accuracy. First, we use the information gain and Fisher score to sort the features extracted from signals. Then, we employ a multi-objective ranking method to evaluate these features and assign different importance to them. Features with high rankings have a large probability of being selected. Finally, we propose a repair strategy to address the problem of duplicate solutions in multi-objective feature selection, which can improve the diversity of solutions and avoid falling into local traps. Using random forest and K-nearest neighbor classifiers, four English speech emotion datasets are employed to test the proposed algorithm (MBEO) as well as other multi-objective emotion identification techniques. The results illustrate that it performs well in inverted generational distance, hypervolume, Pareto solutions, and execution time, and MBEO is appropriate for high-dimensional English SER.
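For context, the Fisher score used in the first filtering step has a simple closed form: the between-class scatter of a feature's mean divided by its pooled within-class variance. A minimal sketch follows (the information-gain ranking and the MBEO search itself are not reproduced):

```python
import numpy as np

def fisher_score(X, y):
    """Fisher score of each feature: between-class scatter of the feature
    means over the pooled within-class variance. Larger = more discriminative."""
    classes = np.unique(y)
    overall_mean = X.mean(axis=0)
    num = np.zeros(X.shape[1])
    den = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        nc = Xc.shape[0]
        num += nc * (Xc.mean(axis=0) - overall_mean) ** 2
        den += nc * Xc.var(axis=0)
    return num / np.maximum(den, 1e-12)

# Features would then be sorted by descending score before the wrapper stage:
# ranking = np.argsort(-fisher_score(X, y))
```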
The objective of reliability-based design optimization (RBDO) is to minimize the optimization objective while satisfying the corresponding reliability requirements. However, the nested-loop characteristic reduces the efficiency of RBDO algorithms, which hinders their application to high-dimensional engineering problems. To address these issues, this paper proposes an efficient decoupled RBDO method combining high-dimensional model representation (HDMR) and the weight-point estimation method (WPEM). First, we decouple the RBDO model using HDMR and WPEM. Second, Lagrange interpolation is used to approximate a univariate function. Finally, based on the results of the first two steps, the original nested-loop reliability optimization model is completely transformed into a deterministic design optimization model that can be solved by a series of mature constrained optimization methods without any additional calculations. Two numerical examples of a planar 10-bar structure and an aviation hydraulic piping system with 28 design variables are analyzed to illustrate the performance and practicability of the proposed method.
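To illustrate the second step, the snippet below approximates a single univariate component function by Lagrange interpolation, since HDMR reduces the performance function to sums of such one-dimensional terms. The component function and the node count are arbitrary stand-ins:

```python
import numpy as np
from scipy.interpolate import lagrange

# Stand-in for one univariate HDMR component g(x).
g = lambda x: np.exp(-0.5 * x) * np.sin(2.0 * x)

nodes = np.linspace(-1.0, 1.0, 5)      # 5 interpolation nodes
poly = lagrange(nodes, g(nodes))       # degree-4 interpolating polynomial

# Check the quality of the one-dimensional surrogate on a fine grid.
xs = np.linspace(-1.0, 1.0, 201)
max_err = np.max(np.abs(poly(xs) - g(xs)))
print(f"max interpolation error on [-1, 1]: {max_err:.2e}")
```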
In this paper, we investigate the two-sample quantile difference by the empirical likelihood method when the responses of the two populations, with high-dimensional covariates, are missing at random. In particular, based on a sufficient dimension reduction technique, we construct three empirical log-likelihood ratios for the quantile difference between two samples by using inverse probability weighting imputation, regression imputation, and augmented inverse probability weighting imputation, respectively, and prove their asymptotic distributions. At the same time, we give a test to check whether the two populations have the same distribution. A simulation study is also carried out to investigate the finite-sample behavior of the proposed methods.
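A minimal sketch of the inverse probability weighting idea for a single quantile is given below. It assumes a logistic propensity model and a plug-in weighted quantile, whereas the paper works with sufficient dimension reduction and empirical log-likelihood ratios:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def ipw_quantile(y, delta, X, q):
    """Inverse-probability-weighted q-th quantile of a response that is
    missing at random given covariates X; delta[i] = 1 if y[i] is observed."""
    pi_hat = LogisticRegression().fit(X, delta).predict_proba(X)[:, 1]
    w = delta / np.clip(pi_hat, 1e-3, None)   # zero weight for missing responses
    y = np.where(delta == 1, y, np.inf)       # placeholders; they carry zero weight
    order = np.argsort(y)
    cw = np.cumsum(w[order]) / w.sum()        # weighted empirical CDF
    return y[order][np.searchsorted(cw, q)]

# Two-sample quantile difference: apply the estimator to each sample, e.g.
# diff_hat = ipw_quantile(y1, d1, X1, 0.5) - ipw_quantile(y2, d2, X2, 0.5)
```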
As a crucial data preprocessing method in data mining, feature selection (FS) can be regarded as a bi-objective optimization problem that aims to maximize classification accuracy and minimize the number of selected features. Evolutionary computing (EC) is promising for FS owing to its powerful search capability. However, in traditional EC-based methods, feature subsets are represented via a length-fixed individual encoding. This is ineffective for high-dimensional data, because it results in a huge search space and prohibitive training time. This work proposes a length-adaptive non-dominated sorting genetic algorithm (LA-NSGA) with a length-variable individual encoding and a length-adaptive evolution mechanism for bi-objective high-dimensional FS. In LA-NSGA, an initialization method based on correlation and redundancy is devised to initialize individuals of diverse lengths, and a Pareto dominance-based length change operator is introduced to guide individuals to explore promising search spaces adaptively. Moreover, a dominance-based local search method is employed for further improvement. The experimental results based on 12 high-dimensional gene datasets show that the Pareto front of feature subsets produced by LA-NSGA is superior to those of existing algorithms.
In this paper, an Observation Points Classifier Ensemble (OPCE) algorithm is proposed to deal with High-Dimensional Imbalanced Classification (HDIC) problems based on data processed using the Multi-Dimensional Scaling (MDS) feature extraction technique. First, the dimensionality of the original imbalanced data is reduced using MDS so that distances between any two different samples are preserved as well as possible. Second, a novel OPCE algorithm is applied to classify imbalanced samples by placing optimised observation points in a low-dimensional data space. Third, optimization of the observation point mappings is carried out to obtain a reliable assessment of the unknown samples. Exhaustive experiments have been conducted to evaluate the feasibility, rationality, and effectiveness of the proposed OPCE algorithm using seven benchmark HDIC data sets. Experimental results show that (1) the OPCE algorithm can be trained faster on low-dimensional imbalanced data than on high-dimensional data; (2) the OPCE algorithm can correctly identify samples as the number of optimised observation points is increased; and (3) statistical analysis reveals that OPCE yields better HDIC performances on the selected data sets in comparison with eight other HDIC algorithms. This demonstrates that OPCE is a viable algorithm to deal with HDIC problems.
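The MDS step can be illustrated as follows; the sample sizes are arbitrary, and the observation-point classifier itself, which is the paper's contribution, is not reproduced:

```python
import numpy as np
from sklearn.manifold import MDS
from sklearn.metrics import pairwise_distances

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 500))          # 200 samples, 500 features

# Embed into a low-dimensional space while preserving pairwise distances.
Z = MDS(n_components=5, dissimilarity="euclidean",
        random_state=0).fit_transform(X)

# Sanity check: embedded distances should track the original ones.
d_hi = pairwise_distances(X).ravel()
d_lo = pairwise_distances(Z).ravel()
print(np.corrcoef(d_hi, d_lo)[0, 1])         # close to 1 = distances preserved
```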
Biometric gait recognition is a lesser-known but emerging and effective biometric recognition method which enables subjects' walking patterns to be recognized. Existing research in this area has primarily focused on feature analysis through the extraction of individual features, which captures most of the information but fails to capture subtle variations in gait dynamics. Therefore, a novel feature taxonomy and an approach for deriving a relationship between a function of one set of gait features and another set are introduced. The gait features extracted from body halves divided by anatomical planes on vertical, horizontal, and diagonal axes are grouped to form canonical gait covariates. Canonical Correlation Analysis (CCA) is utilized to measure the strength of association between the canonical covariates of gait. Thus, gait assessment and identification are enhanced when more semantic information is available through CCA-based multi-feature fusion. Carnegie Mellon University's 3D gait database, which contains 32 gait samples taken at different paces, is utilized in analyzing gait characteristics. The performance of Linear Discriminant Analysis, K-Nearest Neighbors, Naive Bayes, Artificial Neural Networks, and Support Vector Machines was improved by a 4% average when the CCA-based gait identification approach was used. A significant maximum accuracy rate of 97.8% was achieved through CCA-based gait identification. Beyond that, the rate of false identifications and unrecognized gaits went down to half, demonstrating state-of-the-art performance for gait identification.
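A minimal sketch of the CCA step on two synthetic groups of gait covariates follows; all shapes, names, and the latent construction are illustrative:

```python
import numpy as np
from sklearn.cross_decomposition import CCA

# Two feature groups driven by shared latent gait dynamics, e.g. features
# from two body halves split by an anatomical plane.
rng = np.random.default_rng(2)
shared = rng.standard_normal((150, 3))
F1 = shared @ rng.standard_normal((3, 12)) + 0.1 * rng.standard_normal((150, 12))
F2 = shared @ rng.standard_normal((3, 10)) + 0.1 * rng.standard_normal((150, 10))

cca = CCA(n_components=3)
U, V = cca.fit_transform(F1, F2)             # canonical variates of each group
corrs = [np.corrcoef(U[:, k], V[:, k])[0, 1] for k in range(3)]
print(corrs)                                 # strength of association per pair
# The fused variates [U, V] can then feed LDA/KNN/SVM classifiers.
```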
k-means is a popular clustering algorithm because of its simplicity and scalability to handle large datasets. However, one of its setbacks is the challenge of identifying the correct k-hyperparameter value. Tuning this value correctly is critical for building effective k-means models. The use of the traditional elbow method to help identify this value has a long-standing literature. However, when using this method with certain datasets, smooth curves may appear, making it challenging to identify the k-value due to its unclear nature. On the other hand, various internal validation indexes, which are proposed as a solution to this issue, may be inconsistent. Although various techniques for solving smooth elbow challenges exist, k-hyperparameter tuning in high-dimensional spaces remains intractable and an open research issue. In this paper, we first review the existing techniques for solving smooth elbow challenges. The identified research gaps are then utilized in the development of the new technique. The new technique, referred to as the ensemble-based technique of a self-adapting autoencoder and internal validation indexes, is then validated in high-dimensional space clustering. The optimal k-value, tuned by this technique using a voting scheme, is a trade-off between the number of clusters visualized in the autoencoder's latent space, the k-value from the ensemble internal validation index score, and the one that drives the curvature derivative (f‴(k)(1 + f′(k)^2) − 3f′(k)f″(k)^2)/(1 + f′(k)^2)^(5/2) to 0 or close to 0 at the elbow. Experimental results based on Cochran's Q test, ANOVA, and McNemar's score indicate a relatively good performance of the newly developed technique in k-hyperparameter tuning.
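A finite-difference sketch of the curvature-derivative criterion is given below; the voting against the autoencoder's latent space and the internal-index ensemble is not shown:

```python
import numpy as np

def elbow_by_curvature(ks, inertia):
    """Locate the elbow of an inertia curve f(k) as the k where the
    derivative of the curvature is closest to zero, via finite differences."""
    f1 = np.gradient(inertia, ks)            # f'(k)
    f2 = np.gradient(f1, ks)                 # f''(k)
    f3 = np.gradient(f2, ks)                 # f'''(k)
    kappa_prime = (f3 * (1 + f1**2) - 3 * f1 * f2**2) / (1 + f1**2) ** 2.5
    return ks[np.argmin(np.abs(kappa_prime))]

# Usage: ks = np.arange(1, 21); inertia[i] = k-means inertia fitted with ks[i].
```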
Spatial covariance matrix (SCM) is essential in many multi-antenna systems such as massive multiple-input multiple-output (MIMO). For multi-antenna systems operating at millimeter-wave bands, a hybrid analog-digital structure has been widely adopted to reduce the cost of radio frequency chains. In this situation, signals received at the antennas are unavailable to the digital receiver, and as a consequence, the traditional sample average approach cannot be used for SCM reconstruction in hybrid multi-antenna systems. To address this issue, the beam sweeping algorithm (BSA), which can reconstruct the SCM effectively for a hybrid uniform linear array, has been proposed in our previous works. However, a direct extension of BSA to a hybrid uniform circular array (UCA) will result in a huge computational burden. To this end, a low-complexity approach is proposed in this paper. By exploiting the symmetry features of the SCM for the UCA, the number of unknowns can be reduced significantly, and thus the complexity of reconstruction can be saved accordingly. Furthermore, an insightful analysis is also presented in this paper, showing that the reduction of the number of unknowns can also improve the accuracy of the reconstructed SCM. Simulation results are also shown to demonstrate the proposed approach.
This paper proposes linear and nonlinear filters for a non-Gaussian dynamic system with an unknown nominal covariance of the output noise. The challenge of designing a suitable filter in the presence of an unknown covariance matrix is addressed by focusing on the output data set of the system. Considering that data generated from a Gaussian distribution exhibit ellipsoidal scattering, we first propose the weighted sum-of-norms (SON) clustering method, which prioritizes nearby points, reduces the influence of distant points, and lowers computational cost. Then, by introducing the weighted maximum likelihood, we propose a semi-definite program (SDP) to detect outliers and reduce their impact on each cluster. Detecting these weights paves the way to obtaining an appropriate covariance of the output noise. Next, two filtering approaches are presented: a cluster-based robust linear filter using maximum a posteriori (MAP) estimation, and a cluster-based robust nonlinear filter assuming that the output noise distribution stems from several Gaussian noise sources according to the ellipsoidal clusters. Finally, simulation results demonstrate the effectiveness of the proposed filtering approaches.
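A minimal sketch of weighted sum-of-norms clustering is shown below, assuming the generic convex formulation with Gaussian pairwise weights restricted to nearest neighbors; the paper's exact weighting, the SDP outlier-detection step, and the filters are beyond this snippet. It uses the cvxpy modeling library:

```python
import numpy as np
import cvxpy as cp

def weighted_son_clustering(X, lam, k_neighbors=5):
    """Weighted sum-of-norms clustering: each point x_i gets a centroid u_i,
    and the norm penalty fuses centroids of nearby points into clusters.
    Gaussian weights emphasize nearby pairs and suppress distant ones."""
    n, d = X.shape
    U = cp.Variable((n, d))
    fit = cp.sum_squares(X - U)
    D = np.linalg.norm(X[:, None] - X[None, :], axis=2)
    penalty = 0
    for i in range(n):
        for j in np.argsort(D[i])[1:k_neighbors + 1]:   # nearest neighbors only
            w_ij = np.exp(-D[i, j] ** 2)
            penalty += w_ij * cp.norm(U[i] - U[int(j)], 2)
    cp.Problem(cp.Minimize(fit + lam * penalty)).solve()
    return U.value   # near-identical rows indicate a shared cluster centroid
```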
The Internet of Things (IoT) is a growing technology that allows the sharing of data with other devices across wireless networks. However, IoT systems are vulnerable to cyberattacks due to their openness. The proposed work intends to implement a new security framework for detecting the most specific and harmful intrusions in IoT networks. In this framework, a Covariance Linear Learning Embedding Selection (CL2ES) methodology is used first to extract the features highly associated with IoT intrusions. Then, a Kernel Distributed Bayes Classifier (KDBC) is created to forecast attacks precisely based on probability distribution values. In addition, a unique Mongolian Gazellas Optimization (MGO) algorithm is used to optimize the weight values for the learning of the classifier. The effectiveness of the proposed CL2ES-KDBC framework has been assessed using several IoT cyber-attack datasets. The obtained results are compared with current classification methods regarding accuracy (97%), precision (96.5%), and other factors. A computational analysis of the CL2ES-KDBC system on IoT intrusion datasets is also performed, which provides valuable insight into its performance, efficiency, and suitability for securing IoT networks.
Environmental covariates are the basis of predictive soil mapping. Their selection determines the performance of soil mapping to a great extent, especially in cases where the number of soil samples is limited but soil spatial heterogeneity is high. In this study, we proposed an integrated method to select environmental covariates for predictive soil depth mapping. First, candidate variables that may influence the development of soil depth were selected based on pedogenetic knowledge. Second, three conventional methods (Pearson correlation analysis (PsCA), generalized additive models (GAMs), and Random Forest (RF)) were used to generate optimal combinations of environmental covariates. Finally, three optimal combinations were integrated to produce a final combination based on the importance and occurrence frequency of each environmental covariate. We tested this method for soil depth mapping in the upper reaches of the Heihe River Basin in Northwest China. A total of 129 soil sampling sites were collected using a representative sampling strategy, and RF and support vector machine (SVM) models were used to map soil depth. The results showed that compared to the sets of environmental covariates selected by the three conventional selection methods, the set of environmental covariates selected by the proposed method achieved higher mapping accuracy. The combination from the proposed method obtained a root mean square error (RMSE) of 11.88 cm, which was 2.25–7.64 cm lower than the other methods, and an R^2 value of 0.76, which was 0.08–0.26 higher than the other methods. The results suggest that our method can be used as an alternative to the conventional methods for soil depth mapping and may also be effective for mapping other soil properties.
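The integration idea can be sketched with two stand-in selectors (Pearson |r| and RF importance; the paper also uses GAMs), combined by occurrence frequency in the top-k lists with mean rank as a tie-breaker. The combination rule here is an illustrative assumption:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def integrated_selection(X, y, names, k=10):
    """Combine two covariate rankings by how often a covariate appears in
    the top-k lists (occurrence frequency), breaking ties by mean rank."""
    corr = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    imp = RandomForestRegressor(n_estimators=300,
                                random_state=0).fit(X, y).feature_importances_
    rankings = [np.argsort(-corr), np.argsort(-imp)]
    freq = {j: 0 for j in range(X.shape[1])}
    mean_rank = {j: 0.0 for j in range(X.shape[1])}
    for r in rankings:
        for pos, j in enumerate(r):
            mean_rank[j] += pos / len(rankings)
            if pos < k:
                freq[j] += 1
    order = sorted(freq, key=lambda j: (-freq[j], mean_rank[j]))
    return [names[j] for j in order[:k]]
```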
Guaranteed cost consensus analysis and design problems for high-dimensional multi-agent systems with time-varying delays are investigated. The idea of guaranteed cost control is introduced into consensus problems for high-dimensional multi-agent systems with time-varying delays, where a cost function is defined based on state errors among neighboring agents and the control inputs of all the agents. By the state-space decomposition approach and the linear matrix inequality (LMI) technique, sufficient conditions for guaranteed cost consensus and consensualization are given. Moreover, a guaranteed cost upper bound of the cost function is determined. It should be mentioned that these LMI criteria are dependent on the change rate of the time delays and the maximum time delay, while the guaranteed cost upper bound is only dependent on the maximum time delay and is independent of the Laplacian matrix. Finally, numerical simulations are given to demonstrate the theoretical results.
We introduce and develop a novel approach to outlier detection based on an adaptation of random subspace learning. Our proposed method handles both high-dimension low-sample-size and traditional low-dimensional high-sample-size datasets. Essentially, we avoid the computational bottleneck of techniques like the Minimum Covariance Determinant (MCD) by computing the needed determinants and associated measures in much lower-dimensional subspaces. Both the theoretical and computational development of our approach reveal that it is computationally more efficient than the regularized methods in the high-dimensional low-sample-size setting, and it often competes favorably with existing methods as far as the percentage of correctly detected outliers is concerned.
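A sketch of the core idea is given below, assuming robust MCD distances averaged over random feature subspaces; the aggregation rule is an illustrative choice, not necessarily the paper's:

```python
import numpy as np
from sklearn.covariance import MinCovDet

def random_subspace_outlier_scores(X, n_subspaces=30, subspace_dim=5, seed=0):
    """Aggregate robust Mahalanobis distances computed by MCD in many
    low-dimensional random feature subspaces. This stays feasible even
    when p >> n, where a full-dimensional MCD cannot be computed.
    Requires n to exceed subspace_dim."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    scores = np.zeros(n)
    for _ in range(n_subspaces):
        cols = rng.choice(p, size=min(subspace_dim, p), replace=False)
        mcd = MinCovDet(random_state=0).fit(X[:, cols])
        d = mcd.mahalanobis(X[:, cols])      # squared robust distances
        scores += d / np.median(d)           # normalize before aggregating
    return scores / n_subspaces              # larger = more outlying
```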
The performance of conventional similarity measurement methods is seriously affected by the curse of dimensionality of high-dimensional data. The reason is that differences in sparse and noisy dimensions account for a large proportion of the measured similarity, making any two samples appear dissimilar. A similarity measurement method for high-dimensional data based on a normalized net lattice subspace is proposed. The data range of each dimension is divided into several intervals, and the components in different dimensions are mapped onto the corresponding intervals. Only components in the same or adjacent intervals are used to calculate the similarity. To validate this method, three data types are used, and seven common similarity measurement methods are compared. The experimental results indicate that the relative difference of the method increases with the dimensionality and is approximately two or three orders of magnitude higher than that of the conventional methods. In addition, the similarity range of this method in different dimensions is [0, 1], which is suitable for similarity analysis after dimensionality reduction.
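A rough sketch of the interval-mapping idea follows, assuming each dimension has been min-max scaled to [0, 1]; the per-dimension contribution formula is an illustrative assumption, not the paper's exact definition:

```python
import numpy as np

def lattice_similarity(a, b, n_intervals=10):
    """Similarity of two [0, 1]-scaled vectors on a normalized net lattice:
    each dimension's range is cut into intervals, and only dimensions whose
    components fall in the same or an adjacent interval contribute."""
    ia = np.floor(np.clip(a, 0, 1 - 1e-12) * n_intervals).astype(int)
    ib = np.floor(np.clip(b, 0, 1 - 1e-12) * n_intervals).astype(int)
    close = np.abs(ia - ib) <= 1             # same or adjacent interval
    if not close.any():
        return 0.0
    # Close dimensions contribute 1 - |a - b|; the others contribute nothing,
    # so sparse/noisy dimensions cannot dominate. Result stays in [0, 1].
    return float(np.mean(close * (1.0 - np.abs(a - b))))
```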
Latent factor (LF) models are highly effective in extracting useful knowledge from High-Dimensional and Sparse (HiDS) matrices, which are commonly seen in various industrial applications. An LF model usually adopts iterative optimizers, which may consume many iterations to achieve a local optimum, resulting in considerable time cost. Hence, determining how to accelerate the training process for LF models has become a significant issue. To address this, this work proposes a randomized latent factor (RLF) model. It incorporates the principle of randomized learning techniques from neural networks into the LF analysis of HiDS matrices, thereby greatly alleviating computational burden. It also extends a standard learning process for randomized neural networks to the context of LF analysis to make the resulting model represent an HiDS matrix correctly. Experimental results on three HiDS matrices from industrial applications demonstrate that, compared with state-of-the-art LF models, RLF is able to achieve significantly higher computational efficiency and comparable prediction accuracy for missing data. It provides an important alternative approach to LF analysis of HiDS matrices, which is especially desirable for industrial applications demanding highly efficient models.
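The randomized-learning principle transplanted to LF analysis can be sketched as follows: one factor matrix is drawn randomly and frozen, and the other is obtained in closed form, so no iterative optimizer is needed. This is a simplified stand-in for the paper's RLF model, not its exact construction:

```python
import numpy as np

def rlf(R, mask, rank=10, lam=0.1, seed=0):
    """Randomized latent factor sketch for an HiDS matrix R: item factors Q
    are random and never trained; user factors P come from a closed-form
    ridge solve on each user's observed entries only."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    Q = rng.uniform(-1, 1, size=(n_items, rank))   # random, frozen factors
    P = np.zeros((n_users, rank))
    for u in range(n_users):
        obs = np.flatnonzero(mask[u])              # items rated by user u
        if obs.size == 0:
            continue
        A = Q[obs]                                 # (n_obs, rank)
        P[u] = np.linalg.solve(A.T @ A + lam * np.eye(rank), A.T @ R[u, obs])
    return P @ Q.T                                 # predictions for missing entries
```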
Selecting a proper set of covariates is one of the most important factors that influence the accuracy of digital soil mapping (DSM). The statistical or machine learning methods for selecting DSM covariates are not applicable to situations with limited samples. To solve this problem, this paper proposes a case-based method that formalizes the covariate selection knowledge contained in practical DSM applications. The proposed method trains Random Forest (RF) classifiers with DSM cases extracted from practical DSM applications and then uses the trained classifiers to determine whether each potential covariate should be used in a new DSM application. In this study, we took topographic covariates as examples and extracted 191 DSM cases from 56 peer-reviewed journal articles to evaluate the performance of the proposed case-based method by leave-one-out cross-validation. Compared with a novice's commonly used way of selecting DSM covariates, the proposed case-based method improved accuracy by more than 30% according to three quantitative evaluation indices (i.e., recall, precision, and F1-score). The proposed method can also be applied to selecting a proper set of covariates for other similar geographical modeling domains, such as landslide susceptibility mapping and species distribution modeling.
Because all the known integrable models possess Schwarzian forms with Möbius transformation invariance, it may be one of the best ways to find new integrable models starting from some suitable Möbius transformation invariant equations. In this paper, we study the Painlevé integrability of some special (3+1)-dimensional Schwarzian models.
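For reference, the Schwarzian derivative and the Möbius invariance referred to above are

```latex
\{\phi; x\} \;=\; \frac{\phi'''}{\phi'} - \frac{3}{2}\left(\frac{\phi''}{\phi'}\right)^{2},
\qquad
\left\{\frac{a\phi + b}{c\phi + d};\, x\right\} \;=\; \{\phi; x\}, \quad ad - bc \neq 0,
```

which is why equations written entirely in terms of {φ; x} inherit Möbius invariance.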
This paper deals with the representation of the solutions of a polynomial system, and concentrates on the high-dimensional case. Based on the rational univariate representation of zero-dimensional polynomial systems, we give a new description called rational representation for the solutions of a high-dimensional polynomial system and propose an algorithm for computing it. In this way, all the solutions of any high-dimensional polynomial system can be represented by a set of so-called rational-representation sets.
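For reference, in the zero-dimensional case the rational univariate representation expresses every solution through a single algebraic number: a separating element t satisfying a univariate equation, with each coordinate a rational function of t (the notation here is schematic):

```latex
f(t) = 0, \qquad x_i \;=\; \frac{g_i(t)}{g_0(t)}, \quad i = 1, \dots, n .
```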
To address the issue that traditional clustering methods are not appropriate for high-dimensional data, a cuckoo search fuzzy-weighting algorithm for subspace clustering is presented on the basis of existing soft subspace clustering algorithms. In the proposed algorithm, a novel objective function is first designed by considering the fuzzy weighting of within-cluster compactness and the between-cluster separation, and by loosening the constraints on the dimension weight matrix. Then, gradual membership and an improved cuckoo search, a global search strategy, are introduced to optimize the objective function and search for subspace clusters, giving novel learning rules for clustering. Finally, the performance of the proposed algorithm on the clustering analysis of various low- and high-dimensional datasets is experimentally compared with that of several competitive subspace clustering algorithms. Experimental studies demonstrate that the proposed algorithm can obtain better performance than most of the existing soft subspace clustering algorithms.
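The general shape of such a fuzzy-weighted soft-subspace objective can be sketched as follows; the exact terms and constraints in the paper differ, so this only illustrates the family of objectives that the improved cuckoo search would minimize:

```python
import numpy as np

def fw_subspace_objective(X, U, W, C, tau=2.0, eta=1.0):
    """Soft-subspace clustering objective: fuzzy-weighted within-cluster
    compactness minus between-cluster separation. U is the (k, n) fuzzy
    membership matrix, W the (k, d) dimension-weight matrix (rows sum to 1),
    C the (k, d) cluster centers; tau is the weight fuzzifier, eta the
    compactness/separation trade-off."""
    x_bar = X.mean(axis=0)                      # global data center
    within = between = 0.0
    for k in range(C.shape[0]):
        d2 = (X - C[k]) ** 2                    # (n, d) squared deviations
        within += np.sum(U[k][:, None] * (W[k] ** tau) * d2)
        between += np.sum(U[k]) * np.sum((W[k] ** tau) * (C[k] - x_bar) ** 2)
    return within - eta * between               # to be minimized by cuckoo search
```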