In this paper, we proposed a new semi-supervised multi-manifold learning method, called semi- supervised sparse multi-manifold embedding (S3MME), for dimensionality reduction of hyperspectral image data. S3MME exploit...In this paper, we proposed a new semi-supervised multi-manifold learning method, called semi- supervised sparse multi-manifold embedding (S3MME), for dimensionality reduction of hyperspectral image data. S3MME exploits both the labeled and unlabeled data to adaptively find neighbors of each sample from the same manifold by using an optimization program based on sparse representation, and naturally gives relative importance to the labeled ones through a graph-based methodology. Then it tries to extract discriminative features on each manifold such that the data points in the same manifold become closer. The effectiveness of the proposed multi-manifold learning algorithm is demonstrated and compared through experiments on a real hyperspectral images.展开更多
Multi-label data with high dimensionality often occurs,which will produce large time and energy overheads when directly used in classification tasks.To solve this problem,a novel algorithm called multi-label dimension...Multi-label data with high dimensionality often occurs,which will produce large time and energy overheads when directly used in classification tasks.To solve this problem,a novel algorithm called multi-label dimensionality reduction via semi-supervised discriminant analysis(MSDA) was proposed.It was expected to derive an objective discriminant function as smooth as possible on the data manifold by multi-label learning and semi-supervised learning.By virtue of the latent imformation,which was provided by the graph weighted matrix of sample attributes and the similarity correlation matrix of partial sample labels,MSDA readily made the separability between different classes achieve maximization and estimated the intrinsic geometric structure in the lower manifold space by employing unlabeled data.Extensive experimental results on several real multi-label datasets show that after dimensionality reduction using MSDA,the average classification accuracy is about 9.71% higher than that of other algorithms,and several evaluation metrices like Hamming-loss are also superior to those of other dimensionality reduction methods.展开更多
In order to accurately identify speech emotion information, the discriminant-cascading effect in dimensionality reduction of speech emotion recognition is investigated. Based on the existing locality preserving projec...In order to accurately identify speech emotion information, the discriminant-cascading effect in dimensionality reduction of speech emotion recognition is investigated. Based on the existing locality preserving projections and graph embedding framework, a novel discriminant-cascading dimensionality reduction method is proposed, which is named discriminant-cascading locality preserving projections (DCLPP). The proposed method specifically utilizes supervised embedding graphs and it keeps the original space for the inner products of samples to maintain enough information for speech emotion recognition. Then, the kernel DCLPP (KDCLPP) is also proposed to extend the mapping form. Validated by the experiments on the corpus of EMO-DB and eNTERFACE'05, the proposed method can clearly outperform the existing common dimensionality reduction methods, such as principal component analysis (PCA), linear discriminant analysis (LDA), locality preserving projections (LPP), local discriminant embedding (LDE), graph-based Fisher analysis (GbFA) and so on, with different categories of classifiers.展开更多
Some dimensionality reduction (DR) approaches based on support vector machine (SVM) are proposed. But the acquirement of the projection matrix in these approaches only considers the between-class margin based on S...Some dimensionality reduction (DR) approaches based on support vector machine (SVM) are proposed. But the acquirement of the projection matrix in these approaches only considers the between-class margin based on SVM while ignoring the within-class information in data. This paper presents a new DR approach, call- ed the dimensionality reduction based on SVM and LDA (DRSL). DRSL considers the between-class margins from SVM and LDA, and the within-class compactness from LDA to obtain the projection matrix. As a result, DRSL can realize the combination of the between-class and within-class information and fit the between-class and within-class structures in data. Hence, the obtained projection matrix increases the generalization ability of subsequent classification techniques. Experiments applied to classification techniques show the effectiveness of the proposed method.展开更多
We propose a novel framework for learning a low-dimensional representation of data based on nonlinear dynamical systems,which we call the dynamical dimension reduction(DDR).In the DDR model,each point is evolved via a...We propose a novel framework for learning a low-dimensional representation of data based on nonlinear dynamical systems,which we call the dynamical dimension reduction(DDR).In the DDR model,each point is evolved via a nonlinear flow towards a lower-dimensional subspace;the projection onto the subspace gives the low-dimensional embedding.Training the model involves identifying the nonlinear flow and the subspace.Following the equation discovery method,we represent the vector field that defines the flow using a linear combination of dictionary elements,where each element is a pre-specified linear/nonlinear candidate function.A regularization term for the average total kinetic energy is also introduced and motivated by the optimal transport theory.We prove that the resulting optimization problem is well-posed and establish several properties of the DDR method.We also show how the DDR method can be trained using a gradient-based optimization method,where the gradients are computed using the adjoint method from the optimal control theory.The DDR method is implemented and compared on synthetic and example data sets to other dimension reduction methods,including the PCA,t-SNE,and Umap.展开更多
We present a new algorithm for manifold learning and nonlinear dimensionality reduction. Based on a set of unorganized data points sampled with noise from a parameterized manifold, the local geometry of the manifold i...We present a new algorithm for manifold learning and nonlinear dimensionality reduction. Based on a set of unorganized data points sampled with noise from a parameterized manifold, the local geometry of the manifold is learned by constructing an approximation for the tangent space at each point, and those tangent spaces are then aligned to give the global coordinates of the data points with respect to the underlying manifold. We also present an error analysis of our algorithm showing that reconstruction errors can be quite small in some cases. We illustrate our algorithm using curves and surfaces both in 2D/3D Euclidean spaces and higher dimensional Euclidean spaces. We also address several theoretical and algorithmic issues for further research and improvements.展开更多
In the need of some real applications, such as text categorization and image classification, the multi-label learning gradually becomes a hot research point in recent years. Much attention has been paid to the researc...In the need of some real applications, such as text categorization and image classification, the multi-label learning gradually becomes a hot research point in recent years. Much attention has been paid to the research of multi-label classification algorithms. Considering the fact that the high dimensionality of the multi-label datasets may cause the curse of dimensionality and wil hamper the classification process, a dimensionality reduction algorithm, named multi-label kernel discriminant analysis (MLKDA), is proposed to reduce the dimensionality of multi-label datasets. MLKDA, with the kernel trick, processes the multi-label integrally and realizes the nonlinear dimensionality reduction with the idea similar with linear discriminant analysis (LDA). In the classification process of multi-label data, the extreme learning machine (ELM) is an efficient algorithm in the premise of good accuracy. MLKDA, combined with ELM, shows a good performance in multi-label learning experiments with several datasets. The experiments on both static data and data stream show that MLKDA outperforms multi-label dimensionality reduction via dependence maximization (MDDM) and multi-label linear discriminant analysis (MLDA) in cases of balanced datasets and stronger correlation between tags, and ELM is also a good choice for multi-label classification.展开更多
This paper presents a new dimension reduction strategy for medium and large-scale linear programming problems. The proposed method uses a subset of the original constraints and combines two algorithms: the weighted av...This paper presents a new dimension reduction strategy for medium and large-scale linear programming problems. The proposed method uses a subset of the original constraints and combines two algorithms: the weighted average and the cosine simplex algorithm. The first approach identifies binding constraints by using the weighted average of each constraint, whereas the second algorithm is based on the cosine similarity between the vector of the objective function and the constraints. These two approaches are complementary, and when used together, they locate the essential subset of initial constraints required for solving medium and large-scale linear programming problems. After reducing the dimension of the linear programming problem using the subset of the essential constraints, the solution method can be chosen from any suitable method for linear programming. The proposed approach was applied to a set of well-known benchmarks as well as more than 2000 random medium and large-scale linear programming problems. The results are promising, indicating that the new approach contributes to the reduction of both the size of the problems and the total number of iterations required. A tree-based classification model also confirmed the need for combining the two approaches. A detailed numerical example, the general numerical results, and the statistical analysis for the decision tree procedure are presented.展开更多
Dimensionality reduction and data visualization are useful and important processes in pattern recognition. Many techniques have been developed in the recent years. The self-organizing map (SOM) can be an efficient m...Dimensionality reduction and data visualization are useful and important processes in pattern recognition. Many techniques have been developed in the recent years. The self-organizing map (SOM) can be an efficient method for this purpose. This paper reviews recent advances in this area and related approaches such as multidimensional scaling (MDS), nonlinear PC A, principal manifolds, as well as the connections of the SOM and its recent variant, the visualization induced SOM (ViSOM), with these approaches. The SOM is shown to produce a quantized, qualitative scaling and while the ViSOM a quantitative or metric scaling and approximates principal curve/surface. The SOM can also be regarded as a generalized MDS to relate two metric spaces by forming a topological mapping between them. The relationships among various recently proposed techniques such as ViSOM, Isomap, LLE, and eigenmap are discussed and compared.展开更多
Arc sound is well known as the potential and available resource for monitoring and controlling of the weld penetration status,which is very important to the welding process quality control,so any attentions have been ...Arc sound is well known as the potential and available resource for monitoring and controlling of the weld penetration status,which is very important to the welding process quality control,so any attentions have been paid to the relationships between the arc sound and welding parameters.Some non-linear mapping models correlating the arc sound to welding parameters have been established with the help of neural networks.However,the research of utilizing arc sound to monitor and diagnose welding process is still in its infancy.A self-made real-time sensing system is applied to make a study of arc sound under typical penetration status,including partial penetration,unstable penetration,full penetration and excessive penetration,in metal inert-gas(MIG) flat tailored welding with spray transfer.Arc sound is pretreated by using wavelet de-noising and short-time windowing technologies,and its characteristics,characterizing weld penetration status,of time-domain,frequency-domain,cepstrum-domain and geometric-domain are extracted.Subsequently,high-dimensional eigenvector is constructed and feature-level parameters are successfully fused utilizing the concept of primary principal component analysis(PCA).Ultimately,60-demensional eigenvector is replaced by the synthesis of 8-demensional vector,which achieves compression for feature space and provides technical supports for pattern classification of typical penetration status with the help of arc sound in MIG welding in the future.展开更多
The frame of text classification system was presented. The high dimensionality in feature space for text classification was studied. The mutual information is a widely used information theoretic measure, in a descript...The frame of text classification system was presented. The high dimensionality in feature space for text classification was studied. The mutual information is a widely used information theoretic measure, in a descriptive way, to measure the stochastic dependency of discrete random variables. The measure method was used as a criterion to reduce high dimensionality of feature vectors in text classification on Web. Feature selections or conversions were performed by using maximum mutual information including linear and non-linear feature conversions. Entropy was used and extended to find right features commendably in pattern recognition systems. Favorable foundation would be established for text classification mining.展开更多
The high dimensions of hyperspectral imagery have caused burden for further processing. A new Fast Independent Component Analysis (FastICA) approach to dimensionality reduction for hyperspectral imagery is presented. ...The high dimensions of hyperspectral imagery have caused burden for further processing. A new Fast Independent Component Analysis (FastICA) approach to dimensionality reduction for hyperspectral imagery is presented. The virtual dimensionality is introduced to determine the number of dimensions needed to be preserved. Since there is no prioritization among independent components generated by the FastICA,the mixing matrix of FastICA is initialized by endmembers,which were extracted by using unsupervised maximum distance method. Minimum Noise Fraction (MNF) is used for preprocessing of original data,which can reduce the computational complexity of FastICA significantly. Finally,FastICA is performed on the selected principal components acquired by MNF to generate the expected independent components in accordance with the order of endmembers. Experimental results demonstrate that the proposed method outperforms second-order statistics-based transforms such as principle components analysis.展开更多
This paper presents two novel algorithms for feature extraction-Subpattern Complete Two Dimensional Linear Discriminant Principal Component Analysis (SpC2DLDPCA) and Subpattern Complete Two Dimensional Locality Preser...This paper presents two novel algorithms for feature extraction-Subpattern Complete Two Dimensional Linear Discriminant Principal Component Analysis (SpC2DLDPCA) and Subpattern Complete Two Dimensional Locality Preserving Principal Component Analysis (SpC2DLPPCA). The modified SpC2DLDPCA and SpC2DLPPCA algorithm over their non-subpattern version and Subpattern Complete Two Dimensional Principal Component Analysis (SpC2DPCA) methods benefit greatly in the following four points: (1) SpC2DLDPCA and SpC2DLPPCA can avoid the failure that the larger dimension matrix may bring about more consuming time on computing their eigenvalues and eigenvectors. (2) SpC2DLDPCA and SpC2DLPPCA can extract local information to implement recognition. (3)The idea of subblock is introduced into Two Dimensional Principal Component Analysis (2DPCA) and Two Dimensional Linear Discriminant Analysis (2DLDA). SpC2DLDPCA combines a discriminant analysis and a compression technique with low energy loss. (4) The idea is also introduced into 2DPCA and Two Dimensional Locality Preserving projections (2DLPP), so SpC2DLPPCA can preserve local neighbor graph structure and compact feature expressions. Finally, the experiments on the CASIA(B) gait database show that SpC2DLDPCA and SpC2DLPPCA have higher recognition accuracies than their non-subpattern versions and SpC2DPCA.展开更多
Dimension reduction is defined as the processes of projecting high-dimensional data to a much lower-dimensional space. Dimension reduction methods variously applied in regression, classification, feature analysis and ...Dimension reduction is defined as the processes of projecting high-dimensional data to a much lower-dimensional space. Dimension reduction methods variously applied in regression, classification, feature analysis and visualization. In this paper, we review in details the last and most new version of methods that extensively developed in the past decade.展开更多
Hyperspectral image(HSI)contains a wealth of spectral information,which makes fine classification of ground objects possible.In the meanwhile,overly redundant information in HSI brings many challenges.Specifically,the...Hyperspectral image(HSI)contains a wealth of spectral information,which makes fine classification of ground objects possible.In the meanwhile,overly redundant information in HSI brings many challenges.Specifically,the lack of training samples and the high computational cost are the inevitable obstacles in the design of classifier.In order to solve these problems,dimensionality reduction is usually adopted.Recently,graph-based dimensionality reduction has become a hot topic.In this paper,the graph-based methods for HSI dimensionality reduction are summarized from the following aspects.1)The traditional graph-based methods employ Euclidean distance to explore the local information of samples in spectral feature space.2)The dimensionality-reduction methods based on sparse or collaborative representation regard the sparse or collaborative coefficients as graph weights to effectively reduce reconstruction errors and represent most important information of HSI in the dictionary.3)Improved methods based on sparse or collaborative graph have made great progress by considering global low-rank information,local intra-class information and spatial information.In order to compare typical techniques,three real HSI datasets were used to carry out relevant experiments,and then the experimental results were analysed and discussed.Finally,the future development of this research field is prospected.展开更多
A micro-electromechanical system(MEMS)scanning mirror accelerates the raster scanning of optical-resolution photoacoustic microscopy(OR-PAM).However,the nonlinear tilt angular-voltage characteristic of a MEMS mirror i...A micro-electromechanical system(MEMS)scanning mirror accelerates the raster scanning of optical-resolution photoacoustic microscopy(OR-PAM).However,the nonlinear tilt angular-voltage characteristic of a MEMS mirror introduces distortion into the maximum back-projection image.Moreover,the size of the airy disk,ultrasonic sensor properties,and thermal effects decrease the resolution.Thus,in this study,we proposed a spatial weight matrix(SWM)with a dimensionality reduction for image reconstruction.The three-layer SWM contains the invariable information of the system,which includes a spatial dependent distortion correction and 3D deconvolution.We employed an ordinal-valued Markov random field and the Harris Stephen algorithm,as well as a modified delay-and-sum method during a time reversal.The results from the experiments and a quantitative analysis demonstrate that images can be effectively reconstructed using an SWM;this is also true for severely distorted images.The index of the mutual information between the reference images and registered images was 70.33 times higher than the initial index,on average.Moreover,the peak signal-to-noise ratio was increased by 17.08%after 3D deconvolution.This accomplishment offers a practical approach to image reconstruction and a promising method to achieve a real-time distortion correction for MEMS-based OR-PAM.展开更多
Dimensionality reduction techniques play an important role in data mining. Kernel entropy component analysis( KECA) is a newly developed method for data transformation and dimensionality reduction. This paper conducte...Dimensionality reduction techniques play an important role in data mining. Kernel entropy component analysis( KECA) is a newly developed method for data transformation and dimensionality reduction. This paper conducted a comparative study of KECA with other five dimensionality reduction methods,principal component analysis( PCA),kernel PCA( KPCA),locally linear embedding( LLE),laplacian eigenmaps( LAE) and diffusion maps( DM). Three quality assessment criteria, local continuity meta-criterion( LCMC),trustworthiness and continuity measure(T&C),and mean relative rank error( MRRE) are applied as direct performance indexes to assess those dimensionality reduction methods. Moreover,the clustering accuracy is used as an indirect performance index to evaluate the quality of the representative data gotten by those methods. The comparisons are performed on six datasets and the results are analyzed by Friedman test with the corresponding post-hoc tests. The results indicate that KECA shows an excellent performance in both quality assessment criteria and clustering accuracy assessing.展开更多
Purpose:Exploring a dimensionality reduction model that can adeptly eliminate outliers and select the appropriate number of clusters is of profound theoretical and practical importance.Additionally,the interpretabilit...Purpose:Exploring a dimensionality reduction model that can adeptly eliminate outliers and select the appropriate number of clusters is of profound theoretical and practical importance.Additionally,the interpretability of these models presents a persistent challenge.Design/methodology/approach:This paper proposes two innovative dimensionality reduction models based on integer programming(DRMBIP).These models assess compactness through the correlation of each indicator with its class center,while separation is evaluated by the correlation between different class centers.In contrast to DRMBIP-p,the DRMBIP-v considers the threshold parameter as a variable aiming to optimally balances both compactness and separation.Findings:This study,getting data from the Global Health Observatory(GHO),investigates 141 indicators that influence life expectancy.The findings reveal that DRMBIP-p effectively reduces the dimensionality of data,ensuring compactness.It also maintains compatibility with other models.Additionally,DRMBIP-v finds the optimal result,showing exceptional separation.Visualization of the results reveals that all classes have a high compactness.Research limitations:The DRMBIP-p requires the input of the correlation threshold parameter,which plays a pivotal role in the effectiveness of the final dimensionality reduction results.In the DRMBIP-v,modifying the threshold parameter to variable potentially emphasizes either separation or compactness.This necessitates an artificial adjustment to the overflow component within the objective function.Practical implications:The DRMBIP presented in this paper is adept at uncovering the primary geometric structures within high-dimensional indicators.Validated by life expectancy data,this paper demonstrates potential to assist data miners with the reduction of data dimensions.Originality/value:To our knowledge,this is the first time that integer programming has been used to build a dimensionality reduction model with indicator filtering.It not only has applications in life expectancy,but also has obvious advantages in data mining work that requires precise class centers.展开更多
As a key technique in hyperspectral image pre-processing,dimensionality reduction has received a lot of attention.However,most of the graph-based dimensionality reduction methods only consider a single structure in th...As a key technique in hyperspectral image pre-processing,dimensionality reduction has received a lot of attention.However,most of the graph-based dimensionality reduction methods only consider a single structure in the data and ignore the interfusion of multiple structures.In this paper,we propose two methods for combining intra-class competition for locally preserved graphs by constructing a new dictionary containing neighbourhood information.These two methods explore local information into the collaborative graph through competing constraints,thus effectively improving the overcrowded distribution of intra-class coefficients in the collaborative graph and enhancing the discriminative power of the algorithm.By classifying four benchmark hyperspectral data,the proposed methods are proved to be superior to several advanced algorithms,even under small-sample-size conditions.展开更多
Big data is a vast amount of structured and unstructured data that must be dealt with on a regular basis.Dimensionality reduction is the process of converting a huge set of data into data with tiny dimensions so that ...Big data is a vast amount of structured and unstructured data that must be dealt with on a regular basis.Dimensionality reduction is the process of converting a huge set of data into data with tiny dimensions so that equal information may be expressed easily.These tactics are frequently utilized to improve classification or regression challenges while dealing with machine learning issues.To achieve dimensionality reduction for huge data sets,this paper offers a hybrid particle swarm optimization-rough set PSO-RS and Mayfly algorithm-rough set MA-RS.A novel hybrid strategy based on the Mayfly algorithm(MA)and the rough set(RS)is proposed in particular.The performance of the novel hybrid algorithm MA-RS is evaluated by solving six different data sets from the literature.The simulation results and comparison with common reduction methods demonstrate the proposed MARS algorithm’s capacity to handle a wide range of data sets.Finally,the rough set approach,as well as the hybrid optimization techniques PSO-RS and MARS,were applied to deal with the massive data problem.MA-hybrid RS’s method beats other classic dimensionality reduction techniques,according to the experimental results and statistical testing studies.展开更多
文摘In this paper, we proposed a new semi-supervised multi-manifold learning method, called semi- supervised sparse multi-manifold embedding (S3MME), for dimensionality reduction of hyperspectral image data. S3MME exploits both the labeled and unlabeled data to adaptively find neighbors of each sample from the same manifold by using an optimization program based on sparse representation, and naturally gives relative importance to the labeled ones through a graph-based methodology. Then it tries to extract discriminative features on each manifold such that the data points in the same manifold become closer. The effectiveness of the proposed multi-manifold learning algorithm is demonstrated and compared through experiments on a real hyperspectral images.
基金Project(60425310) supported by the National Science Fund for Distinguished Young ScholarsProject(10JJ6094) supported by the Hunan Provincial Natural Foundation of China
文摘Multi-label data with high dimensionality often occurs,which will produce large time and energy overheads when directly used in classification tasks.To solve this problem,a novel algorithm called multi-label dimensionality reduction via semi-supervised discriminant analysis(MSDA) was proposed.It was expected to derive an objective discriminant function as smooth as possible on the data manifold by multi-label learning and semi-supervised learning.By virtue of the latent imformation,which was provided by the graph weighted matrix of sample attributes and the similarity correlation matrix of partial sample labels,MSDA readily made the separability between different classes achieve maximization and estimated the intrinsic geometric structure in the lower manifold space by employing unlabeled data.Extensive experimental results on several real multi-label datasets show that after dimensionality reduction using MSDA,the average classification accuracy is about 9.71% higher than that of other algorithms,and several evaluation metrices like Hamming-loss are also superior to those of other dimensionality reduction methods.
基金The National Natural Science Foundation of China(No.61231002,61273266)the Ph.D.Program Foundation of Ministry of Education of China(No.20110092130004)China Postdoctoral Science Foundation(No.2015M571637)
文摘In order to accurately identify speech emotion information, the discriminant-cascading effect in dimensionality reduction of speech emotion recognition is investigated. Based on the existing locality preserving projections and graph embedding framework, a novel discriminant-cascading dimensionality reduction method is proposed, which is named discriminant-cascading locality preserving projections (DCLPP). The proposed method specifically utilizes supervised embedding graphs and it keeps the original space for the inner products of samples to maintain enough information for speech emotion recognition. Then, the kernel DCLPP (KDCLPP) is also proposed to extend the mapping form. Validated by the experiments on the corpus of EMO-DB and eNTERFACE'05, the proposed method can clearly outperform the existing common dimensionality reduction methods, such as principal component analysis (PCA), linear discriminant analysis (LDA), locality preserving projections (LPP), local discriminant embedding (LDE), graph-based Fisher analysis (GbFA) and so on, with different categories of classifiers.
文摘Some dimensionality reduction (DR) approaches based on support vector machine (SVM) are proposed. But the acquirement of the projection matrix in these approaches only considers the between-class margin based on SVM while ignoring the within-class information in data. This paper presents a new DR approach, call- ed the dimensionality reduction based on SVM and LDA (DRSL). DRSL considers the between-class margins from SVM and LDA, and the within-class compactness from LDA to obtain the projection matrix. As a result, DRSL can realize the combination of the between-class and within-class information and fit the between-class and within-class structures in data. Hence, the obtained projection matrix increases the generalization ability of subsequent classification techniques. Experiments applied to classification techniques show the effectiveness of the proposed method.
文摘We propose a novel framework for learning a low-dimensional representation of data based on nonlinear dynamical systems,which we call the dynamical dimension reduction(DDR).In the DDR model,each point is evolved via a nonlinear flow towards a lower-dimensional subspace;the projection onto the subspace gives the low-dimensional embedding.Training the model involves identifying the nonlinear flow and the subspace.Following the equation discovery method,we represent the vector field that defines the flow using a linear combination of dictionary elements,where each element is a pre-specified linear/nonlinear candidate function.A regularization term for the average total kinetic energy is also introduced and motivated by the optimal transport theory.We prove that the resulting optimization problem is well-posed and establish several properties of the DDR method.We also show how the DDR method can be trained using a gradient-based optimization method,where the gradients are computed using the adjoint method from the optimal control theory.The DDR method is implemented and compared on synthetic and example data sets to other dimension reduction methods,including the PCA,t-SNE,and Umap.
文摘We present a new algorithm for manifold learning and nonlinear dimensionality reduction. Based on a set of unorganized data points sampled with noise from a parameterized manifold, the local geometry of the manifold is learned by constructing an approximation for the tangent space at each point, and those tangent spaces are then aligned to give the global coordinates of the data points with respect to the underlying manifold. We also present an error analysis of our algorithm showing that reconstruction errors can be quite small in some cases. We illustrate our algorithm using curves and surfaces both in 2D/3D Euclidean spaces and higher dimensional Euclidean spaces. We also address several theoretical and algorithmic issues for further research and improvements.
基金supported by the National Natural Science Foundation of China(5110505261173163)the Liaoning Provincial Natural Science Foundation of China(201102037)
文摘In the need of some real applications, such as text categorization and image classification, the multi-label learning gradually becomes a hot research point in recent years. Much attention has been paid to the research of multi-label classification algorithms. Considering the fact that the high dimensionality of the multi-label datasets may cause the curse of dimensionality and wil hamper the classification process, a dimensionality reduction algorithm, named multi-label kernel discriminant analysis (MLKDA), is proposed to reduce the dimensionality of multi-label datasets. MLKDA, with the kernel trick, processes the multi-label integrally and realizes the nonlinear dimensionality reduction with the idea similar with linear discriminant analysis (LDA). In the classification process of multi-label data, the extreme learning machine (ELM) is an efficient algorithm in the premise of good accuracy. MLKDA, combined with ELM, shows a good performance in multi-label learning experiments with several datasets. The experiments on both static data and data stream show that MLKDA outperforms multi-label dimensionality reduction via dependence maximization (MDDM) and multi-label linear discriminant analysis (MLDA) in cases of balanced datasets and stronger correlation between tags, and ELM is also a good choice for multi-label classification.
文摘This paper presents a new dimension reduction strategy for medium and large-scale linear programming problems. The proposed method uses a subset of the original constraints and combines two algorithms: the weighted average and the cosine simplex algorithm. The first approach identifies binding constraints by using the weighted average of each constraint, whereas the second algorithm is based on the cosine similarity between the vector of the objective function and the constraints. These two approaches are complementary, and when used together, they locate the essential subset of initial constraints required for solving medium and large-scale linear programming problems. After reducing the dimension of the linear programming problem using the subset of the essential constraints, the solution method can be chosen from any suitable method for linear programming. The proposed approach was applied to a set of well-known benchmarks as well as more than 2000 random medium and large-scale linear programming problems. The results are promising, indicating that the new approach contributes to the reduction of both the size of the problems and the total number of iterations required. A tree-based classification model also confirmed the need for combining the two approaches. A detailed numerical example, the general numerical results, and the statistical analysis for the decision tree procedure are presented.
文摘Dimensionality reduction and data visualization are useful and important processes in pattern recognition. Many techniques have been developed in the recent years. The self-organizing map (SOM) can be an efficient method for this purpose. This paper reviews recent advances in this area and related approaches such as multidimensional scaling (MDS), nonlinear PC A, principal manifolds, as well as the connections of the SOM and its recent variant, the visualization induced SOM (ViSOM), with these approaches. The SOM is shown to produce a quantized, qualitative scaling and while the ViSOM a quantitative or metric scaling and approximates principal curve/surface. The SOM can also be regarded as a generalized MDS to relate two metric spaces by forming a topological mapping between them. The relationships among various recently proposed techniques such as ViSOM, Isomap, LLE, and eigenmap are discussed and compared.
基金supported by Harbin Academic Pacesetter Foundation of China (Grant No. RC2012XK006002)Zhegjiang Provincial Natural Science Foundation of China (Grant No. Y1110262)+2 种基金Ningbo Municipal Natural Science Foundation of China (Grant No. 2011A610148)Ningbo Municipal Major Industrial Support Project of China (Grant No.2011B1007)Heilongjiang Provincial Natural Science Foundation of China (Grant No. E2007-01)
文摘Arc sound is well known as the potential and available resource for monitoring and controlling of the weld penetration status,which is very important to the welding process quality control,so any attentions have been paid to the relationships between the arc sound and welding parameters.Some non-linear mapping models correlating the arc sound to welding parameters have been established with the help of neural networks.However,the research of utilizing arc sound to monitor and diagnose welding process is still in its infancy.A self-made real-time sensing system is applied to make a study of arc sound under typical penetration status,including partial penetration,unstable penetration,full penetration and excessive penetration,in metal inert-gas(MIG) flat tailored welding with spray transfer.Arc sound is pretreated by using wavelet de-noising and short-time windowing technologies,and its characteristics,characterizing weld penetration status,of time-domain,frequency-domain,cepstrum-domain and geometric-domain are extracted.Subsequently,high-dimensional eigenvector is constructed and feature-level parameters are successfully fused utilizing the concept of primary principal component analysis(PCA).Ultimately,60-demensional eigenvector is replaced by the synthesis of 8-demensional vector,which achieves compression for feature space and provides technical supports for pattern classification of typical penetration status with the help of arc sound in MIG welding in the future.
文摘The frame of text classification system was presented. The high dimensionality in feature space for text classification was studied. The mutual information is a widely used information theoretic measure, in a descriptive way, to measure the stochastic dependency of discrete random variables. The measure method was used as a criterion to reduce high dimensionality of feature vectors in text classification on Web. Feature selections or conversions were performed by using maximum mutual information including linear and non-linear feature conversions. Entropy was used and extended to find right features commendably in pattern recognition systems. Favorable foundation would be established for text classification mining.
基金Supported by the National Natural Science Foundation of China (No. 60572135)
文摘The high dimensions of hyperspectral imagery have caused burden for further processing. A new Fast Independent Component Analysis (FastICA) approach to dimensionality reduction for hyperspectral imagery is presented. The virtual dimensionality is introduced to determine the number of dimensions needed to be preserved. Since there is no prioritization among independent components generated by the FastICA,the mixing matrix of FastICA is initialized by endmembers,which were extracted by using unsupervised maximum distance method. Minimum Noise Fraction (MNF) is used for preprocessing of original data,which can reduce the computational complexity of FastICA significantly. Finally,FastICA is performed on the selected principal components acquired by MNF to generate the expected independent components in accordance with the order of endmembers. Experimental results demonstrate that the proposed method outperforms second-order statistics-based transforms such as principle components analysis.
基金Sponsored by the National Science Foundation of China( Grant No. 61201370,61100103)the Independent Innovation Foundation of Shandong University( Grant No. 2012DX07)
文摘This paper presents two novel algorithms for feature extraction-Subpattern Complete Two Dimensional Linear Discriminant Principal Component Analysis (SpC2DLDPCA) and Subpattern Complete Two Dimensional Locality Preserving Principal Component Analysis (SpC2DLPPCA). The modified SpC2DLDPCA and SpC2DLPPCA algorithm over their non-subpattern version and Subpattern Complete Two Dimensional Principal Component Analysis (SpC2DPCA) methods benefit greatly in the following four points: (1) SpC2DLDPCA and SpC2DLPPCA can avoid the failure that the larger dimension matrix may bring about more consuming time on computing their eigenvalues and eigenvectors. (2) SpC2DLDPCA and SpC2DLPPCA can extract local information to implement recognition. (3)The idea of subblock is introduced into Two Dimensional Principal Component Analysis (2DPCA) and Two Dimensional Linear Discriminant Analysis (2DLDA). SpC2DLDPCA combines a discriminant analysis and a compression technique with low energy loss. (4) The idea is also introduced into 2DPCA and Two Dimensional Locality Preserving projections (2DLPP), so SpC2DLPPCA can preserve local neighbor graph structure and compact feature expressions. Finally, the experiments on the CASIA(B) gait database show that SpC2DLDPCA and SpC2DLPPCA have higher recognition accuracies than their non-subpattern versions and SpC2DPCA.
文摘Dimension reduction is defined as the processes of projecting high-dimensional data to a much lower-dimensional space. Dimension reduction methods variously applied in regression, classification, feature analysis and visualization. In this paper, we review in details the last and most new version of methods that extensively developed in the past decade.
基金supported by the National Key Research and Development Project(No.2020YFC1512000)the National Natural Science Foundation of China(No.41601344)+2 种基金the Fundamental Research Funds for the Central Universities(Nos.300102320107 and 201924)in part by the General Projects of Key R&D Programs in Shaanxi Province(No.2020GY-060)Xi’an Science&Technology Project(Nos.2020KJRC0126 and 202018)。
文摘Hyperspectral image(HSI)contains a wealth of spectral information,which makes fine classification of ground objects possible.In the meanwhile,overly redundant information in HSI brings many challenges.Specifically,the lack of training samples and the high computational cost are the inevitable obstacles in the design of classifier.In order to solve these problems,dimensionality reduction is usually adopted.Recently,graph-based dimensionality reduction has become a hot topic.In this paper,the graph-based methods for HSI dimensionality reduction are summarized from the following aspects.1)The traditional graph-based methods employ Euclidean distance to explore the local information of samples in spectral feature space.2)The dimensionality-reduction methods based on sparse or collaborative representation regard the sparse or collaborative coefficients as graph weights to effectively reduce reconstruction errors and represent most important information of HSI in the dictionary.3)Improved methods based on sparse or collaborative graph have made great progress by considering global low-rank information,local intra-class information and spatial information.In order to compare typical techniques,three real HSI datasets were used to carry out relevant experiments,and then the experimental results were analysed and discussed.Finally,the future development of this research field is prospected.
基金supported by National Natural Science Foundation of China,Nos.61822505,11774101,61627827Science and Technology Planning Project of Guangdong Province,No.2015B020233016+2 种基金China Postdoctoral Science Foundation,No.2019 M652943Natural Science Foundation of Guangdong Province,No.2019A1515011399Guangzhou Science and Technology Program key projects,Nos.2019050001.
文摘A micro-electromechanical system(MEMS)scanning mirror accelerates the raster scanning of optical-resolution photoacoustic microscopy(OR-PAM).However,the nonlinear tilt angular-voltage characteristic of a MEMS mirror introduces distortion into the maximum back-projection image.Moreover,the size of the airy disk,ultrasonic sensor properties,and thermal effects decrease the resolution.Thus,in this study,we proposed a spatial weight matrix(SWM)with a dimensionality reduction for image reconstruction.The three-layer SWM contains the invariable information of the system,which includes a spatial dependent distortion correction and 3D deconvolution.We employed an ordinal-valued Markov random field and the Harris Stephen algorithm,as well as a modified delay-and-sum method during a time reversal.The results from the experiments and a quantitative analysis demonstrate that images can be effectively reconstructed using an SWM;this is also true for severely distorted images.The index of the mutual information between the reference images and registered images was 70.33 times higher than the initial index,on average.Moreover,the peak signal-to-noise ratio was increased by 17.08%after 3D deconvolution.This accomplishment offers a practical approach to image reconstruction and a promising method to achieve a real-time distortion correction for MEMS-based OR-PAM.
基金Climbing Peak Discipline Project of Shanghai Dianji University,China(No.15DFXK02)Hi-Tech Research and Development Programs of China(No.2007AA041600)
文摘Dimensionality reduction techniques play an important role in data mining. Kernel entropy component analysis( KECA) is a newly developed method for data transformation and dimensionality reduction. This paper conducted a comparative study of KECA with other five dimensionality reduction methods,principal component analysis( PCA),kernel PCA( KPCA),locally linear embedding( LLE),laplacian eigenmaps( LAE) and diffusion maps( DM). Three quality assessment criteria, local continuity meta-criterion( LCMC),trustworthiness and continuity measure(T&C),and mean relative rank error( MRRE) are applied as direct performance indexes to assess those dimensionality reduction methods. Moreover,the clustering accuracy is used as an indirect performance index to evaluate the quality of the representative data gotten by those methods. The comparisons are performed on six datasets and the results are analyzed by Friedman test with the corresponding post-hoc tests. The results indicate that KECA shows an excellent performance in both quality assessment criteria and clustering accuracy assessing.
基金supported by the National Natural Science Foundation of China (Nos.72371115)the Natural Science Foundation of Jilin,China (No.20230101184JC)。
文摘Purpose:Exploring a dimensionality reduction model that can adeptly eliminate outliers and select the appropriate number of clusters is of profound theoretical and practical importance.Additionally,the interpretability of these models presents a persistent challenge.Design/methodology/approach:This paper proposes two innovative dimensionality reduction models based on integer programming(DRMBIP).These models assess compactness through the correlation of each indicator with its class center,while separation is evaluated by the correlation between different class centers.In contrast to DRMBIP-p,the DRMBIP-v considers the threshold parameter as a variable aiming to optimally balances both compactness and separation.Findings:This study,getting data from the Global Health Observatory(GHO),investigates 141 indicators that influence life expectancy.The findings reveal that DRMBIP-p effectively reduces the dimensionality of data,ensuring compactness.It also maintains compatibility with other models.Additionally,DRMBIP-v finds the optimal result,showing exceptional separation.Visualization of the results reveals that all classes have a high compactness.Research limitations:The DRMBIP-p requires the input of the correlation threshold parameter,which plays a pivotal role in the effectiveness of the final dimensionality reduction results.In the DRMBIP-v,modifying the threshold parameter to variable potentially emphasizes either separation or compactness.This necessitates an artificial adjustment to the overflow component within the objective function.Practical implications:The DRMBIP presented in this paper is adept at uncovering the primary geometric structures within high-dimensional indicators.Validated by life expectancy data,this paper demonstrates potential to assist data miners with the reduction of data dimensions.Originality/value:To our knowledge,this is the first time that integer programming has been used to build a dimensionality reduction model with indicator filtering.It not only has applications in life expectancy,but also has obvious advantages in data mining work that requires precise class centers.
基金supported by the National Natural Science Foundation of China(No.41601344)the Fundamental Research Funds for the Central Universities(Nos.300102320107 and 201924)+2 种基金the National Key Research and Development Project(No.2020YFC1512000)in part by the General Projects of Key R&D Programs in Shaanxi Province(No.2020GY-060)Xi’an Science&Technology Project(Nos.2020KJRC0126 and 202018)。
文摘As a key technique in hyperspectral image pre-processing,dimensionality reduction has received a lot of attention.However,most of the graph-based dimensionality reduction methods only consider a single structure in the data and ignore the interfusion of multiple structures.In this paper,we propose two methods for combining intra-class competition for locally preserved graphs by constructing a new dictionary containing neighbourhood information.These two methods explore local information into the collaborative graph through competing constraints,thus effectively improving the overcrowded distribution of intra-class coefficients in the collaborative graph and enhancing the discriminative power of the algorithm.By classifying four benchmark hyperspectral data,the proposed methods are proved to be superior to several advanced algorithms,even under small-sample-size conditions.
文摘Big data is a vast amount of structured and unstructured data that must be dealt with on a regular basis.Dimensionality reduction is the process of converting a huge set of data into data with tiny dimensions so that equal information may be expressed easily.These tactics are frequently utilized to improve classification or regression challenges while dealing with machine learning issues.To achieve dimensionality reduction for huge data sets,this paper offers a hybrid particle swarm optimization-rough set PSO-RS and Mayfly algorithm-rough set MA-RS.A novel hybrid strategy based on the Mayfly algorithm(MA)and the rough set(RS)is proposed in particular.The performance of the novel hybrid algorithm MA-RS is evaluated by solving six different data sets from the literature.The simulation results and comparison with common reduction methods demonstrate the proposed MARS algorithm’s capacity to handle a wide range of data sets.Finally,the rough set approach,as well as the hybrid optimization techniques PSO-RS and MARS,were applied to deal with the massive data problem.MA-hybrid RS’s method beats other classic dimensionality reduction techniques,according to the experimental results and statistical testing studies.