Funding: Supported by the National Natural Science Foundation of China (Nos. 51205193, 51475221).
Abstract: Image matching technology is theoretically significant and practically promising in the field of autonomous navigation. To address the shortcomings of existing image matching navigation technologies, the concept of a high-dimensional combined feature is presented for sequence image matching navigation. To balance the distribution of high-dimensional combined features against the shortcomings of using geometric relations alone, we propose a method based on Delaunay triangulation to improve the feature, adding the regional characteristics of the features to their geometric characteristics. Finally, the k-nearest neighbor (KNN) algorithm is adopted to optimize the search process. Simulation results show that matching can be realized at rotation angles of -8° to 8° and scale factors of 0.9 to 1.1, and that when the image size is 160 pixels × 160 pixels, the matching time is less than 0.5 s. Therefore, the proposed algorithm substantially reduces computational complexity, improves matching speed, and is robust to rotation and scale changes.
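To make the matching step concrete, here is a minimal Python sketch (an illustration only, not the authors' implementation: the descriptor composition, the 9×9 patch statistics, and the single-neighbor match are assumptions) that pairs a Delaunay-derived geometric statistic with simple regional statistics and matches the combined vectors by KNN:

```python
import numpy as np
from scipy.spatial import Delaunay
from sklearn.neighbors import NearestNeighbors

def combined_features(points, image):
    """Concatenate a geometric descriptor (mean distance to Delaunay
    neighbours) with a regional descriptor (local patch statistics)."""
    tri = Delaunay(points)
    indptr, indices = tri.vertex_neighbor_vertices
    feats = []
    for i, (x, y) in enumerate(points):
        neigh = points[indices[indptr[i]:indptr[i + 1]]]
        geo = np.linalg.norm(neigh - (x, y), axis=1).mean()  # geometric part
        patch = image[max(0, int(y) - 4):int(y) + 5,
                      max(0, int(x) - 4):int(x) + 5]         # regional part
        feats.append([geo, patch.mean(), patch.std()])
    return np.asarray(feats)

def knn_match(query_feats, ref_feats):
    """Match each query feature to its nearest reference feature."""
    nn = NearestNeighbors(n_neighbors=1).fit(ref_feats)
    dist, idx = nn.kneighbors(query_feats)
    return idx.ravel(), dist.ravel()
```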
Abstract: The estimation of covariance matrices is very important in many fields, such as statistics. In real applications, data are frequently affected by high dimensionality and noise. However, most relevant studies are based on complete data. This paper studies the optimal estimation of high-dimensional covariance matrices from missing and noisy samples under the norm. First, a model with sub-Gaussian additive noise is presented. The generalized sample covariance is then modified to define a hard thresholding estimator, and its minimax upper bound is derived. After that, the minimax lower bound is derived, leading to the conclusion that the proposed estimator is rate-optimal. Finally, numerical simulations are performed. The results show that for missing samples with sub-Gaussian noise, if the true covariance matrix is sparse, the hard thresholding estimator outperforms the traditional estimation method.
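A minimal sketch of such an estimator, assuming a 0/1 mask of observed entries and a user-chosen threshold (in theory, lam is typically of order sqrt(log p / n)):

```python
import numpy as np

def hard_threshold_cov(X, mask, lam):
    """Sketch of a hard-thresholding covariance estimator for missing,
    noisy samples. X: n x p data; mask: n x p, 1 where observed,
    0 where missing; lam: threshold level."""
    m = mask.astype(float)
    Xo = X * m                                   # zero-fill missing entries
    mean = Xo.sum(0) / np.maximum(m.sum(0), 1)   # per-column observed means
    Xc = (X - mean) * m                          # centred, missing still zero
    n_pair = m.T @ m                             # jointly-observed pair counts
    S = (Xc.T @ Xc) / np.maximum(n_pair, 1)      # generalized sample covariance
    T = np.where(np.abs(S) >= lam, S, 0.0)       # entrywise hard thresholding
    np.fill_diagonal(T, np.diag(S))              # keep the diagonal (common choice)
    return T
```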
Abstract: With the abundance of exceptionally high dimensional data, feature selection has become an essential element of the data mining process. In this paper, we investigate the problem of efficient feature selection for classification on high dimensional datasets. We present a novel filter-based approach to feature selection that sorts the features by a score, and we then measure the performance of four different data mining classification algorithms on the resulting data. In the proposed approach, we partition the sorted features and search for important features in both forward and reverse order, starting from the first and last features of the sorted list simultaneously. The approach is highly scalable and effective, as it parallelizes over both attributes and tuples simultaneously, allowing us to evaluate many potential features for high dimensional datasets. The proposed feature selection framework is shown experimentally to be very valuable on real and synthetic high dimensional datasets, improving the precision of the selected features. We have also tested it by measuring classification accuracy against various feature selection processes.
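The bidirectional search can be sketched as follows; the F-score filter, logistic-regression evaluator, and 3-fold cross-validation here are stand-in choices, not the paper's exact configuration:

```python
import numpy as np
from sklearn.feature_selection import f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def bidirectional_select(X, y):
    """Rank features with an F-score filter, then grow a subset from the
    head (best-ranked) and the tail of the sorted list simultaneously,
    keeping an extension only when cross-validated accuracy improves."""
    order = np.argsort(f_classif(X, y)[0])[::-1]
    selected, best = [], 0.0
    lo, hi = 0, len(order) - 1
    while lo <= hi:
        for idx in {order[lo], order[hi]}:       # forward and reverse candidates
            trial = selected + [idx]
            score = cross_val_score(LogisticRegression(max_iter=200),
                                    X[:, trial], y, cv=3).mean()
            if score > best:
                best, selected = score, trial
        lo, hi = lo + 1, hi - 1
    return selected, best
```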
Funding: Funded by the Natural Science Foundation of China (Grant Nos. 41674085 and 41621091) and the National Key Basic Research Program of China (973 Program, Grant Nos. 2012CB957703 and 2013CB733301).
Abstract: To estimate basal water storage beneath the Antarctic ice sheet, it is essential to have data on the three-dimensional characteristics of subglacial lakes. We present a method to estimate the water depth and surface area of Antarctic subglacial lakes from an inversion of the hydraulic potential. Lake Vostok is chosen as a case study because of the diverse and comprehensive measurements that have been obtained over and around the lake. The average depth of Lake Vostok is around 345±4 m. We estimate the surface area of Lake Vostok beneath the ice sheet to be about 13300±594 km^2. The lake consists of two sub-basins separated by a ridge at water depths of about 200-300 m. The surface area of the northern sub-basin is estimated to be about half that of the southern basin. The maximum depths of the northern and southern sub-basins are estimated to be about 450 and 850 m, respectively. The total water volume is estimated to be about 4658±204 km^3. These estimates are compared with previous estimates obtained from seismic data and from inversion of aerogravity data. In general, our estimates are closer to those obtained from the inversion of aerogravity data than to those from seismic data, indicating the applicability of our method to estimating the water depths of other subglacial lakes.
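The hydrostatic relation underlying this kind of inversion can be sketched in a few lines; the Shreve-form potential, the density constants, and the use of a single equilibrium potential over the lake are assumptions of the sketch, not the authors' full algorithm:

```python
import numpy as np

RHO_I, RHO_W, G = 917.0, 1000.0, 9.81   # ice/water density (kg/m^3), gravity (m/s^2)

def hydraulic_potential(z_surf, z_bed):
    """Shreve-type hydraulic potential at the ice base:
    phi = rho_w*g*z_bed + rho_i*g*(z_surf - z_bed)."""
    return RHO_W * G * z_bed + RHO_I * G * (z_surf - z_bed)

def lake_depth(z_surf, z_bed, lake_mask):
    """Over a lake in hydrostatic equilibrium, the potential at the
    ice-water interface is one constant phi0, so for a fixed surface
    elevation phi0 - phi(bed) = (rho_w - rho_i)*g*depth."""
    phi = hydraulic_potential(z_surf, z_bed)
    phi0 = np.median(phi[lake_mask])            # equilibrium potential (assumption)
    depth = (phi0 - phi) / ((RHO_W - RHO_I) * G)
    return np.where(lake_mask & (depth > 0), depth, 0.0)
```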
Abstract: In ultra-high-dimensional data, it is common for the response variable to be multi-classified. This paper therefore proposes a model-free screening method for variables with a multi-classified response, introducing the Jensen-Shannon divergence to measure the importance of covariates. The idea is to calculate the Jensen-Shannon divergence between the conditional probability distribution of a covariate given the response variable and the unconditional probability distribution of the covariate, and then use the probabilities of the response categories as weights to compute a weighted Jensen-Shannon divergence; a larger weighted Jensen-Shannon divergence means the covariate is more important. We also investigate an adapted version of the method for the case where the number of categories varies across covariates, which measures the relationship between a covariate and the response using the weighted Jensen-Shannon divergence adjusted by a logarithmic factor of the number of categories. Both theoretical analysis and simulation experiments demonstrate that the proposed methods have the sure screening and ranking consistency properties. Finally, results from simulations and real-data experiments show that, for feature screening, the proposed methods are robust in performance and computationally faster than an existing method.
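A minimal sketch of the screening score, with histogram-based density estimates standing in for whatever estimator the paper uses:

```python
import numpy as np

def js_div(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two discrete distributions."""
    p, q = p + eps, q + eps
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log(a / b))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def wjs_score(x, y, bins=10):
    """Weighted JS divergence between P(X | Y=r) and P(X), weighted by
    P(Y=r); a larger score marks a more important covariate."""
    edges = np.histogram_bin_edges(x, bins=bins)
    p_marg = np.histogram(x, bins=edges)[0] / len(x)
    score = 0.0
    for r in np.unique(y):
        xr = x[y == r]
        p_cond = np.histogram(xr, bins=edges)[0] / len(xr)
        score += (len(xr) / len(y)) * js_div(p_cond, p_marg)
    return score
```

Screening then amounts to ranking all covariates by this score and keeping the top ones.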
Abstract: It is quite common for both categorical and continuous covariates to appear in the data, yet most feature screening methods for ultrahigh-dimensional classification assume the covariates are continuous, and applicable feature screening methods are very limited. To handle this non-trivial situation, we propose a model-free feature screening method for ultrahigh-dimensional multi-classification with both categorical and continuous covariates. The proposed method is based on Gini impurity and evaluates the prediction power of covariates. Under certain regularity conditions, it is proved that the proposed screening procedure possesses the sure screening and ranking consistency properties. We demonstrate the finite-sample performance of the proposed procedure through simulation studies and illustrate it with a real data analysis.
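A sketch of a Gini-impurity screening score; the binning of continuous covariates here is an assumed discretization, not necessarily the paper's slicing scheme:

```python
import numpy as np

def gini(y):
    """Gini impurity of a discrete label vector."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def gini_gain(x, y, categorical=False, bins=10):
    """Screening score: reduction in the Gini impurity of the response
    when conditioning on a covariate; continuous covariates are binned,
    categorical ones use their own levels."""
    if categorical:
        groups = x
    else:
        edges = np.histogram_bin_edges(x, bins=bins)
        groups = np.digitize(x, edges[1:-1])
    score = gini(y)
    for g in np.unique(groups):
        sel = groups == g
        score -= sel.mean() * gini(y[sel])   # weighted conditional impurity
    return score
```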
Abstract: It is common for datasets to contain both categorical and continuous variables. However, many feature screening methods designed for high-dimensional classification assume that the variables are continuous, which limits the applicability of existing methods to this complex scenario. To address this issue, we propose a model-free feature screening approach for ultra-high-dimensional multi-classification that can handle both categorical and continuous variables. The proposed method uses the maximal information coefficient to assess the predictive power of the variables. Under certain regularity conditions, we prove that the screening procedure possesses the sure screening and ranking consistency properties. To validate the effectiveness of the approach, we conduct simulation studies and provide real-data examples demonstrating its finite-sample performance. In summary, the proposed method offers an effective way to screen features in ultra-high-dimensional datasets with a mixture of categorical and continuous covariates.
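The true maximal information coefficient maximizes a normalized mutual information over many grid resolutions (implementations exist in packages such as minepy); the self-contained stand-in below searches only a few grid sizes on one axis and should be read as an approximation of the screening score, not the paper's statistic:

```python
import numpy as np

def grid_mi(x, y_labels, bins):
    """Normalized mutual information of x (binned into `bins` cells)
    against a discrete response: one grid of the MIC maximization."""
    gx = np.digitize(x, np.histogram_bin_edges(x, bins=bins)[1:-1])
    classes = {v: j for j, v in enumerate(np.unique(y_labels))}
    joint = np.zeros((bins, len(classes)))
    for xi, yi in zip(gx, y_labels):
        joint[xi, classes[yi]] += 1
    joint /= joint.sum()
    px, py = joint.sum(1, keepdims=True), joint.sum(0, keepdims=True)
    nz = joint > 0
    mi = np.sum(joint[nz] * np.log(joint[nz] / (px @ py)[nz]))
    return mi / np.log(min(bins, joint.shape[1]))   # MIC-style normalization

def mic_screen(X, y, grids=(4, 8, 16)):
    """Score each column by the best normalized MI over a few grid sizes."""
    return np.array([max(grid_mi(X[:, j], y, b) for b in grids)
                     for j in range(X.shape[1])])
```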
Funding: Supported by the National Science and Technology Major Projects for Major New Drugs Innovation and Development [2014ZX09J14102-02A (2014.1-2016.12)].
Abstract: Objective To analyze the characteristics of high altitude pulmonary edema (HAPE) in Chinese patients. Methods We performed a retrospective study of 98 patients with HAPE. We reviewed the medical records, summarized the clinical, laboratory and imaging characteristics of these cases, and compared the results on admission with those determined before discharge. Results Forty-eight (49.0%) patients developed HAPE at altitudes of 2800 m to 3000 m. Ninety-five (96.9%) patients were men. Moist rales were audible over both lungs, and in fourteen patients the moist rales over the right lung were clearer than those over the left lung. White blood cell counts [(12.83±5.55) versus (8.95±3.23)×10^9/L, P=0.001] and neutrophil counts [(11.34±3.81) versus (7.49±2.83)×10^9/L, P=0.001] were higher, whereas the counts of other white blood cell subsets were lower, on admission than after recovery (all P<0.05). Serum levels of alkaline phosphatase (115.8±37.6 versus 85.7±32.4 mmol/L, P=0.020), cholinesterase (7226.2±1631.8 versus 6285.3±1693.3 mmol/L, P=0.040), creatinine (85.2±17.1 versus 75.1±12.8 mmol/L, P=0.021), uric acid (401.9±114.2 versus 326.0±154.3 mmol/L, P=0.041), and glucose (7.20±1.10 versus 5.51±1.11 mmol/L, P=0.001) were higher, while carbon dioxide combining power (CO2CP, 26.7±4.4 versus 28.9±4.5 mmol/L, P=0.042) and serum calcium (2.32±0.13 versus 2.41±0.10 mmol/L, P=0.006) were lower on admission. Arterial blood gas results showed hypoxemia and respiratory alkalosis on admission. Conclusions In the present study, men were more susceptible to HAPE than women, and in the course of HAPE the lesions of the right lung were more serious than those of the left lung. Several routine blood test and blood biochemistry indicators changed in HAPE patients.
Abstract: An algorithm, Clustering Algorithm Based On Sparse Feature Vector (CABOSFV), is proposed for the high dimensional clustering of binary sparse data. The algorithm compresses the data effectively by using a 'sparse feature vector', thus reducing the data scale enormously, and can obtain the clustering result with only one data scan. Both theoretical analysis and empirical tests show that CABOSFV has low computational complexity. The algorithm finds clusters in high dimensional large datasets efficiently and handles noise effectively.
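One common formulation of the sparse-feature-vector summary and its dissimilarity is sketched below; the exact definitions used in the paper may differ:

```python
import numpy as np

def sfd(n, s, ns):
    """Sparse feature dissimilarity of a cluster summary: |NS| / (n*|S|),
    one common formulation of the CABOSFV criterion (an assumption here)."""
    return len(ns) / (n * len(s)) if s else float("inf")

def cabosfv(X, threshold):
    """One-scan clustering of binary sparse rows. Each cluster keeps only
    a compressed summary (n, S, NS): S = features equal to 1 in every
    member, NS = features equal to 1 in some but not all members."""
    clusters = []                       # list of [n, set S, set NS]
    labels = np.empty(len(X), dtype=int)
    for i, row in enumerate(X):
        ones = set(np.flatnonzero(row))
        merged = False
        for k, (n, s, ns) in enumerate(clusters):
            s2 = s & ones                          # still 1 in all members
            ns2 = (ns | s | ones) - s2             # 1 in some, not all
            if sfd(n + 1, s2, ns2) <= threshold:   # accept if still compact
                clusters[k] = [n + 1, s2, ns2]
                labels[i] = k
                merged = True
                break
        if not merged:
            clusters.append([1, ones, set()])
            labels[i] = len(clusters) - 1
    return labels
```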
Funding: Supported by the Harbin Academic Pacesetter Foundation of China (Grant No. RC2012XK006002), the Zhejiang Provincial Natural Science Foundation of China (Grant No. Y1110262), the Ningbo Municipal Natural Science Foundation of China (Grant No. 2011A610148), the Ningbo Municipal Major Industrial Support Project of China (Grant No. 2011B1007), and the Heilongjiang Provincial Natural Science Foundation of China (Grant No. E2007-01).
Abstract: Arc sound is well known as a potential and available resource for monitoring and controlling the weld penetration status, which is very important to welding process quality control, so much attention has been paid to the relationships between arc sound and welding parameters. Some non-linear mapping models correlating arc sound to welding parameters have been established with the help of neural networks. However, research on utilizing arc sound to monitor and diagnose the welding process is still in its infancy. A self-made real-time sensing system is applied to study arc sound under typical penetration statuses, including partial penetration, unstable penetration, full penetration and excessive penetration, in metal inert-gas (MIG) flat tailored welding with spray transfer. The arc sound is pretreated using wavelet de-noising and short-time windowing technologies, and its time-domain, frequency-domain, cepstrum-domain and geometric-domain characteristics, which characterize the weld penetration status, are extracted. Subsequently, a high-dimensional eigenvector is constructed and feature-level parameters are successfully fused using the concept of primary principal component analysis (PCA). Ultimately, the 60-dimensional eigenvector is replaced by a synthesized 8-dimensional vector, which compresses the feature space and provides technical support for future pattern classification of typical penetration statuses from arc sound in MIG welding.
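The final fusion step amounts to standard PCA on the fused feature vectors; a minimal sketch with stand-in data (the random matrix below merely takes the place of the 60 real arc-sound features):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Stand-in: rows are weld samples, columns are the 60 fused
# time/frequency/cepstrum/geometric-domain arc-sound features.
rng = np.random.default_rng(0)
features_60d = rng.normal(size=(200, 60))

# Standardize, then keep the leading principal components; the paper
# reports that 8 components suffice to replace the 60-dim eigenvector.
X = StandardScaler().fit_transform(features_60d)
pca = PCA(n_components=8)
features_8d = pca.fit_transform(X)
print(pca.explained_variance_ratio_.sum())   # variance retained by 8 PCs
```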
Funding: Project (RDF 11-02-03) supported by the Research Development Fund of XJTLU, China.
Abstract: Information analysis of high dimensional data was carried out through the application of similarity measures. High dimensional data were considered as an atypical structure. Additionally, overlapped and non-overlapped data were introduced, and similarity measure analysis was illustrated and compared with a conventional similarity measure. As a result, for overlapped data it was possible to express similarity with the conventional similarity measure, while the analysis of non-overlapped data provided a clue to solving the similarity of high dimensional data. The high dimensional data analysis was designed with consideration of neighborhood information, and conservative and strict solutions were proposed. The proposed similarity measure was applied to express financial fraud across multi-dimensional datasets. In an illustrative example, financial fraud similarity with respect to age, gender, qualification and job was presented, and with the proposed similarity measure, high dimensional personal data were evaluated for how similar they were to the financial fraud. Calculation results show that actual fraud has a rather high similarity measure compared with the average, ranging from a minimum of 0.0609 to a maximum of 0.1667.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 12272257, 12102292, 12032006) and the Special Fund for Science and Technology Innovation Teams of Shanxi Province (No. 202204051002006).
Abstract: This study employs a data-driven methodology that embeds the principle of dimensional invariance into an artificial neural network to automatically identify, from experimental measurements, the dominant dimensionless quantities in the penetration of rod projectiles into semi-infinite metal targets. The derived mathematical expressions of the dimensionless quantities are simplified by examining the exponent matrix and the coupling relationships between feature variables. As a physics-based dimension reduction methodology, this approach reduces high-dimensional parameter spaces to descriptions involving only a few physically interpretable dimensionless quantities for penetration cases. The relative importance of the dimensionless feature variables to the penetration efficiencies under four impact conditions is then evaluated through feature selection engineering. The results indicate that the critical dimensionless feature variables selected by this synergistic method, without reference to complex theoretical equations or detailed knowledge of penetration mechanics, are in accordance with those reported in the literature. Lastly, the determined dimensionless quantities can be efficiently applied to semi-empirical analysis of the specific penetration case, and the reliability of the regression functions is validated.
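The dimensional-invariance constraint at the heart of the method can be illustrated directly: exponent vectors of dimensionless quantities lie in the null space of the dimension matrix. The three variables below (impact velocity, rod density, target strength) are chosen for illustration only, not taken from the paper:

```python
import numpy as np

# Dimension matrix D: rows are M, L, T exponents; columns are the
# assumed feature variables v [L/T], rho [M/L^3], Y [M/(L*T^2)].
D = np.array([[0.0,  1.0,  1.0],   # mass exponents
              [1.0, -3.0, -1.0],   # length exponents
              [-1.0, 0.0, -2.0]])  # time exponents

# Any null-space vector (a, b, c) makes v^a * rho^b * Y^c dimensionless.
_, s, Vt = np.linalg.svd(D)
null_basis = Vt[np.isclose(s, 0.0, atol=1e-10)]
print(null_basis)   # proportional to (2, 1, -1): rho*v^2/Y, the damage number
```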
Funding: Supported by the National Key Research and Development Program of China (2016YFC0100300), the National Natural Science Foundation of China (61402371, 61771369), the Natural Science Basic Research Plan in Shaanxi Province of China (2017JM6008), and the Fundamental Research Funds for the Central Universities of China (3102017zy032, 3102018zy020).
Abstract: Three high dimensional spatial standardization algorithms are used for diffusion tensor image (DTI) registration, and seven methods are used to evaluate their performance. First, the template used in this paper was obtained by spatial transformation of 16 subjects by means of tensor-based standardization. Then, high dimensional standardization algorithms for diffusion tensor images were performed, including a fractional anisotropy (FA) based diffeomorphic registration algorithm, an FA based elastic registration algorithm, and a tensor-based registration algorithm. Finally, seven evaluation methods, including normalized standard deviation, dyadic coherence, diffusion cross-correlation, overlap of eigenvalue-eigenvector pairs, Euclidean distance of the diffusion tensor, and Euclidean distance of the deviatoric tensor and deviatoric of tensors, were used to qualitatively compare and summarize the above standardization algorithms. Experimental results reveal that the high-dimensional tensor-based standardization algorithms perform well and can maintain the consistency of anatomical structures.
Funding: Sponsored by the Fundamental Research Funds for the Central Universities of China (Grant No. PA2023IISL0098), the Hefei Municipal Natural Science Foundation (Grant No. 202201), the National Natural Science Foundation of China (Grant No. 62071164), and the Open Fund of the Information Materials and Intelligent Sensing Laboratory of Anhui Province (Anhui University) (Grant Nos. IMIS202214 and IMIS202102).
Abstract: This article proposes a VGG network with histogram of oriented gradients (HOG) feature fusion (HOG-VGG) for polarimetric synthetic aperture radar (PolSAR) image terrain classification. VGG-Net has a strong ability for deep feature extraction and can fully extract the global deep features of different terrains in PolSAR images, so it is widely used in PolSAR terrain classification. However, VGG-Net ignores local edge and shape features, resulting in an incomplete feature representation of PolSAR terrains; as a consequence, the terrain classification accuracy is not promising. In fact, edge and shape features play an important role in PolSAR terrain classification. To solve this problem, a new VGG network with HOG feature fusion is proposed for high-precision PolSAR terrain classification. HOG-VGG extracts both the global deep semantic features and the local edge and shape features of PolSAR terrains, so the completeness of the terrain feature representation is greatly improved. Moreover, HOG-VGG optimally fuses the global deep features and the local edge and shape features to achieve the best classification results. The superiority of HOG-VGG is verified on the Flevoland, San Francisco and Oberpfaffenhofen datasets. Experiments show that the proposed HOG-VGG achieves much better PolSAR terrain classification performance, with overall accuracies of 97.54%, 94.63%, and 96.07%, respectively.
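The fusion idea can be sketched by concatenating a globally pooled VGG feature with a HOG descriptor; a randomly initialized VGG-16 and a Pauli-RGB-style input patch stand in for the trained PolSAR pipeline (both are assumptions of the sketch, and the paper's fusion may be more elaborate than concatenation):

```python
import numpy as np
import torch
import torchvision.models as models
from skimage.feature import hog

vgg = models.vgg16().features.eval()   # untrained conv trunk as a stand-in

def hog_vgg_descriptor(img_hw3):
    """img_hw3: HxWx3 float array in [0, 1] (e.g., a Pauli RGB patch)."""
    with torch.no_grad():
        t = torch.from_numpy(img_hw3).permute(2, 0, 1).float().unsqueeze(0)
        deep = vgg(t).mean(dim=(2, 3)).squeeze(0).numpy()   # pooled deep features
    local = hog(img_hw3.mean(axis=2), orientations=9,       # edge & shape features
                pixels_per_cell=(16, 16), cells_per_block=(2, 2))
    return np.concatenate([deep, local])                    # fused HOG-VGG feature
```

A classifier (e.g., a softmax layer) would then consume the fused vector.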
Abstract: In this research, a content-based image retrieval (CBIR) system for high resolution satellite images has been developed using texture features. The proposed approach uses the local binary pattern (LBP) texture feature and a block-based scheme. The query and database images are divided into equally sized blocks, from which LBP histograms are extracted. The block histograms are then compared using the chi-square distance. Experimental results show that the LBP representation provides a powerful tool for high resolution satellite image (HRSI) retrieval.
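A minimal sketch of the block-based LBP pipeline; the neighborhood setting (P=8, R=1), the 4×4 grid, and the uniform-pattern histogram are assumed parameters, not necessarily those of the paper:

```python
import numpy as np
from skimage.feature import local_binary_pattern

P, R = 8, 1          # LBP neighbourhood (assumed setting)
N_BINS = P + 2       # number of 'uniform' LBP codes

def block_lbp_histograms(gray, grid=(4, 4)):
    """Split the image into equal blocks and return the normalized LBP
    histograms of all blocks, concatenated."""
    lbp = local_binary_pattern(gray, P, R, method="uniform")
    bh, bw = gray.shape[0] // grid[0], gray.shape[1] // grid[1]
    hists = []
    for i in range(grid[0]):
        for j in range(grid[1]):
            block = lbp[i * bh:(i + 1) * bh, j * bw:(j + 1) * bw]
            h, _ = np.histogram(block, bins=N_BINS, range=(0, N_BINS),
                                density=True)
            hists.append(h)
    return np.concatenate(hists)

def chi_square(h1, h2, eps=1e-10):
    """Chi-square distance between two concatenated block histograms;
    retrieval returns database images with the smallest distance."""
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))
```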
Abstract: In this study, we determine differences in the total biomass of soil microorganisms and in community structure (using the most probable number of bacteria (MPN) and the number of fungal genera) between patterned ground features (PGFs) and adjacent vegetated soils (AVS) in mesic sites on three High Arctic islands, in order to characterize microbial dynamics as affected by cryoturbation and a broad bioclimatic gradient. We also characterize the total biomass of soil microorganisms and the most probable number of bacteria along a topographic gradient within each bioclimatic subzone, to evaluate whether differences in topography lead to differences in microbial dynamics at a smaller scale. We found that total microbial biomass C, the most probable number of heterotrophic bacteria, and the number of fungal genera vary along this bioclimatic gradient. Microbial biomass C decreased with increasing latitude. Overall, microbial biomass C, MPN, and the number of fungal isolates were higher in AVS than in PGFs. The effect of topographic position on microbial biomass C varied across the bioclimatic gradient: there was no effect of topographic position at Isachsen (subzone A) or Mould Bay (subzone B), in contrast to Green Cabin (subzone C, the warmer site). There was no effect of topographic position on MPN counts at Mould Bay or Green Cabin; however, at Isachsen, MPN counts were highest in the wet topographic position compared with the mesic and dry positions. In conclusion, PGFs seem to decouple the effect that climate might have on the total biomass of soil microorganisms along the bioclimatic gradient, and this influence is ameliorated as latitude increases. Similarly, the effect of topography on total microbial biomass is significant in the warmest bioclimatic zone of the gradient. Thus, climate and topographic effects on total microbial biomass increase with a warmer climate.
Funding: Supported by the National Natural Science Foundation of China (No. 61502475) and the Importation and Development of High-Caliber Talents Project of the Beijing Municipal Institutions (No. CIT&TCD201504039).
Abstract: The performance of conventional similarity measurement methods is seriously affected by the curse of dimensionality of high-dimensional data. The reason is that data differences in sparse and noisy dimensions occupy a large proportion of the similarity, so that any two results appear dissimilar. A similarity measurement method for high-dimensional data based on a normalized net lattice subspace is proposed. The data range of each dimension is divided into several intervals, and the components in different dimensions are mapped onto the corresponding intervals. Only components in the same or adjacent intervals are used to calculate the similarity. To validate this method, three data types are used and seven common similarity measurement methods are compared. The experimental results indicate that the relative difference of the method increases with dimensionality and is approximately two or three orders of magnitude higher than that of the conventional methods. In addition, the similarity range of this method in different dimensions is [0, 1], which makes it suitable for similarity analysis after dimensionality reduction.
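One way to read the net-lattice rule is sketched below for two vectors already normalized to [0, 1] per dimension; the exact per-dimension weighting in the paper may differ:

```python
import numpy as np

def lattice_similarity(a, b, n_intervals=10):
    """Each dimension is cut into equal intervals; only dimensions whose
    two components fall in the same or an adjacent interval contribute,
    so sparse/noisy dimensions cannot swamp the score. Output is in [0, 1]."""
    ia = np.minimum((a * n_intervals).astype(int), n_intervals - 1)
    ib = np.minimum((b * n_intervals).astype(int), n_intervals - 1)
    close = np.abs(ia - ib) <= 1                    # same or adjacent interval
    if not close.any():
        return 0.0
    per_dim = 1.0 - np.abs(a[close] - b[close])     # closeness on kept dims
    return float(per_dim.sum() / len(a))            # averaged over all dims
```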
Funding: Project supported by the National Key Laboratory Foundation, China (Grant No. 9140C530103110C5301).
Abstract: Based on particle-in-cell technology and secondary electron emission theory, a three-dimensional simulation method for multipactor is presented in this paper. By combining the finite-difference time-domain method and the particle tracing method, the algorithm is self-consistent and accurate, since the interaction between electromagnetic fields and particles is properly modeled. In the time domain, the development of multipactor can be easily visualized, which makes it possible to gain deeper insight into the physical mechanism of the effect. In addition to the classic secondary electron emission model, measured practical secondary electron yields are used, which increases the accuracy of the algorithm. To validate the method, an impedance transformer and a ridge waveguide filter are studied. By analyzing the evolution of the secondaries obtained by our method, the multipactor thresholds of these components are estimated and show good agreement with experimental results. Furthermore, the positions where multipactor is most likely to occur are determined from the phase focusing phenomenon, which is very meaningful for multipactor analysis and design.
Funding: Supported by the Shaanxi Province Natural Science Foundation Research Projects (2016JM6014), the Innovation Foundation of the High-Tech Institute of Xi'an (2015ZZDJJ03), and the Youth Foundation of the High-Tech Institute of Xi'an (2016QNJJ004).
Abstract: Guaranteed cost consensus analysis and design problems for high-dimensional multi-agent systems with time-varying delays are investigated. The idea of guaranteed cost control is introduced into consensus problems for high-dimensional multi-agent systems with time-varying delays, where a cost function is defined based on state errors among neighboring agents and the control inputs of all agents. By the state space decomposition approach and the linear matrix inequality (LMI), sufficient conditions for guaranteed cost consensus and consensualization are given. Moreover, a guaranteed cost upper bound of the cost function is determined. It should be mentioned that these LMI criteria depend on the change rate of the time delays and the maximum time delay, and that the guaranteed cost upper bound depends only on the maximum time delay and is independent of the Laplacian matrix. Finally, numerical simulations are given to demonstrate the theoretical results.
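A toy simulation (not the paper's LMI design) shows the quantities involved: delayed neighbor feedback on a path graph, and the quadratic cost that the guaranteed-cost bound is meant to cap; the gain, delay, and graph are assumed values:

```python
import numpy as np

N, tau, dt, steps, k = 5, 3, 0.01, 2000, 1.0
A = np.diag(np.ones(N - 1), 1) + np.diag(np.ones(N - 1), -1)  # path-graph adjacency
Lap = np.diag(A.sum(1)) - A                                   # graph Laplacian
x = np.zeros((steps + 1, N))
x[: tau + 1] = np.random.default_rng(1).normal(size=N)        # constant initial history
J = 0.0
for t in range(tau, steps):
    u = -k * Lap @ x[t - tau]                # feedback on tau-step-old states
    J += dt * (x[t] @ Lap @ x[t] + u @ u)    # neighbour state-error + input cost
    x[t + 1] = x[t] + dt * u                 # Euler step of single integrators
print(np.ptp(x[-1]), J)                      # final disagreement spread, total cost
```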
Funding: Supported by the National Research Foundation of Korea (NRF) (NRF-2017R1C1B2002377, NRF-2016R1A5A1010148, and NRF-2019R1A2C1003111), funded by the Ministry of Science and ICT (MSIT), and partly supported by the Technology Innovation Program (No. 10067787), funded by the Ministry of Trade, Industry & Energy (MOTIE, Korea).
Abstract: Although recent advances in stem cell engineering have gained a great deal of attention due to their high potential in clinical research, the applicability of stem cells to preclinical screening in the drug discovery process remains challenging due to difficulties in controlling the stem cell microenvironment and the limited availability of high-throughput systems. Recently, researchers have been actively developing and evaluating three-dimensional (3D) cell culture-based platforms using microfluidic technologies, such as organ-on-a-chip and organoid-on-a-chip platforms, and have achieved promising breakthroughs in stem cell engineering. In this review, we start with a comprehensive discussion of the importance of microfluidic 3D cell culture techniques in stem cell research and their technical strategies in the field of drug discovery. In a subsequent section, we discuss microfluidic 3D cell culture techniques for high-throughput analysis in stem cell research. In addition, potential and practical applications of organ-on-a-chip or organoid-on-a-chip platforms using stem cells as drug screening and disease models are highlighted.