Funding: Outstanding Youth Foundation of Hunan Provincial Department of Education (Grant No. 22B0911).
Abstract: In this paper, we introduce the censored composite conditional quantile coefficient (cCCQC) to rank the relative importance of each predictor in high-dimensional censored regression. The cCCQC takes advantage of all useful information across quantiles and can effectively detect nonlinear effects, including interactions and heterogeneity. Furthermore, the proposed screening method based on the cCCQC is robust to outliers and enjoys the sure screening property. Simulation results demonstrate that the proposed method performs competitively on survival datasets with high-dimensional predictors, particularly when the variables are highly correlated.
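As a rough illustration of the ranking idea (not the paper's method: the cCCQC handles censoring, which this sketch ignores), the snippet below screens predictors with a plain composite quantile-correlation statistic averaged over a grid of quantile levels; all names and the quantile grid are illustrative.

```python
import numpy as np

def qcor(y, x, tau):
    """Quantile-correlation statistic at level tau: covariance between the
    quantile score psi_tau(y - Q_tau(y)) and x, suitably normalized."""
    psi = tau - (y - np.quantile(y, tau) < 0)
    return np.mean(psi * (x - x.mean())) / np.sqrt((tau - tau**2) * x.var())

def composite_qcor_screen(X, y, d, taus=(0.25, 0.5, 0.75)):
    """Rank predictors by average absolute quantile correlation across
    quantile levels and keep the top d (marginal screening)."""
    scores = np.array([np.mean([abs(qcor(y, X[:, j], t)) for t in taus])
                       for j in range(X.shape[1])])
    return np.argsort(scores)[::-1][:d]

# toy check: only the first two of 1000 predictors matter
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1000))
y = X[:, 0] + 0.5 * X[:, 1] ** 2 + rng.normal(size=200)
print(composite_qcor_screen(X, y, d=10))
```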
Abstract: The estimation of covariance matrices is very important in many fields, such as statistics. In real applications, data are frequently affected by high dimensionality and noise, yet most relevant studies assume complete data. This paper studies the optimal estimation of high-dimensional covariance matrices from missing and noisy samples under a matrix norm. First, a model with sub-Gaussian additive noise is presented. The generalized sample covariance is then modified to define a hard thresholding estimator, and its minimax upper bound is derived. After that, the minimax lower bound is derived, showing that the proposed estimator is rate-optimal. Finally, numerical simulations show that for missing samples with sub-Gaussian noise, if the true covariance matrix is sparse, the hard thresholding estimator outperforms the traditional estimator.
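A minimal sketch of the two-step construction, assuming Bernoulli missingness and omitting the paper's noise correction: form a pairwise-available ("generalized") sample covariance, then hard-threshold its small off-diagonal entries. Function names and the threshold value are illustrative.

```python
import numpy as np

def generalized_sample_cov(X, mask):
    """Entry (j, k) averages over the samples where both coordinates are observed."""
    n_jk = mask.T.astype(float) @ mask.astype(float)       # jointly observed counts
    mu = np.where(mask, X, 0.0).sum(axis=0) / mask.sum(axis=0)
    Xc = np.where(mask, X - mu, 0.0)                       # centered, zeros at missing entries
    return (Xc.T @ Xc) / np.maximum(n_jk, 1.0)

def hard_threshold(S, lam):
    """Zero out small off-diagonal entries; the diagonal is kept intact."""
    T = np.where(np.abs(S) >= lam, S, 0.0)
    np.fill_diagonal(T, np.diag(S))
    return T

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 50)) + 0.1 * rng.normal(size=(300, 50))  # toy signal plus sub-Gaussian noise
mask = rng.random((300, 50)) > 0.2                                  # ~20% of entries missing
S_hat = hard_threshold(generalized_sample_cov(X, mask), lam=0.15)
```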
Funding: This research was supported by a grant of the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), funded by the Ministry of Health & Welfare, Republic of Korea (grant number: HI21C1831), and by the Soonchunhyang University Research Fund.
Abstract: The conventional hospital environment is undergoing a digital transformation that focuses on a patient-centric, remote approach enabled by advanced technologies. Early diagnosis of many diseases will improve patients' lives. The cost of health care systems is reduced by advanced technologies such as the Internet of Things (IoT), Wireless Sensor Networks (WSNs), embedded systems, deep learning approaches, and optimization and aggregation methods. The data generated by these technologies place demands on the network's bandwidth, data rate, and latency. In this work, an efficient discrete grey wolf optimization (DGWO)-based data aggregation scheme using elliptic-curve ElGamal with a message authentication code (ECEMAC) is used to aggregate the parameters generated by the patient's wearable sensor devices. Nodes far away from the edge node forward their data to a neighboring cluster head selected using DGWO. The aggregation scheme reduces the number of transmissions over the network. The aggregated data are preprocessed at the edge node to remove noise for better diagnosis, and the edge node reduces the overhead of the cloud server. The aggregated data are then forwarded to the cloud server for central storage and diagnosis. The proposed smart diagnosis reduces transmission cost through the aggregation scheme, which in turn reduces the energy consumption of the system. The energy cost of the proposed system for 300 nodes is 0.34 μJ, whereas the energy costs of existing approaches such as the secure privacy-preserving data aggregation scheme (SPPDA), the concealed data aggregation scheme for multiple applications (CDAMA), and the secure aggregation scheme (ASAS) are 1.3 μJ, 0.81 μJ, and 0.51 μJ, respectively. The optimization approach and encryption method ensure data privacy.
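The paper's scheme combines DGWO routing with elliptic-curve ElGamal and a MAC; the toy below illustrates only the additive homomorphism that lets a cluster head aggregate encrypted readings without decrypting them, using "ElGamal in the exponent" over a multiplicative group rather than an elliptic curve. All parameters are insecure toy values.

```python
import random

# Toy "ElGamal in the exponent": Enc(m) = (g^r, g^m * h^r) mod p.
# Ciphertexts multiply component-wise, so plaintexts add -- the property
# that lets an aggregator sum readings it cannot read.
p, g = 2**61 - 1, 3                       # toy group; NOT secure parameters
sk = random.randrange(2, p - 1)           # receiver's secret key
h = pow(g, sk, p)                         # public key

def enc(m):
    r = random.randrange(2, p - 1)
    return pow(g, r, p), (pow(g, m, p) * pow(h, r, p)) % p

def add(c1, c2):                          # homomorphic addition of ciphertexts
    return (c1[0] * c2[0]) % p, (c1[1] * c2[1]) % p

def dec_small(c, bound=10**6):            # brute-force discrete log: small sums only
    gm = (c[1] * pow(c[0], p - 1 - sk, p)) % p
    x, cur = 0, 1
    while cur != gm and x <= bound:
        x, cur = x + 1, (cur * g) % p
    return x

readings = [72, 98, 120]                  # e.g. heart-rate samples from three sensors
agg = enc(0)
for m in readings:
    agg = add(agg, enc(m))
print(dec_small(agg))                     # 290
```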
Funding: National Natural Science Foundation of China, Grant/Award Number: 61972261; Basic Research Foundations of Shenzhen, Grant/Award Numbers: JCYJ20210324093609026, JCYJ20200813091134001.
Abstract: In this paper, an Observation Points Classifier Ensemble (OPCE) algorithm is proposed to deal with High-Dimensional Imbalanced Classification (HDIC) problems, based on data processed using the Multi-Dimensional Scaling (MDS) feature extraction technique. First, the dimensionality of the original imbalanced data is reduced using MDS so that distances between any two different samples are preserved as well as possible. Second, a novel OPCE algorithm is applied to classify imbalanced samples by placing optimised observation points in a low-dimensional data space. Third, optimisation of the observation point mappings is carried out to obtain a reliable assessment of unknown samples. Exhaustive experiments have been conducted to evaluate the feasibility, rationality, and effectiveness of the proposed OPCE algorithm using seven benchmark HDIC data sets. Experimental results show that (1) the OPCE algorithm can be trained faster on low-dimensional imbalanced data than on high-dimensional data; (2) the OPCE algorithm can correctly identify samples as the number of optimised observation points is increased; and (3) statistical analysis reveals that OPCE yields better HDIC performance on the selected data sets in comparison with eight other HDIC algorithms. This demonstrates that OPCE is a viable algorithm for HDIC problems.
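The observation-point optimisation is specific to OPCE, but the MDS front end is standard; below is a minimal sketch of the dimensionality-reduction step with scikit-learn (the data and target dimension are placeholders):

```python
import numpy as np
from sklearn.manifold import MDS

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 500))           # stand-in for high-dimensional imbalanced data

# Metric MDS seeks low-dimensional coordinates whose pairwise distances
# match the original ones as closely as possible -- the property OPCE relies on.
X_low = MDS(n_components=3, dissimilarity="euclidean", random_state=0).fit_transform(X)
print(X_low.shape)                        # (200, 3)
```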
Funding: Supported in part by the National Natural Science Foundation of China (62172065, 62072060).
Abstract: As a crucial data preprocessing method in data mining, feature selection (FS) can be regarded as a bi-objective optimization problem that aims to maximize classification accuracy and minimize the number of selected features. Evolutionary computing (EC) is promising for FS owing to its powerful search capability. However, traditional EC-based methods represent feature subsets via a length-fixed individual encoding, which is ineffective for high-dimensional data because it results in a huge search space and prohibitive training time. This work proposes a length-adaptive non-dominated sorting genetic algorithm (LA-NSGA) with a length-variable individual encoding and a length-adaptive evolution mechanism for bi-objective high-dimensional FS. In LA-NSGA, an initialization method based on correlation and redundancy is devised to initialize individuals of diverse lengths, and a Pareto dominance-based length change operator is introduced to guide individuals to explore promising search space adaptively. Moreover, a dominance-based local search method is employed for further improvement. Experimental results on 12 high-dimensional gene datasets show that the Pareto front of feature subsets produced by LA-NSGA is superior to those of existing algorithms.
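A sketch of the two objectives a variable-length individual is scored on; LA-NSGA's initialization and length-change operators are beyond a snippet. The classifier and the dominance rule below are illustrative choices.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def evaluate(subset, X, y):
    """The two objectives traded off in bi-objective FS: maximise accuracy,
    minimise subset size. A variable-length individual is just an index
    list, so subsets of different lengths compare directly."""
    acc = cross_val_score(KNeighborsClassifier(), X[:, subset], y, cv=5).mean()
    return acc, len(subset)

def dominates(a, b):
    """Pareto dominance for (accuracy, n_features): higher acc, fewer features."""
    return a[0] >= b[0] and a[1] <= b[1] and a != b

rng = np.random.default_rng(0)
X, y = rng.normal(size=(100, 2000)), rng.integers(0, 2, 100)
print(evaluate(rng.choice(2000, size=15, replace=False).tolist(), X, y))
```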
Funding: Supported by the National Natural Science Foundation of China (No. 61502475) and the Importation and Development of High-Caliber Talents Project of the Beijing Municipal Institutions (No. CIT&TCD201504039).
Abstract: The performance of conventional similarity measurement methods is seriously affected by the curse of dimensionality of high-dimensional data. The reason is that differences in sparse and noisy dimensions occupy a large proportion of the similarity, making any two samples appear dissimilar. A similarity measurement method for high-dimensional data based on a normalized net lattice subspace is proposed. The data range of each dimension is divided into several intervals, and the components in different dimensions are mapped onto the corresponding intervals; only components in the same or adjacent intervals are used to calculate the similarity. To validate this method, three data types are used and seven common similarity measurement methods are compared. The experimental results indicate that the relative difference of the method increases with dimensionality and is approximately two to three orders of magnitude higher than that of the conventional methods. In addition, the similarity range of this method in different dimensions is [0, 1], which makes it suitable for similarity analysis after dimensionality reduction.
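A minimal sketch of the interval-gating idea, assuming the data are already normalized to [0, 1]; the paper's exact per-dimension weighting is not reproduced here.

```python
import numpy as np

def lattice_similarity(a, b, n_intervals=10):
    """Similarity on a normalized net lattice subspace: each dimension's
    range is cut into intervals, and a dimension contributes only when
    the two components fall in the same or an adjacent interval."""
    ia = np.floor(a * n_intervals).astype(int).clip(0, n_intervals - 1)
    ib = np.floor(b * n_intervals).astype(int).clip(0, n_intervals - 1)
    active = np.abs(ia - ib) <= 1              # same or adjacent interval
    if not active.any():
        return 0.0
    # per-dimension closeness in [0, 1], averaged over active dimensions only
    return float(np.mean(1.0 - np.abs(a[active] - b[active])))

rng = np.random.default_rng(0)
a, b = rng.random(1000), rng.random(1000)       # toy pre-normalized vectors
print(lattice_similarity(a, b))
```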
Funding: Project (2010-0020163) supported by the Key Research Institute Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education, Science and Technology, Korea.
Abstract: A similarity measure design for discrete data groups is proposed, together with a similarity measure design for continuous membership functions. The proposed similarity measures are designed based on fuzzy numbers and distance measures, and are proved. To calculate the degree of similarity of discrete data, the relative degree between the data and the total distribution is obtained, and the discrete-data similarity measure is completed by combining these relative degrees. A power interconnected system with multiple characteristics is considered as an application of the discrete similarity measure. The similarity measure is then naturally extended to the multi-dimensional case and applied to a bus clustering problem.
Abstract: This research involved an exploratory evaluation of the dynamics of vehicular traffic on a road network across two traffic-light-controlled junctions. The study uses the case of a one-kilometer road system modelled in AnyLogic version 8.8.4. AnyLogic is a multi-paradigm simulation tool that supports three main simulation methodologies: discrete event simulation, agent-based modeling, and system dynamics modeling. The system is used to evaluate the implications of stochastic time-based vehicle variables on the general efficiency of road use. Road use efficiency in this model is measured as the percentage of entry vehicles that exit the model within a one-hour simulation period. The study deduced that, for the model under review, an increase in entry-point time delay has a dominant influence on the efficiency of road use, far beyond any other consideration. This study therefore presents a novel approach that leverages discrete event simulation to facilitate efficient road management with a focus on optimum road use efficiency. The study also determined that the inclusion of appropriate random parameters, reflecting road use activities at critical event points in a simulation, can help in the effective representation of authentic traffic models. The AnyLogic simulation software leverages the Classic DEVS and Parallel DEVS formalisms to achieve these objectives.
Abstract: Objective: To develop a new bioinformatic tool based on a data-mining approach for extraction of the most informative proteins that could be used to find the potential biomarkers for the detection of cancer. Methods: Two independent datasets from serum samples of 253 ovarian cancer and 167 breast cancer patients were used. The samples were examined by surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF MS). The datasets were used to extract the informative proteins using a data-mining method in the discrete stationary wavelet transform domain. As a dimensionality reduction procedure, the hard thresholding method was applied to reduce the number of wavelet coefficients. Also, a distance measure was used to select the most discriminative coefficients. To find the potential biomarkers using the selected wavelet coefficients, we applied the inverse discrete stationary wavelet transform combined with a two-sided t-test. Results: From the ovarian cancer dataset, a set of five proteins were detected as potential biomarkers that could be used to identify the cancer patients from the healthy cases with accuracy, sensitivity, and specificity of 100%. Also, from the breast cancer dataset, a set of eight proteins were found as potential biomarkers that could separate the healthy cases from the cancer patients with accuracy of 98.26%, sensitivity of 100%, and specificity of 95.6%. Conclusion: The results have shown that the new bioinformatic tool can be used in combination with high-throughput proteomic data such as SELDI-TOF MS to find potential biomarkers with high discriminative power.
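A rough sketch of the pipeline's core steps with PyWavelets and SciPy: stationary wavelet decomposition, hard thresholding of coefficients (a crude stand-in for the paper's distance-based selection), and a per-coefficient two-sided t-test. Data, wavelet choice, and thresholds are illustrative.

```python
import numpy as np
import pywt
from scipy.stats import ttest_ind

def swt_features(spectra, wavelet="db4", level=3, keep=200):
    """Decompose each spectrum with the stationary wavelet transform and
    hard-threshold to the `keep` largest-magnitude coefficients."""
    feats = []
    for s in spectra:
        coeffs = pywt.swt(s, wavelet, level=level)
        flat = np.concatenate([np.r_[cA, cD] for cA, cD in coeffs])
        thr = np.sort(np.abs(flat))[-keep]
        feats.append(np.where(np.abs(flat) >= thr, flat, 0.0))
    return np.array(feats)

rng = np.random.default_rng(0)
healthy = rng.normal(size=(30, 1024))             # toy spectra; length divisible by 2**level
cancer = rng.normal(0.3, 1.0, size=(30, 1024))
F = swt_features(np.vstack([healthy, cancer]))
# two-sided t-test per coefficient flags candidate discriminative positions
# (coefficient columns zeroed in every sample give NaN p-values and sort last)
t, p = ttest_ind(F[:30], F[30:], axis=0)
print(np.argsort(p)[:8])                          # 8 most significant coefficients
```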
Funding: Supported by the National Natural Science Foundation of China (No. 61300078), the Importation and Development of High-Caliber Talents Project of Beijing Municipal Institutions (No. CIT&TCD201504039), the Funding Project for Academic Human Resources Development in Beijing Union University (No. BPHR2014A03, Rk100201510), and the "New Start" Academic Research Projects of Beijing Union University (No. Hzk10201501).
Abstract: Problems exist in similarity measurement and index tree construction that affect the performance of nearest neighbor search on high-dimensional data. The equidistance problem is solved by using the NPsim function to calculate similarity, and a sequential NPsim matrix is built to improve indexing performance. Combining these innovations, a nearest neighbor search algorithm for high-dimensional data based on the sequential NPsim matrix is proposed and compared with nearest neighbor search algorithms based on the KD-tree and the SR-tree on the Munsell spectral data set. Experimental results show that the similarity of the proposed algorithm is better than that of the other algorithms and that its search speed is thousands of times faster. In addition, the slow construction of the sequential NPsim matrix can be accelerated by parallel computing.
Abstract: Viticulturists traditionally have a keen interest in studying the relationship between the biochemistry of grapevines' leaves/petioles and their associated spectral reflectance in order to understand fruit ripening rate, water status, nutrient levels, and disease risk. In this paper, we use imaging spectroscopy (hyperspectral) reflectance data for the reflective 330-2510 nm wavelength region (986 total spectral bands) to assess vineyard nutrient status; this constitutes a high-dimensional dataset with a covariance matrix that is ill-conditioned. The identification of the variables (wavelength bands) that contribute useful information for nutrient assessment and prediction plays a pivotal role in multivariate statistical modeling. In recent years, researchers have developed many continuous, nearly unbiased, sparse, and accurate variable selection methods to overcome this problem. This paper compares four regularized regression methods and one functional regression method for wavelength variable selection: Elastic Net, Multi-Step Adaptive Elastic Net, Minimax Concave Penalty, iterative Sure Independence Screening, and Functional Data Analysis. Thereafter, the predictive performance of these regularized sparse models is enhanced using stepwise regression. This comparative study, on a high-dimensional and highly correlated grapevine hyperspectral dataset, revealed that Elastic Net variable selection yields the best predictive ability.
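The winning method is directly reproducible with scikit-learn; below is a minimal sketch of Elastic Net wavelength selection on a synthetic highly correlated design (the band count matches the study; everything else is made up).

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV

rng = np.random.default_rng(0)
n_samples, n_bands = 120, 986                      # 986 spectral bands as in the study

# AR(1) structure across bands mimics high band-to-band correlation
X = np.empty((n_samples, n_bands))
X[:, 0] = rng.normal(size=n_samples)
for j in range(1, n_bands):
    X[:, j] = 0.95 * X[:, j - 1] + np.sqrt(1 - 0.95**2) * rng.normal(size=n_samples)

# hypothetical nutrient response driven by two bands
y = 2.0 * X[:, 100] - 1.5 * X[:, 500] + 0.5 * rng.normal(size=n_samples)

enet = ElasticNetCV(l1_ratio=[0.2, 0.5, 0.8], cv=5).fit(X, y)
selected = np.flatnonzero(enet.coef_)              # wavelength bands kept by the penalty
print(len(selected), selected[:10])
```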
Funding: Supported by grants from CAS, the National Key R&D Program of China, and the National Natural Science Foundation of China.
Abstract: Making accurate forecasts or predictions is a challenging task in the big data era, in particular for datasets involving high-dimensional variables but short-term time series points, which are generally available from real-world systems. To address this issue, Prof.
Abstract: Analytical solutions have varied uses. One is to provide solutions that can be used in the verification of numerical methods. Another is to provide relatively simple forms of exact solutions that can be used in estimating parameters; thus, it is possible to reduce computation time in comparison with numerical methods. In this paper, an alternative procedure is presented: a hybrid solution based on Green's functions and real characteristics (discrete data) of the boundary conditions.
Abstract: Multi-area combined economic/emission dispatch (MACEED) problems are generally studied using analytical functions. However, as the scale of power systems increases, existing solutions become time-consuming and may not meet operational constraints. To overcome the excessive computational expense of high-dimensional MACEED problems, a novel data-driven surrogate-assisted method is proposed. First, a cosine-similarity-based deep belief network combined with a back-propagation (DBN+BP) neural network is utilized to replace the cost and emission functions. Second, transfer learning is applied with a pretraining and fine-tuning method to improve the DBN+BP regression surrogate models, thus enabling fast construction of surrogate models between different regional power systems. Third, a multi-objective antlion optimizer with a novel general single-dimension retention bi-objective optimization policy is proposed to execute the MACEED optimization and obtain scheduling decisions. The proposed method not only ensures the convergence, uniformity, and extensibility of the Pareto front, but also greatly reduces the computational time. Finally, a 4-area 40-unit test system with different constraints is employed to demonstrate the effectiveness of the proposed method.
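The DBN+BP surrogate and transfer learning are elaborate; the sketch below shows only the core surrogate-assisted pattern: fit a neural regressor to expensive cost evaluations once, then let the optimizer query the cheap model. The cost function here is a quadratic-plus-valve-point-style stand-in, and all settings are illustrative.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def true_cost(P):
    """Stand-in for an expensive fuel-cost evaluation over unit outputs P."""
    return np.sum(0.01 * P**2 + 2.0 * P + 50 + np.abs(5 * np.sin(0.5 * P)), axis=-1)

rng = np.random.default_rng(0)
P_train = rng.uniform(10, 100, size=(2000, 40))         # 40 units, as in the 4-area test system
surrogate = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500, random_state=0)
surrogate.fit(P_train, true_cost(P_train))              # one-off expensive sampling + training

P_candidates = rng.uniform(10, 100, size=(10000, 40))
est = surrogate.predict(P_candidates)                   # cheap batch evaluation inside the optimizer loop
print(P_candidates[np.argmin(est)].round(1))
```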
Funding: Supported by the TATA Consultancy Services (TCS) Research Fellowship Program, India.
Abstract: Wi-Fi devices have limited battery life, so conserving battery life is imperative. The 802.11 Wi-Fi standard provides a power management feature that allows stations (STAs) to enter a sleep state to preserve energy without frame losses. After an STA wakes up, it sends a null data or PS-Poll frame to retrieve any frames buffered by the access point (AP) during its sleep period. An attacker can launch a power save denial of service (PS-DoS) attack on sleeping STAs by transmitting spoofed null data or PS-Poll frames to retrieve the buffered frames of the sleeping STAs from the AP, causing frame losses for the targeted STAs. Current approaches to prevent or detect the PS-DoS attack require encryption, changes to the protocol, or installation of proprietary hardware. These solutions suffer from expensive setup, maintenance, scalability, and deployment issues. The PS-DoS attack does not differ in semantics or statistics under normal and attack circumstances, so signature- and anomaly-based intrusion detection systems (IDSs) are unfit to detect it. In this paper, we propose a timed IDS based on a real-time discrete event system (RTDES) for detecting the PS-DoS attack. The proposed DES-based IDS overcomes the drawbacks of existing systems and detects the PS-DoS attack with high accuracy and detection rate. The correctness of the RTDES-based IDS is proved by experimenting with all possible attack scenarios.
Funding: Supported in part by the National Natural Science Foundation of China (61772493, 91646114), the Chongqing Research Program of Technology Innovation and Application (cstc2017rgzn-zdyfX0020), and in part by the Pioneer Hundred Talents Program of the Chinese Academy of Sciences.
Abstract: Latent factor (LF) models are highly effective in extracting useful knowledge from high-dimensional and sparse (HiDS) matrices, which are commonly seen in various industrial applications. An LF model usually adopts iterative optimizers, which may consume many iterations to reach a local optimum, resulting in considerable time cost. Hence, determining how to accelerate the training process of LF models has become a significant issue. To address this, this work proposes a randomized latent factor (RLF) model. It incorporates the principle of randomized learning techniques from neural networks into the LF analysis of HiDS matrices, thereby greatly alleviating the computational burden. It also extends a standard learning process for randomized neural networks to the context of LF analysis so that the resulting model represents an HiDS matrix correctly. Experimental results on three HiDS matrices from industrial applications demonstrate that, compared with state-of-the-art LF models, RLF achieves significantly higher computational efficiency and comparable prediction accuracy for missing data. It provides an important alternative approach to LF analysis of HiDS matrices, which is especially desirable for industrial applications demanding highly efficient models.
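A sketch of the randomized-learning idea as it might apply here (not the paper's exact formulation): freeze randomly drawn column factors and obtain each row factor in closed form from that row's observed entries, avoiding iterative optimization entirely.

```python
import numpy as np

def rlf(R, mask, k=20, lam=0.1, seed=0):
    """Randomized LF sketch: column factors Q are drawn randomly and frozen
    (the randomized-learning step); each row factor is then a closed-form
    ridge solution over that row's observed entries -- no iterative optimizer."""
    rng = np.random.default_rng(seed)
    n, m = R.shape
    Q = rng.normal(scale=1 / np.sqrt(k), size=(m, k))
    P = np.zeros((n, k))
    for i in range(n):
        obs = np.flatnonzero(mask[i])
        Qi = Q[obs]
        P[i] = np.linalg.solve(Qi.T @ Qi + lam * np.eye(k), Qi.T @ R[i, obs])
    return P, Q

rng = np.random.default_rng(1)
true = rng.normal(size=(500, 10)) @ rng.normal(size=(10, 300))
mask = rng.random((500, 300)) < 0.05                 # ~5% observed: high-dimensional and sparse
P, Q = rlf(np.where(mask, true, 0.0), mask)
pred = P @ Q.T                                       # estimates for the missing entries
```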
Funding: This work was sponsored by the National Science Council (102-2313-b-275-001).
Abstract: The governing factors that influence landslide occurrences are complicated by the different soil conditions at various sites. To resolve the problem, this study used spatial information technology to collect data and information on geology. GIS, remote sensing, and a digital elevation model (DEM) were used in combination to extract the attribute values of the surface material in the vast study area of Shei-Pa National Park, Taiwan. The factors influencing landslides were collected and their quantification values computed. The major soil components of loam and gravel in the Shei-Pa area resulted in different landslide problems. The major factors were successfully extracted from the influencing factors. Finally, a discrete rough set (DRS) classifier was used as a tool to find the threshold of each attribute contributing to landslide occurrence, based upon the knowledge database. This rule-based knowledge database provides an effective and urgent system to manage landslides. NDVI (Normalized Difference Vegetation Index), VI (Vegetation Index), elevation, and distance from the road are the four major influencing factors for landslide occurrence. Landslide hazard potential diagrams (landslide susceptibility maps) were drawn, and a rational accuracy rate of landslide prediction was calculated. This study thus offers a systematic solution to the investigation of landslide disasters.
Funding: Supported in part by the National Natural Science Foundation of China (Nos. 61303074, 61309013), the National Key Basic Research and Development Program ("973") of China (No. 2012CB315900), and the Programs for Science and Technology Development of Henan Province (Nos. 12210231003, 13210231002).
Abstract: Because traditional clustering methods are not appropriate for high-dimensional data, a cuckoo search fuzzy-weighting algorithm for subspace clustering is presented, building on existing soft subspace clustering algorithms. In the proposed algorithm, a novel objective function is first designed by considering the fuzzy-weighted within-cluster compactness and the between-cluster separation, and by loosening the constraints on the dimension weight matrix. Then, gradual membership and an improved cuckoo search, a global search strategy, are introduced to optimize the objective function and search for subspace clusters, giving novel learning rules for clustering. Finally, the performance of the proposed algorithm in the clustering analysis of various low- and high-dimensional datasets is experimentally compared with that of several competitive subspace clustering algorithms. Experimental studies demonstrate that the proposed algorithm obtains better performance than most existing soft subspace clustering algorithms.
Funding: Supported by the Fundamental Scientific Research Business Expenses for Central Universities (3072021CFJ0803) and the Advanced Marine Communication and Information Technology Key Laboratory Project of the Ministry of Industry and Information Technology (AMCIT21V3).
Abstract: The accuracy of target threat estimation has a great impact on command decision-making. The Bayesian network, as an effective way to deal with uncertainty, can be used to track changes in the target threat level. Unfortunately, the traditional discrete dynamic Bayesian network (DDBN) suffers from poor parameter learning and poor reasoning accuracy in a small-sample environment with partial prior information missing. Considering the finiteness and discreteness of DDBN parameters, a fuzzy k-nearest neighbor (KNN) algorithm based on the correlation of feature quantities (CF-FKNN) is proposed for DDBN parameter learning. First, the correlation between feature quantities is calculated, and then a KNN algorithm with fuzzy weights is introduced to fill in the missing data. On this basis, a reasonable DDBN structure is constructed using expert experience to complete DDBN parameter learning and reasoning. Simulation results show that the CF-FKNN algorithm can accurately fill in the data when samples are seriously missing and improves the effect of DDBN parameter learning in that setting. With the proposed method, the final target threat assessment results are reasonable, meeting the needs of engineering applications.
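A minimal sketch of fuzzy-weighted KNN imputation in the spirit of CF-FKNN, without the feature-correlation weighting: distances use jointly observed coordinates, and donor rows are weighted by the usual fuzzy membership exponent. Names and defaults are illustrative.

```python
import numpy as np

def fknn_impute(X, k=5, m=2.0):
    """Fill NaNs with a fuzzy-weighted KNN average: neighbours are found on
    jointly observed coordinates and weighted by 1/d**(2/(m-1))."""
    X = X.copy()
    nan = np.isnan(X)
    for i in np.flatnonzero(nan.any(axis=1)):
        shared = ~nan[i] & ~nan                   # coords observed in both row i and each row
        overlap = shared.sum(axis=1)
        sq = np.where(shared, np.nan_to_num(X - X[i]) ** 2, 0.0)
        d = np.sqrt(sq.sum(axis=1) / np.maximum(overlap, 1))
        d[overlap == 0] = np.inf                  # no common coords: not a usable neighbour
        d[i] = np.inf
        for j in np.flatnonzero(nan[i]):
            donors = np.flatnonzero(~nan[:, j] & np.isfinite(d))
            if donors.size == 0:
                continue                          # leave the gap if no donor exists
            near = donors[np.argsort(d[donors])[:k]]
            w = 1.0 / (d[near] + 1e-9) ** (2.0 / (m - 1.0))
            X[i, j] = np.sum(w * X[near, j]) / w.sum()
    return X

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
X[rng.random(X.shape) < 0.15] = np.nan            # simulate missing records
X_filled = fknn_impute(X)
```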
Funding: Supported by the National Natural Science Foundation of P.R. China (60474049) and the Natural Science Foundation of Fujian Province of P.R. China (A0410012, A0510009).
Abstract: In packet-based transmission of data over a network, or under temporary sensor failure, data samples may be missing from the measured signals. This paper deals with the problem of H∞ filter design for linear discrete-time systems with missing measurements. Missing measurements may happen at any sampling time, and the probability of the occurrence of missing data is assumed to be known. The main purpose is to obtain both full- and reduced-order filters such that the filter error systems are exponentially mean-square stable and guarantee a prescribed H∞ performance, expressed in terms of linear matrix inequalities (LMIs). A numerical example is provided to demonstrate the validity of the proposed design approach.
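The LMI synthesis itself needs a semidefinite-programming toolbox, but the missing-measurement model is easy to state; below is a simulation sketch with illustrative system matrices.

```python
import numpy as np

# Missing-measurement model used in this line of work:
#   x_{k+1} = A x_k + B w_k
#   y_k     = gamma_k * C x_k + D v_k,   gamma_k ~ Bernoulli(p_arrival)
# gamma_k = 0 models a dropped packet or a failed sensor sample.
rng = np.random.default_rng(0)
A = np.array([[0.9, 0.1], [0.0, 0.8]])
C = np.array([[1.0, 0.0]])
p_arrival = 0.85                                    # known probability that a sample arrives

x, lost = np.zeros(2), 0
for k in range(100):
    x = A @ x + 0.1 * rng.normal(size=2)            # process noise w_k
    gamma = rng.random() < p_arrival                # Bernoulli arrival indicator
    y = gamma * (C @ x) + 0.05 * rng.normal(size=1) # zero signal content when the sample is lost
    lost += not gamma
print(f"{lost} of 100 samples lost")
```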