Abstract: Text categorization (TC) is one of the most widely studied branches of text mining and has many applications in different domains. It tries to automatically assign a text document to one of a set of predefined categories, often by using machine learning (ML) techniques. Choosing the best classifier is the most important step in this task; k-Nearest Neighbor (KNN) is widely employed, alongside other well-known classifiers such as Support Vector Machine, Multinomial Naive Bayes, and Logistic Regression. KNN has been extensively used for TC tasks and is one of the oldest and simplest methods for pattern classification. Its performance relies crucially on the distance metric used to identify nearest neighbors, since the most frequently observed label among these neighbors is used to classify an unseen test instance. Hence, in this paper, a comparative analysis of the KNN classifier is performed on a subset (i.e., R8) of the Reuters-21578 benchmark dataset for TC. Experimental results are obtained using different distance metrics as well as recently proposed distance learning metrics, under different cases where the feature model and term weighting scheme differ. Our comparative evaluation of the results shows that Bray-Curtis and Linear Discriminant Analysis (LDA) are often superior to the other metrics and work well with raw term frequency weights.
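The KNN decision rule described above (find the nearest neighbors under a chosen distance, then take the most frequent label) can be sketched as follows. This is a minimal illustration, not the paper's experimental setup: the four-word vocabulary, the toy term-frequency vectors, the labels, and the function names `bray_curtis` and `knn_predict` are all assumptions made for the example.

```python
from collections import Counter


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity: sum|u_i - v_i| / sum|u_i + v_i|."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(abs(a + b) for a, b in zip(u, v))
    return num / den if den else 0.0


def knn_predict(train_vectors, train_labels, query, k=3, distance=bray_curtis):
    """Return the majority label among the k training vectors closest to query."""
    ranked = sorted(
        zip(train_vectors, train_labels),
        key=lambda pair: distance(pair[0], query),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy raw term-frequency vectors over a hypothetical vocabulary
# ["oil", "price", "wheat", "crop"]; labels are illustrative only.
train_vectors = [[3, 2, 0, 0], [2, 1, 0, 1], [0, 0, 4, 2], [0, 1, 2, 3]]
train_labels = ["crude", "crude", "grain", "grain"]

prediction = knn_predict(train_vectors, train_labels, [1, 2, 0, 0], k=3)
print(prediction)
```

Passing a different callable as `distance` is how another metric would be swapped in for comparison under the same classifier.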
Funding: Supported by the National Key R&D Program of China (No. 2018YFB0904500) and the State Grid Corporation of China.
Abstract: Transient stability assessment (TSA) is of great importance in power systems. For a given contingency, one of the most widely used transient stability indices is the critical clearing time (CCT), which is a function of the pre-fault power flow. TSA can be regarded as the fitting of this function, with the pre-fault power flow as the input and the CCT as the output. In this paper, a data-driven TSA model is proposed to estimate the CCT. The model is based on Mahalanobis-kernel regression, which employs the Mahalanobis distance in the kernel regression method to formulate a better regressor. A distance metric learning approach is developed to determine the problem-specific distance for TSA, which describes the dissimilarity between two power flow scenarios. The proposed model is more accurate than other data-driven methods, and its accuracy can be further improved by adding more training samples. Moreover, the model provides the probability density function of the CCT, as well as different estimates of the CCT at different levels of conservativeness. Test results verify the validity and the merits of the method.
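One way to picture kernel regression with a Mahalanobis distance is a Nadaraya-Watson estimator whose Gaussian kernel uses d²(x, y) = (x − y)ᵀ M (x − y). The sketch below assumes that form; the abstract does not state the exact kernel, and the matrix M here is fixed by hand rather than learned. The two-feature samples, CCT values, and function names are hypothetical.

```python
import math


def mahalanobis_sq(x, y, M):
    """Squared Mahalanobis distance (x - y)^T M (x - y) for a PSD matrix M."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    return sum(d[i] * M[i][j] * d[j] for i in range(n) for j in range(n))


def kernel_regress(X, y, query, M):
    """Nadaraya-Watson estimate with a Gaussian kernel on the Mahalanobis distance."""
    weights = [math.exp(-0.5 * mahalanobis_sq(x, query, M)) for x in X]
    total = sum(weights)  # assumed non-zero for this toy data
    return sum(w * t for w, t in zip(weights, y)) / total


# Hypothetical pre-fault operating points (two features) and CCT values in seconds.
X = [[0.0, 0.0], [1.0, 5.0]]
cct = [0.2, 0.4]
query = [0.0, 5.0]

# A metric that only cares about the first feature vs. the plain Euclidean metric.
M_first_feature = [[4.0, 0.0], [0.0, 0.0]]
M_identity = [[1.0, 0.0], [0.0, 1.0]]

est_first = kernel_regress(X, cct, query, M_first_feature)
est_identity = kernel_regress(X, cct, query, M_identity)
print(est_first, est_identity)
```

The same query is pulled toward a different training sample depending on M, which is the reason a problem-specific metric can change the regressor's accuracy.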
Abstract: Most existing semi-supervised clustering algorithms are not designed for handling high-dimensional data. On the other hand, semi-supervised dimensionality reduction methods may not necessarily improve the clustering performance, because the inherent relationship between subspace selection and clustering is ignored. To mitigate these problems, we present a semi-supervised clustering algorithm using adaptive distance metric learning (SCADM), which performs semi-supervised clustering and distance metric learning simultaneously. SCADM applies the clustering results to learn a distance metric and then projects the data onto a low-dimensional space where the separability of the data is maximized. Experimental results on real-world data sets show that the proposed method can effectively deal with high-dimensional data and provides appealing clustering performance.
Funding: Funded by the MOE (Ministry of Education in China) Project of Humanities and Social Sciences (Project Number: 18YJC870006).
Abstract: Learning from unlabeled data is a significant challenge that requires handling complicated relationships between nominal values and attributes. Increasingly, recent research on learning value relations within and between attributes has shown significant improvement in clustering, outlier detection, and related tasks. However, typical existing work relies on learning pairwise value relations and weakens or overlooks the direct couplings between multiple attributes. This paper thus proposes two novel and flexible multi-attribute couplings-based distance (MCD) metrics, which learn the multi-attribute couplings and their strengths in nominal data based on information theory (self-information, entropy, and mutual information) for measuring both numerical and nominal distances. MCD enables the application of numerical and nominal clustering methods to nominal data and quantifies the influence of involving and filtering multi-attribute couplings on distance learning and clustering performance. Substantial experiments support these conclusions on 15 data sets against seven state-of-the-art distance measures with various feature selection methods for both numerical and nominal clustering.
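The three information-theoretic quantities named above can be computed directly from value counts in nominal columns. The sketch below shows only these building blocks, not the MCD metrics themselves, whose construction the abstract does not detail; the toy attributes and function names are assumptions.

```python
import math
from collections import Counter


def self_information(value, column):
    """Self-information -log2 p(value), with p estimated from the column."""
    return -math.log2(column.count(value) / len(column))


def entropy(column):
    """Shannon entropy (in bits) of a nominal column."""
    n = len(column)
    return -sum((c / n) * math.log2(c / n) for c in Counter(column).values())


def mutual_information(col_a, col_b):
    """I(A; B) = H(A) + H(B) - H(A, B), a simple coupling strength between attributes."""
    return entropy(col_a) + entropy(col_b) - entropy(list(zip(col_a, col_b)))


# Hypothetical nominal attributes: shape is fully determined by color,
# while size is independent of color.
color = ["red", "red", "blue", "blue"]
shape = ["round", "round", "square", "square"]
size = ["small", "large", "small", "large"]

mi_color_shape = mutual_information(color, shape)
mi_color_size = mutual_information(color, size)
print(mi_color_shape, mi_color_size)
```

A strongly coupled pair of attributes yields high mutual information and an independent pair yields zero, which is the kind of signal a coupling-based distance could weight or filter on.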
Abstract: A method for ranking complementary judgment matrices with trapezoidal fuzzy numbers, based on the Hausdorff metric distance and a fuzzy compromise decision approach, is proposed. For fuzzy-number complementary judgment matrices given by a group of decision makers whose members have different weights, the experts' information is first aggregated by means of the simple weighted average (SWA) method and the Bonissone calculation method, yielding a matrix that includes all the experts' preference information. The column entries of this matrix are then summed to obtain the fuzzy evaluation values of the alternatives. Lastly, the Hausdorff metric distance and the fuzzy compromise decision approach are used to rank the fuzzy evaluation values, giving the ranking values of all the alternatives. Because exact numbers and triangular fuzzy numbers can both be transformed into trapezoidal fuzzy numbers, the method can rank complementary judgment matrices with trapezoidal fuzzy numbers, triangular fuzzy numbers, and exact numbers alike. An illustrative example is given to verify the method and to demonstrate its feasibility and practicality.
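The aggregation and distance steps can be pictured with trapezoidal fuzzy numbers stored as 4-tuples (a, b, c, d). The sketch below makes two assumptions the abstract does not confirm: the weighted average is taken component-wise, and the Hausdorff distance is taken as the supremum over alpha-cuts, which for trapezoidal numbers reduces to the largest absolute difference among the four parameters. All names and values are hypothetical.

```python
def from_exact(x):
    """An exact number x as a degenerate trapezoidal fuzzy number."""
    return (x, x, x, x)


def from_triangular(a, b, c):
    """A triangular fuzzy number (a, b, c) as a trapezoidal one with a flat top of width 0."""
    return (a, b, b, c)


def weighted_average(numbers, weights):
    """Component-wise weighted average of trapezoidal fuzzy numbers."""
    total = sum(weights)
    return tuple(
        sum(w * n[i] for n, w in zip(numbers, weights)) / total for i in range(4)
    )


def hausdorff(p, q):
    """Sup-over-alpha-cuts Hausdorff distance between two trapezoidal fuzzy numbers."""
    return max(abs(x - y) for x, y in zip(p, q))


# Two hypothetical expert judgments with expert weights 1 and 3.
expert_a = (0.0, 0.25, 0.5, 0.75)
expert_b = (0.5, 0.75, 1.0, 1.0)
aggregated = weighted_average([expert_a, expert_b], [1, 3])

# Distance of the aggregated value from a crisp ideal of 1.0.
distance_to_ideal = hausdorff(aggregated, from_exact(1.0))
print(aggregated, distance_to_ideal)
```

Alternatives could then be ordered by their distance to the ideal, smaller being better; the paper's compromise decision step may combine distances differently.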
Funding: This work was supported by the Ulsan City & Electronics and Telecommunications Research Institute (ETRI) grant funded by the Ulsan City [22AS1600, the development of intelligentization technology for the main industry for manufacturing innovation and Human-mobile-space autonomous collaboration intelligence technology development in industrial sites].
Abstract: In a vehicular ad hoc network (VANET), a massive quantity of data needs to be transmitted on a large scale in short time durations. At the same time, vehicles move at high velocity, leading to more vehicle disconnections. Both of these characteristics result in unreliable data communication in VANETs. A vehicle clustering algorithm groups the vehicles in a VANET into clusters to enhance network scalability and connection reliability. Clustering is considered one of the possible solutions for attaining effective interaction in VANETs, but one difficulty is reducing the number of clusters as the number of transmitting nodes increases. This article introduces an Evolutionary Hide Objects Game Optimization based Distance Aware Clustering (EHOGO-DAC) scheme for VANETs. The main aim of the EHOGO-DAC technique is to partition the VANET into distinct sets of clusters by grouping vehicles. The EHOGO-DAC technique is mainly based on the HOGO algorithm, which is inspired by old games in which a searching agent tries to identify hidden objects in a given space. The EHOGO-DAC technique derives a fitness function for the clustering process that includes the total number of clusters and the Euclidean distance. The experimental assessment of the EHOGO-DAC technique was carried out under distinct aspects, and the comparison showed improved outcomes for the EHOGO-DAC technique over recent approaches.
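A fitness function that combines the number of clusters with Euclidean distance might look like the sketch below. The abstract gives no formula, so the weighted-sum form, the weights `w_count` and `w_dist`, the use of mean distance to the cluster head, and the convention that lower is better are all assumptions for illustration.

```python
import math


def fitness(positions, heads, assignment, w_count=0.5, w_dist=0.5):
    """Weighted sum of cluster count and mean vehicle-to-head distance (lower is better).

    positions  : list of (x, y) vehicle coordinates
    heads      : list of vehicle indices acting as cluster heads
    assignment : assignment[v] is the index into heads for vehicle v
    """
    mean_dist = sum(
        math.dist(positions[v], positions[heads[c]])
        for v, c in enumerate(assignment)
    ) / len(positions)
    return w_count * len(heads) + w_dist * mean_dist


# Four hypothetical vehicles forming two natural groups.
positions = [(0, 0), (3, 4), (10, 0), (13, 4)]

one_cluster = fitness(positions, heads=[0], assignment=[0, 0, 0, 0])
two_clusters = fitness(positions, heads=[0, 2], assignment=[0, 0, 1, 1])
print(one_cluster, two_clusters)
```

An optimizer such as HOGO would search over candidate head sets and assignments for the one with the best fitness; the trade-off between fewer clusters and tighter clusters is governed by the two weights.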
Funding: This research was funded by the Science and Technology Support Plan Project of Hebei Province (grant numbers 17210803D and 19273703D), the Science and Technology Spark Project of the Hebei Seismological Bureau (grant number DZ20180402056), the Education Department of Hebei Province (grant number QN2018095), and the Polytechnic College of Hebei University of Science and Technology.
Abstract: The appearance of pedestrians can vary greatly from image to image, and different pedestrians may look similar in a given image. Such similarities and variabilities in the appearance and clothing of individuals make the task of pedestrian re-identification very challenging. Here, a pedestrian re-identification method based on the fusion of local features and gait energy image (GEI) features is proposed. In this method, the human body is divided into four regions according to joint points. The color and texture of each region of the human body are extracted as local features, and GEI features of the pedestrian gait are also obtained. These features are then fused with the local and GEI features of the person. Independent distance measure learning using the cross-view quadratic discriminant analysis (XQDA) method is used to obtain the similarity of the metric function of the image pairs, and the final similarity is acquired by weight matching. Evaluation of experimental results by cumulative matching characteristic (CMC) curves reveals that, after fusion of local and GEI features, the pedestrian re-identification effect is improved compared with existing methods and is notably better than the recognition rate of pedestrian re-identification with a single feature.
Abstract: The continuous emergence of new targets in open scenarios leads to a substantial decrease in the performance of Inverse Synthetic Aperture Radar (ISAR) recognition systems. Data scarcity further exacerbates the challenge of identifying new classes of ISAR targets. In this paper, a few-shot incremental target recognition framework based on Scattering-Topology Properties (STPIL) is proposed. Specifically, STPIL extracts scattering-topology properties of ISAR targets as recognition features. Meanwhile, the pseudo-incremental training strategy effectively alleviates the algorithm's forgetting of old knowledge and improves compatibility with new classes. In addition, a feature embedding network with few parameters is designed based on the graph neural network. This embedding network is highly adaptable to changes in data distribution. Furthermore, STPIL fully considers the joint distribution and marginal distribution in scattering features, and uses the Brownian distance metric module to make the scattering-topology features more discriminative. Experimental results on both the simulation dataset and the public measured data indicate that STPIL can effectively balance new classes with old classes, and outperforms other advanced methods in the incremental recognition of targets.
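The idea of a Brownian distance that accounts for joint and marginal distributions can be illustrated with the sample Brownian distance covariance of Székely and Rizzo, which compares double-centered pairwise distance matrices. The sketch below is for scalar samples only and is an assumption about what the module relates to; the abstract does not describe how STPIL applies it inside the network. Function names are hypothetical.

```python
import math


def _centered_distances(xs):
    """Pairwise |x_i - x_j| matrix with row, column, and grand means removed."""
    n = len(xs)
    d = [[abs(a - b) for b in xs] for a in xs]
    row = [sum(r) / n for r in d]  # equals the column means, since d is symmetric
    grand = sum(row) / n
    return [[d[i][j] - row[i] - row[j] + grand for j in range(n)] for i in range(n)]


def distance_covariance_sq(xs, ys):
    """Squared sample distance covariance: mean of A_ij * B_ij."""
    n = len(xs)
    A = _centered_distances(xs)
    B = _centered_distances(ys)
    return sum(A[i][j] * B[i][j] for i in range(n) for j in range(n)) / (n * n)


def distance_correlation(xs, ys):
    """Sample distance correlation in [0, 1]; returns 0.0 if either sample is constant."""
    dxx = distance_covariance_sq(xs, xs)
    dyy = distance_covariance_sq(ys, ys)
    if dxx * dyy == 0:
        return 0.0
    dxy = max(distance_covariance_sq(xs, ys), 0.0)
    return math.sqrt(dxy / math.sqrt(dxx * dyy))


xs = [1.0, 2.0, 4.0, 7.0, 11.0]
linear = [2 * x + 1 for x in xs]
constant = [3.0] * len(xs)

print(distance_correlation(xs, linear), distance_correlation(xs, constant))
```

Unlike Pearson correlation, this quantity is built from pairwise distances, so it also responds to non-linear dependence between the two samples.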
Funding: Project supported by the National Key R&D Program of China (Nos. 2017YFB0802300 and 2017YFC0803700).
Abstract: Security threats to software-defined networks (SDNs) have become a significant problem, generally because of the open framework of SDNs. Among all the threats, distributed denial-of-service (DDoS) attacks can have a devastating impact on the network. We propose a method to discover DDoS attack behaviors in SDNs using a feature-pattern graph model. The feature-pattern graph model employs network patterns as nodes and similarity as weighted links; it can represent not only the traffic header information but also the relationships among all the network patterns. The similarity between nodes is modeled by metric learning and the Mahalanobis distance. The proposed method can discover DDoS attacks using a graph-based neighborhood classification method; it is capable of automatically finding unknown attacks and is scalable, since new nodes can be inserted into the graph model via local or global updates. Experiments on two datasets demonstrate the feasibility of the proposed method for attack behavior discovery and graph update tasks, and show that the graph-based method substantially outperforms the methods compared herein.
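A local graph update with neighborhood classification can be sketched as: compute the Mahalanobis distance from a new pattern to existing nodes, link it to its most similar neighbors, and label it by similarity-weighted vote. This is an assumed, simplified reading of the abstract. The similarity form 1 / (1 + d²), the hand-set matrix M (learned in the paper), the two traffic features, and every name below are hypothetical.

```python
def mahalanobis_sq(x, y, M):
    """Squared Mahalanobis distance (x - y)^T M (x - y)."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    return sum(d[i] * M[i][j] * d[j] for i in range(n) for j in range(n))


def add_node(graph, features, labels, new_id, new_features, M, k=2):
    """Insert a node, link it to its k most similar nodes, and label it by weighted vote."""
    sims = sorted(
        ((1.0 / (1.0 + mahalanobis_sq(new_features, f, M)), nid)
         for nid, f in features.items()),
        reverse=True,
    )[:k]
    graph[new_id] = {nid: s for s, nid in sims}
    votes = {}
    for s, nid in sims:
        graph.setdefault(nid, {})[new_id] = s  # weighted, undirected link
        votes[labels[nid]] = votes.get(labels[nid], 0.0) + s
    features[new_id] = new_features
    labels[new_id] = max(votes, key=votes.get)
    return labels[new_id]


# Hypothetical patterns: (packets per second, distinct source addresses).
features = {"n1": (10, 1), "n2": (12, 1), "n3": (900, 40)}
labels = {"n1": "normal", "n2": "normal", "n3": "ddos"}
graph = {}
M = [[0.0001, 0.0], [0.0, 0.01]]  # stand-in for a learned metric

result = add_node(graph, features, labels, "new", (850, 35), M, k=2)
print(result, graph["new"])
```

Only the new node and its chosen neighbors are touched here, which corresponds to a local update; a global update would recompute similarities across the whole graph.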
Funding: The National Natural Science Foundation of China (No. 60805001).
Abstract: In this paper, we consider the unusual event detection problem from a novel viewpoint and provide an algorithm to solve it. Whether the actions or events in a scene are usual or not will eventually be reflected in changes of some basic features. We summarize these basic event features and propose a special representation for each of them. We can thus model these features in a uniform way using an adaptive Gaussian mixture model. Supervised and unsupervised unusual event detection algorithms can be designed on top of this model to fit various situations. The advantage of our model is that it can detect unusual events automatically without knowing a specific model of unusual events. Finally, we provide two applications to verify the effectiveness of our model.
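An adaptive Gaussian mixture over a single scalar feature can be sketched with an online update in the style of background-subtraction models: each observation either matches an existing component, which is then nudged toward it, or matches none and is flagged as unusual. This is a reduced stand-in, not the authors' model; the class name, the 2.5-sigma matching rule, the learning rate, and the feature stream are assumptions.

```python
import math


class AdaptiveGMM1D:
    """Online 1-D Gaussian mixture; components are [weight, mean, variance]."""

    def __init__(self, n_components=2, lr=0.05, init_var=4.0, match_sigmas=2.5):
        self.n_components = n_components
        self.lr = lr
        self.init_var = init_var
        self.match_sigmas = match_sigmas
        self.components = []

    def observe(self, x):
        """Update the model with x; return True if x matched no existing component."""
        if not self.components:
            self.components.append([1.0, x, self.init_var])  # bootstrap, not flagged
            return False
        for comp in self.components:
            _, mean, var = comp
            if abs(x - mean) <= self.match_sigmas * math.sqrt(var):
                # Matched: move mean and variance toward the observation.
                comp[1] = mean + self.lr * (x - mean)
                comp[2] = max(var + self.lr * ((x - mean) ** 2 - var), 1e-6)
                for other in self.components:
                    bonus = self.lr if other is comp else 0.0
                    other[0] = (1 - self.lr) * other[0] + bonus
                return False
        # No match: add a new component or replace the weakest one.
        new = [self.lr, x, self.init_var]
        if len(self.components) < self.n_components:
            self.components.append(new)
        else:
            weakest = min(range(len(self.components)),
                          key=lambda i: self.components[i][0])
            self.components[weakest] = new
        total = sum(c[0] for c in self.components)
        for c in self.components:
            c[0] /= total
        return True


# Hypothetical feature stream: steady values followed by one abrupt change.
stream = [10.0, 10.5] * 30 + [30.0]
model = AdaptiveGMM1D()
flags = [model.observe(x) for x in stream]
print(flags[-1], sum(flags))
```

No model of the unusual event is specified anywhere; the final sample is flagged only because it falls outside every component learned from the usual data.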
Funding: Supported by the National Science Foundation of China under Grant Nos. 61573342 and 61473126, the Key Research Program of Frontier Sciences, Chinese Academy of Sciences, under Grant No. QYZDJ-SSWSYS011, and the Fundamental Research Funds for the Central Universities.
Abstract: The authors establish weighted $L^2$-estimates of solutions for the damped wave equation with variable coefficients $u_{tt} - \operatorname{div} A(x)\nabla u + a u_t = 0$ in $\mathbb{R}^n$ under the assumption $a(x) \ge a_0[1 + \rho(x)]^{-l}$, where $a_0 > 0$, $l < 1$, and $\rho(x)$ is the distance function of the metric $g = A^{-1}(x)$ on $\mathbb{R}^n$. The authors show that these weighted $L^2$-estimates are closely related to the geometrical properties of the metric $g = A^{-1}(x)$.