A time-series similarity measurement method based on wavelet and matrix transform was proposed,and its anti-noise ability,sensitivity and accuracy were discussed. The time-series sequences were compressed into wavelet...A time-series similarity measurement method based on wavelet and matrix transform was proposed,and its anti-noise ability,sensitivity and accuracy were discussed. The time-series sequences were compressed into wavelet subspace,and sample feature vector and orthogonal basics of sample time-series sequences were obtained by K-L transform. Then the inner product transform was carried out to project analyzed time-series sequence into orthogonal basics to gain analyzed feature vectors. The similarity was calculated between sample feature vector and analyzed feature vector by the Euclid distance. Taking fault wave of power electronic devices for example,the experimental results show that the proposed method has low dimension of feature vector,the anti-noise ability of proposed method is 30 times as large as that of plain wavelet method,the sensitivity of proposed method is 1/3 as large as that of plain wavelet method,and the accuracy of proposed method is higher than that of the wavelet singular value decomposition method. The proposed method can be applied in similarity matching and indexing for lager time series databases.展开更多
Software defect prediction is aimed to find potential defects based on historical data and software features. Software features can reflect the characteristics of software modules. However, some of these features may ...Software defect prediction is aimed to find potential defects based on historical data and software features. Software features can reflect the characteristics of software modules. However, some of these features may be more relevant to the class (defective or non-defective), but others may be redundant or irrelevant. To fully measure the correlation between different features and the class, we present a feature selection approach based on a similarity measure (SM) for software defect prediction. First, the feature weights are updated according to the similarity of samples in different classes. Second, a feature ranking list is generated by sorting the feature weights in descending order, and all feature subsets are selected from the feature ranking list in sequence. Finally, all feature subsets are evaluated on a k-nearest neighbor (KNN) model and measured by an area under curve (AUC) metric for classification performance. The experiments are conducted on 11 National Aeronautics and Space Administration (NASA) datasets, and the results show that our approach performs better than or is comparable to the compared feature selection approaches in terms of classification performance.展开更多
Various binary similarity measures have been employed in clustering approaches to make homogeneous groups of similar entities in the data. These similarity measures are mostly based only on the presence or absence of ...Various binary similarity measures have been employed in clustering approaches to make homogeneous groups of similar entities in the data. These similarity measures are mostly based only on the presence or absence of features. Binary similarity measures have also been explored with different clustering approaches (e.g., agglomera- tive hierarchical clustering) for software modularization to make software systems understandable and manageable. Each similarity measure has its own strengths and weaknesses which improve and deteriorate the clustering results, respectively. We highlight the strengths of some well-known existing binary similarity measures for software mod- ularization. Furthermore, based on these existing similarity measures, we introduce several improved new binary similarity measures. Proofs of the correctness with illustration and a series of experiments are presented to evaluate the effectiveness of our new binary similarity measures.展开更多
基金Projects(60634020, 60904077, 60874069) supported by the National Natural Science Foundation of ChinaProject(JC200903180555A) supported by the Foundation Project of Shenzhen City Science and Technology Plan of China
文摘A time-series similarity measurement method based on wavelet and matrix transform was proposed,and its anti-noise ability,sensitivity and accuracy were discussed. The time-series sequences were compressed into wavelet subspace,and sample feature vector and orthogonal basics of sample time-series sequences were obtained by K-L transform. Then the inner product transform was carried out to project analyzed time-series sequence into orthogonal basics to gain analyzed feature vectors. The similarity was calculated between sample feature vector and analyzed feature vector by the Euclid distance. Taking fault wave of power electronic devices for example,the experimental results show that the proposed method has low dimension of feature vector,the anti-noise ability of proposed method is 30 times as large as that of plain wavelet method,the sensitivity of proposed method is 1/3 as large as that of plain wavelet method,and the accuracy of proposed method is higher than that of the wavelet singular value decomposition method. The proposed method can be applied in similarity matching and indexing for lager time series databases.
基金Project supported by the National Natural Science Foundation of China (Nos. 61673384 and 61502497), the Guangxi Key Laboratory of Trusted Software (No. kx201530), the China Postdoctoral Science Foundation (No. 2015M581887), and the Scientific Research Innovation Project for Graduate Students of Jiangsu Province, China (No. KYLX15 1443)
文摘Software defect prediction is aimed to find potential defects based on historical data and software features. Software features can reflect the characteristics of software modules. However, some of these features may be more relevant to the class (defective or non-defective), but others may be redundant or irrelevant. To fully measure the correlation between different features and the class, we present a feature selection approach based on a similarity measure (SM) for software defect prediction. First, the feature weights are updated according to the similarity of samples in different classes. Second, a feature ranking list is generated by sorting the feature weights in descending order, and all feature subsets are selected from the feature ranking list in sequence. Finally, all feature subsets are evaluated on a k-nearest neighbor (KNN) model and measured by an area under curve (AUC) metric for classification performance. The experiments are conducted on 11 National Aeronautics and Space Administration (NASA) datasets, and the results show that our approach performs better than or is comparable to the compared feature selection approaches in terms of classification performance.
基金supported by the Office of Research,Innovation,Commercialization and Consultancy(ORICC)Universiti Tun Hussein Onn Malaysia(UTHM),Malaysia(No.U063)
文摘Various binary similarity measures have been employed in clustering approaches to make homogeneous groups of similar entities in the data. These similarity measures are mostly based only on the presence or absence of features. Binary similarity measures have also been explored with different clustering approaches (e.g., agglomera- tive hierarchical clustering) for software modularization to make software systems understandable and manageable. Each similarity measure has its own strengths and weaknesses which improve and deteriorate the clustering results, respectively. We highlight the strengths of some well-known existing binary similarity measures for software mod- ularization. Furthermore, based on these existing similarity measures, we introduce several improved new binary similarity measures. Proofs of the correctness with illustration and a series of experiments are presented to evaluate the effectiveness of our new binary similarity measures.