Abstract: To make the modulation classification system more suitable for signals over a wide range of signal-to-noise ratios (SNR), a feature extraction method based on the signal wavelet packet transform modulus maxima matrix (WPTMMM) and a novel support vector machine fuzzy network (SVMFN) classifier are presented. The WPTMMM feature extraction method has lower computational complexity and greater stability, and it is robust to time shifts and white noise. Further, the SVMFN uses a new definition of fuzzy density that incorporates the accuracy and uncertainty of the classifiers to improve recognition reliability when classifying nine digital modulation types (i.e., 2ASK, 2FSK, 2PSK, 4ASK, 4FSK, 4PSK, 16QAM, MSK, and OQPSK). Computer simulation shows that the proposed scheme achieves high accuracy and reliability (success rates are over 98% when the SNR is not lower than 0 dB) and that it is suitable for engineering applications.
Funding: Supported by the National Natural Science Foundation of China (No. 60435020).
Abstract: Successful prediction of protein domain boundaries provides valuable information not only for the computational structure prediction of multi-domain proteins but also for experimental structure determination. A novel method for domain boundary prediction is presented, which combines the support vector machine with the domain guess by size algorithm. Since the evolutionary information of multiple domains can be detected by the position specific score matrix, the support vector machine is trained and tested using the values of the position specific score matrix generated by PSI-BLAST. The candidate domain boundaries are selected from the output of the support vector machine and are then passed to the domain guess by size algorithm to give the final domain boundary prediction. The experimental results show that the combined method outperforms both the support vector machine and domain guess by size used individually.
Funding: Supported by the National Natural Science Foundation of China (60736021) and the Joint Funds of NSFC-Guangdong Province (U0735003).
Abstract: Kernel-based methods work by embedding the data into a feature space and then searching for a linear hypothesis among the embedded data points. Performance depends largely on which kernel is used. A promising approach is to learn the kernel from the data automatically. A general regularized risk functional (RRF) criterion for kernel matrix learning is proposed. Compared with the RRF criterion, the general RRF criterion takes into account the geometric distributions of the embedded data points. It is proven that the distance between different geometric distributions can be estimated by their centroid distance in the reproducing kernel Hilbert space. Using this criterion for kernel matrix learning leads to a convex quadratically constrained quadratic programming (QCQP) problem. Mathematical formulations are given for several commonly used loss functions. Experimental results on a collection of benchmark data sets demonstrate the effectiveness of the proposed method.
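The centroid distance mentioned in the abstract above can be computed from kernel evaluations alone, using the standard identity ||mu_A - mu_B||^2 = mean(K_AA) - 2 mean(K_AB) + mean(K_BB). The following is a minimal sketch of that identity only, not the paper's general RRF criterion; the function names and the toy data are assumptions made for illustration.

```python
from typing import Callable, Sequence

Vector = Sequence[float]


def linear_kernel(x: Vector, y: Vector) -> float:
    """Plain dot product; any positive semidefinite kernel could be substituted."""
    return sum(a * b for a, b in zip(x, y))


def mean_kernel(xs: Sequence[Vector], ys: Sequence[Vector],
                kernel: Callable[[Vector, Vector], float]) -> float:
    """Average kernel value over all pairs (x, y) with x in xs and y in ys."""
    total = sum(kernel(x, y) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def squared_centroid_distance(xs: Sequence[Vector], ys: Sequence[Vector],
                              kernel: Callable[[Vector, Vector], float] = linear_kernel) -> float:
    """Squared distance between the feature-space centroids of xs and ys."""
    return (mean_kernel(xs, xs, kernel)
            - 2.0 * mean_kernel(xs, ys, kernel)
            + mean_kernel(ys, ys, kernel))


# Hypothetical toy classes: the centroids are (1, 0) and (5, 3).
class_a = [(0.0, 0.0), (2.0, 0.0)]
class_b = [(4.0, 3.0), (6.0, 3.0)]
d2 = squared_centroid_distance(class_a, class_b)
print(d2)  # 25.0 for the linear kernel: 4^2 + 3^2
```

With the linear kernel the result reduces to the ordinary squared Euclidean distance between the class means, which makes the identity easy to check by hand.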
Funding: This work is supported by the National Natural Science Foundation of China (No. U1736118), the Natural Science Foundation of Guangdong (No. 2016A030313350), the Special Funds for Science and Technology Development of Guangdong (No. 2016KZ010103), the Key Project of Scientific Research Plan of Guangzhou (No. 201804020068), the Fundamental Research Funds for the Central Universities (No. 16lgjc83 and No. 17lgjc45), and the Science and Technology Planning Project of Guangdong Province (Grant No. 2017A040405051).
Abstract: In recent years, binary image steganography has developed so rapidly that research on binary image steganalysis has become more important for information security. Most state-of-the-art binary image steganographic schemes select flippable pixels so as to minimize the embedding distortion. As a result, the stego images they generate maintain visual quality, and it is hard for a steganalyzer to capture the embedding trace in the spatial domain. However, distortion maps can be calculated for both cover and stego images, and the difference between them is significant. In this paper, a novel binary image steganalytic scheme based on the distortion level co-occurrence matrix is proposed. The proposed scheme first generates the corresponding distortion maps for cover and stego images. A co-occurrence matrix is then constructed on the distortion level maps to represent the features of cover and stego images. Finally, a support vector machine with a Gaussian kernel is used to classify the features. Experimental results demonstrate that, compared with prior steganalytic methods, the proposed scheme can effectively detect stego images.
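The co-occurrence step described above can be sketched as follows. This is an illustrative example only: the abstract does not specify how distortion is measured or quantized into levels, so the small integer level map, the pixel offset, and the function names below are all assumptions.

```python
from typing import List, Tuple


def cooccurrence_matrix(level_map: List[List[int]], levels: int,
                        offset: Tuple[int, int] = (0, 1)) -> List[List[int]]:
    """Count how often level i is followed by level j at the given (row, col) offset."""
    dr, dc = offset
    rows, cols = len(level_map), len(level_map[0])
    counts = [[0] * levels for _ in range(levels)]
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            # Skip pairs whose second pixel falls outside the map.
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[level_map[r][c]][level_map[r2][c2]] += 1
    return counts


def flatten_normalized(counts: List[List[int]]) -> List[float]:
    """Turn the count matrix into a feature vector of relative frequencies."""
    total = sum(sum(row) for row in counts)
    return [v / total if total else 0.0 for row in counts for v in row]


# Hypothetical 3x3 distortion level map with levels 0, 1, 2.
toy_map = [
    [0, 0, 1],
    [1, 2, 2],
    [0, 1, 2],
]
glcm = cooccurrence_matrix(toy_map, levels=3)
features = flatten_normalized(glcm)
print(glcm)  # [[1, 2, 0], [0, 0, 2], [0, 0, 1]]
```

A feature vector of this kind, computed for cover and stego images, is what a Gaussian-kernel SVM would then be trained on.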
Abstract: Knowledge of subnuclear localization in eukaryotic cells is indispensable for understanding the biological function of the nucleus, genome regulation, and drug discovery. In this study, a new feature representation is proposed by combining the position specific scoring matrix (PSSM) and auto covariance (AC). The AC variables describe the neighboring effect between two amino acids, so they incorporate sequence-order information; the PSSM describes the evolutionary information of proteins. Based on this new descriptor, a support vector machine (SVM) classifier was built to predict subnuclear localization. To evaluate the power of the predictor, a benchmark dataset containing 714 proteins localized in nine subnuclear compartments was used. The total jackknife cross-validation accuracy of the method is 76.5%, which is higher than those of Nuc-PLoc (67.4%), OET-KNN (55.6%), AAC-based SVM (48.9%), and ProtLoc (36.6%). The prediction software used in this article and the details of the SVM parameters are freely available at http://chemlab.scu.edu.cn/predict_SubNL/index.htm, and the dataset used in the study is from Shen and Chou's work and can be downloaded at http://chou.med.harvard.edu/bioinf/Nuc-PLoc/Data.htm.
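As a rough illustration of how AC variables can be derived from a PSSM, the sketch below uses the commonly cited auto covariance form, AC(lag, j) = 1/(L - lag) * sum over i of (P[i][j] - mean_j)(P[i+lag][j] - mean_j), where L is the sequence length and j indexes a PSSM column. The exact normalization and maximum lag used in the paper are not stated in the abstract, so these details and the function name are assumptions; a real PSSM has 20 columns rather than the 2 used here.

```python
from typing import List


def auto_covariance(pssm: List[List[float]], max_lag: int) -> List[float]:
    """Return AC features ordered by lag (1..max_lag), then by PSSM column."""
    length = len(pssm)
    n_cols = len(pssm[0])
    if max_lag >= length:
        raise ValueError("max_lag must be smaller than the sequence length")
    # Column means over the whole sequence.
    means = [sum(row[j] for row in pssm) / length for j in range(n_cols)]
    features = []
    for lag in range(1, max_lag + 1):
        for j in range(n_cols):
            acc = sum((pssm[i][j] - means[j]) * (pssm[i + lag][j] - means[j])
                      for i in range(length - lag))
            features.append(acc / (length - lag))
    return features


# Hypothetical 4-residue, 2-column profile; column 0 alternates, column 1 is flat.
toy_pssm = [[1, 0], [3, 0], [1, 0], [3, 0]]
ac = auto_covariance(toy_pssm, max_lag=2)
print(ac)  # [-1.0, 0.0, 1.0, 0.0]
```

The alternating column gives a negative value at lag 1 and a positive value at lag 2, which is the sequence-order signal that the AC variables are meant to capture. The resulting vector has a fixed length of max_lag times the number of columns regardless of protein length.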
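Abstract: To improve the recognition rate of wheat grain identification, a method combining a Laplacian convolutional network (Convolution Network Based on Laplacian Eigenmap, LENet) with a support matrix machine (Support Matrix Machines, SMM) classifier is used to recognize wheat grains. LENet is a lightweight, feedback-free cascaded convolutional neural network that can be used to extract wheat grain features. The network learns its parameters through Laplacian eigenmaps, and its output layer is implemented by block-wise histogram encoding and matrixization; the extracted features are finally classified by the SMM classifier. Experiments on a purpose-built wheat grain image database show that this recognition method outperforms several traditional feature extraction and classification methods and achieves good recognition results.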
Abstract: Identification of the drug-binding residues on the surface of proteins is a vital step in drug discovery and is important for understanding protein function. Most previous studies are based on the structural information of proteins, but the structures of most proteins are not available. In this article, a sequence-based method is therefore proposed that combines support vector machine (SVM)-based ensemble learning with an improved position specific scoring matrix (PSSM). To take the local environment of a drug-binding site into account, an improved PSSM profile scaled by a sliding window and a smoothing window is used to improve the prediction result. In addition, a new SVM-based ensemble learning method is developed to deal with the imbalanced data classification problem that commonly arises in binding site prediction. On a dataset of 985 drug-binding residues, the method achieved an area under the curve (AUC) of 0.9264. Furthermore, an independent dataset of 349 drug-binding residues was used to evaluate the prediction model, and the prediction accuracy was 84.68%. These results suggest that the method is effective for predicting drug-binding sites in proteins. The code and all datasets used in this article are freely available at http://cic.scu.edu.cn/bioinformatics/Ensem_DBS.zip.
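One common way to apply a smoothing window and a sliding window to a PSSM is sketched below: each row is first replaced by the sum of the rows around it, and the feature vector for a residue is then the concatenation of the smoothed rows in a window centered on that residue. The abstract does not give the paper's exact scaling, window sizes, or padding rule, so the zero padding at the sequence ends, the odd window sizes, and the function names are assumptions.

```python
from typing import List


def smooth_pssm(pssm: List[List[int]], smooth_size: int) -> List[List[int]]:
    """Replace each row by the sum of rows within an odd-sized window around it."""
    half = smooth_size // 2
    length, n_cols = len(pssm), len(pssm[0])
    smoothed = []
    for i in range(length):
        lo, hi = max(0, i - half), min(length, i + half + 1)
        smoothed.append([sum(pssm[k][j] for k in range(lo, hi)) for j in range(n_cols)])
    return smoothed


def window_features(profile: List[List[int]], center: int, window_size: int) -> List[int]:
    """Concatenate the rows in an odd-sized window around `center`, zero-padded at the ends."""
    half = window_size // 2
    n_cols = len(profile[0])
    features: List[int] = []
    for k in range(center - half, center + half + 1):
        if 0 <= k < len(profile):
            features.extend(profile[k])
        else:
            features.extend([0] * n_cols)
    return features


# Hypothetical 4-residue, 2-column profile (a real PSSM has 20 columns).
toy_pssm = [[1, 0], [2, 1], [3, 0], [4, 1]]
smoothed = smooth_pssm(toy_pssm, smooth_size=3)
feat = window_features(smoothed, center=0, window_size=3)
print(smoothed)  # [[3, 1], [6, 1], [9, 2], [7, 1]]
print(feat)      # [0, 0, 3, 1, 6, 1]
```

Each residue then contributes one fixed-length vector, labeled binding or non-binding, to the SVM ensemble.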
Funding: Supported by the Research Grants Council of Hong Kong under Grant No. 17301214; HKU CERG Grants; the Fundamental Research Funds for the Central Universities; the Research Funds of Renmin University of China; the Hung Hing Ying Physical Research Grant; and the Natural Science Foundation of China under Grant No. 11271144.
Abstract: Driven by the challenge of integrating large amounts of experimental data, classification has emerged as one of the major tools in computational biology and bioinformatics research. Machine learning methods, especially kernel methods with support vector machines (SVMs), are popular and effective tools. From the perspective of the kernel matrix, a technique called Eigen-matrix translation has been introduced for protein data classification. The Eigen-matrix translation strategy has many useful properties that deserve further exploration. This paper investigates the role of Eigen-matrix translation in classification. The authors propose that its importance lies in the dimension reduction of predictor attributes within the data set, which matters most when the feature dimension is very large. The authors show by numerical experiments on real biological data sets that the proposed framework is effective in improving classification accuracy. It can therefore serve as a novel perspective for future research on dimension reduction problems.
Funding: Co-supported by the National Natural Science Foundation of China (No. 61203170), the Fundamental Research Funds for the Central Universities (No. NS2012026), and the Startup Foundation for Introduced Talents of Nanjing University of Aeronautics and Astronautics (No. 1007-YAH10047).
Abstract: Impact craters are commonly found on the surfaces of planets, satellites, asteroids, and other solar system bodies. To speed up the construction of crater databases, it is important to develop crater detection algorithms. This paper presents a novel approach to automatically detecting craters on planetary surfaces. The approach has two parts: crater candidate region selection and crater detection. In the first part, crater candidate regions are selected by the Kanade-Lucas-Tomasi (KLT) detector. The matrix-pattern-oriented least squares support vector machine (MatLSSVM), the matrixized version of the least squares support vector machine (LSSVM), inherits the advantages of LSSVM, greatly reduces storage space, and preserves the spatial redundancies within each image matrix compared with the general LSSVM. The second part of the approach employs MatLSSVM to design the classifier for crater detection. Experimental results on a dataset of 160 preprocessed image patches from Google Mars show that the crater detection accuracy can reach 88%. In addition, a distinguishing feature of the approach is that it takes the resized crater candidate region directly as the input pattern for crater detection. The results of the last experiment demonstrate that the MatLSSVM-based classifier can detect crater regions effectively on the basis of KLT-based crater candidate region selection.