Abstract: The Neighborhood Preserving Embedding (NPE) algorithm was recently proposed as a new dimensionality reduction method. However, it is confined to linear transforms in the data space. To address this, a new nonlinear dimensionality reduction method based on the NPE algorithm is proposed, which preserves the local structure of the data in the feature space. First, using a Mercer kernel, the reconstruction weight matrix is obtained in the feature space, and the corresponding eigenvalue problem of the Kernel NPE (KNPE) method is derived. Finally, the KNPE algorithm is solved through a transformed optimization problem and QR decomposition. Experimental results on three real-world data sets show that the new method outperforms NPE, Kernel PCA (KPCA), and Kernel LDA (KLDA).
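The abstract gives only the outline of the pipeline (kernel-space reconstruction weights, then a generalized eigenproblem), so the following is a minimal Python sketch of that outline under stated assumptions, not the paper's algorithm: it assumes an RBF kernel, computes the local reconstruction weights purely from kernel evaluations, and solves the eigenproblem with a standard generalized eigensolver instead of the QR-based scheme the paper describes.

```python
import numpy as np
from scipy.linalg import eigh

def rbf_kernel(X, gamma=1.0):
    # Gram matrix of the RBF (Gaussian) Mercer kernel.
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * d2)

def kernel_npe(X, n_components=2, k=5, gamma=1.0, reg=1e-3):
    # Hypothetical KNPE sketch: names, kernel choice, and solver are assumptions.
    n = X.shape[0]
    K = rbf_kernel(X, gamma)
    # Feature-space distances: ||phi(i) - phi(j)||^2 = K_ii + K_jj - 2 K_ij.
    diag = np.diag(K)
    D = diag[:, None] + diag[None, :] - 2.0 * K
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(D[i])[1:k + 1]  # k nearest neighbours, skipping self
        # Local Gram matrix of the centred neighbours in feature space,
        # expressed entirely through kernel evaluations:
        # C_jl = <phi(x_i)-phi(x_j), phi(x_i)-phi(x_l)>.
        C = (K[i, i] - K[i, nbrs][None, :] - K[i, nbrs][:, None]
             + K[np.ix_(nbrs, nbrs)])
        C += reg * np.trace(C) * np.eye(k)  # regularise for numerical stability
        w = np.linalg.solve(C, np.ones(k))
        W[i, nbrs] = w / w.sum()            # reconstruction weights sum to one
    M = (np.eye(n) - W).T @ (np.eye(n) - W)
    # Generalised eigenproblem K M K a = lambda K K a; keep the smallest
    # eigenvalues (in practice the trivial constant direction may be dropped).
    A = K @ M @ K
    B = K @ K + reg * np.eye(n)
    vals, vecs = eigh(A, B)
    alpha = vecs[:, :n_components]
    return K @ alpha  # embedded coordinates y_i = sum_j alpha_j K(x_j, x_i)
```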
Funding: This research is supported in part by HKRGC Grant 7017/07P, HKU CRCG Grants, the HKU strategic theme grant on computational sciences, the HKU Hung Hing Ying Physical Science Research Grant, National Natural Science Foundation of China Grant No. 10971075, and Guangdong Provincial Natural Science Grant No. 9151063101000021. A preliminary version of this paper was presented at the OSB2009 conference and published in the corresponding conference proceedings [25]. The authors would like to thank the anonymous referees for their helpful comments and suggestions.
Abstract: Predicting protein functions is an important problem in the post-genomic era. This paper studies several network-based kernels, including the local linear embedding (LLE) kernel, the diffusion kernel, and the Laplacian kernel, to uncover the relationship between protein functions and protein-protein interactions (PPI). The authors first construct kernels based on PPI networks, then apply support vector machine (SVM) techniques to classify proteins into different functional groups. Five-fold cross-validation is then applied to the 359 selected GO terms to compare the performance of the different kernels with guilt-by-association methods, including neighbor-counting and Chi-square methods. Finally, the authors predict the functions of some unknown genes and partially verify the accuracy of the predictions using information from other data sources.
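As a rough illustration of the kernel-plus-SVM pipeline the abstract describes, the sketch below builds a diffusion kernel K = exp(-βL) on a toy graph and feeds it to an SVM as a precomputed kernel. The six-node network, its labels, and the fold count are fabricated stand-ins for the real PPI data and GO annotations (the paper uses 5-fold cross-validation on 359 GO terms).

```python
import numpy as np
from scipy.linalg import expm
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def diffusion_kernel(A, beta=1.0):
    # K = exp(-beta * L), with L the graph Laplacian of adjacency matrix A.
    L = np.diag(A.sum(axis=1)) - A
    return expm(-beta * L)

# Toy PPI network: adjacency matrix (1 = interaction) and binary labels
# marking membership in one hypothetical GO functional class.
A = np.array([[0, 1, 1, 0, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [1, 1, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 0, 0, 1, 1, 0]], dtype=float)
y = np.array([1, 1, 1, 0, 0, 0])

K = diffusion_kernel(A, beta=0.5)
clf = SVC(kernel="precomputed")
# k-fold cross-validation on the precomputed kernel (3 folds here only
# because the toy network has six nodes).
scores = cross_val_score(clf, K, y, cv=3)
print("CV accuracy:", scores.mean())
```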
Abstract: Predicting the response variables of a target dataset is one of the main problems in machine learning. Predictive models are expected to perform well across a broad range of target domains. However, this may not be feasible when there is a mismatch between the source and target domain distributions. Domain adaptation algorithms aim to solve this issue so that a model can be deployed across different target domains. We propose a method based on kernel distribution embedding and the Hilbert-Schmidt independence criterion (HSIC) to address this problem. The proposed method embeds both source and target data into a new feature space with two properties: 1) the distributions of the source and target datasets are as close as possible in the new feature space, and 2) the important structural information of the data is preserved. The embedded data can lie in a lower-dimensional space while preserving these properties, so the method can also be regarded as a dimensionality reduction method. It has a closed-form solution, and experimental results show that it works well in practice.
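The abstract does not spell out the closed-form embedding, so the sketch below only illustrates the two ingredients it names: comparing the source and target distributions through their kernel mean embeddings (measured here by the squared maximum mean discrepancy, MMD), and measuring dependence with the empirical HSIC. Function names and the RBF kernel choice are assumptions, not the paper's formulation.

```python
import numpy as np

def rbf_gram(X, Y=None, gamma=1.0):
    # RBF Gram matrix between the rows of X and Y.
    Y = X if Y is None else Y
    d2 = (np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :]
          - 2.0 * X @ Y.T)
    return np.exp(-gamma * d2)

def mmd2(Xs, Xt, gamma=1.0):
    # Squared MMD between the kernel mean embeddings of the source and
    # target samples; small values mean the distributions look alike
    # under this kernel.
    Kss = rbf_gram(Xs, gamma=gamma)
    Ktt = rbf_gram(Xt, gamma=gamma)
    Kst = rbf_gram(Xs, Xt, gamma=gamma)
    return Kss.mean() + Ktt.mean() - 2.0 * Kst.mean()

def hsic(X, Y, gamma=1.0):
    # Empirical HSIC between paired samples X and Y:
    # trace(K H L H) / (n - 1)^2, with H the centring matrix.
    n = X.shape[0]
    K = rbf_gram(X, gamma=gamma)
    L = rbf_gram(Y, gamma=gamma)
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

# Tiny usage example with synthetic data: aligning the sample means of a
# shifted target distribution shrinks the MMD to the source.
rng = np.random.default_rng(0)
Xs = rng.normal(0.0, 1.0, (100, 2))   # source samples
Xt = rng.normal(1.5, 1.0, (100, 2))   # target samples from a shifted distribution
print(mmd2(Xs, Xt))                                  # large: distributions differ
print(mmd2(Xs, Xt - Xt.mean(0) + Xs.mean(0)))        # smaller after mean alignment
```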