Finding crucial vertices is a key problem for improving the reliability and ensuring the effective operation of networks. Existing approaches based on multiple-attribute decision making suffer from ignoring either the correlation among attributes or the heterogeneity between attributes and structure. To overcome these problems, a novel vertex centrality approach, called VCJG, is proposed based on joint nonnegative matrix factorization and graph embedding. Linearly independent latent attributes and the structure information are captured automatically by using nonnegative matrix factorization to factorize the weighted adjacency matrix and the structure matrix, the latter being generated by graph embedding. A smoothness strategy is applied to eliminate the heterogeneity between attributes and structure through joint nonnegative matrix factorization. VCJG then integrates the above steps into an overall objective function and, by optimizing it, obtains the final latent attributes fused with the structure information of the network. Finally, the attributes are combined with neighborhood rules to evaluate each vertex's importance. Comparative experiments on nine real-world networks demonstrate that the proposed approach outperforms nine state-of-the-art algorithms for identifying vital vertices with respect to correlation, monotonicity, and accuracy of the top-10 vertex ranking.
This paper considers the problem of unsupervised spectral unmixing of hyperspectral data. Based on the linear mixing model (LMM), a new method under the framework of nonnegative matrix factorization (NMF) is proposed, namely minimum distance constrained nonnegative matrix factorization (MDC-NMF). First, a new regularization term called endmember distance (ED) is introduced, defined as the sum of the squared Euclidean distances from each endmember to the endmembers' geometric center. Compared with the simplex volume, ED has better optimization properties and is conceptually intuitive. Second, a projected gradient (PG) scheme is adopted; by virtue of ED, the optimal step size along the feasible descent direction can be calculated easily at each iteration. Third, an algorithm that terminates in a finite number of steps (no more than the number of endmembers) is used to project a point onto the canonical simplex, so that the abundance nonnegativity constraint and the abundance sum-to-one constraint are satisfied exactly at a low computational cost. Experimental results on synthetic and real data demonstrate that, for the same running time, MDC-NMF outperforms several similar recently proposed methods.
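The two building blocks named in the MDC-NMF abstract can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: endmembers are taken to be the columns of a matrix `A`, and the simplex projection is a standard active-set (Michelot-style) routine that stops after at most as many passes as there are entries. The abstract does not say which projection algorithm the paper uses, so the function names and the choice of routine here are hypothetical.

```python
import numpy as np

def endmember_distance(A):
    """ED: sum of squared Euclidean distances from each endmember
    (assumed to be a column of A) to the endmembers' geometric center."""
    center = A.mean(axis=1, keepdims=True)
    return float(np.sum((A - center) ** 2))

def project_to_simplex(v):
    """Euclidean projection of v onto {x : x >= 0, sum(x) = 1}.

    Active-set scheme (an assumption, not necessarily the paper's):
    project onto the hyperplane sum(x) = 1 over the active entries,
    clamp negative entries to zero, and repeat. Each pass removes at
    least one entry, so at most len(v) passes are needed.
    """
    x = np.asarray(v, dtype=float).copy()
    active = np.ones(x.size, dtype=bool)
    for _ in range(x.size):
        shift = (x[active].sum() - 1.0) / active.sum()
        x[active] -= shift
        negative = active & (x < 0)
        if not negative.any():
            break
        x[negative] = 0.0
        active &= ~negative
    return x

# Example: two endmembers in 2-D and one abundance vector.
A = np.array([[0.0, 2.0],
              [0.0, 0.0]])
print(endmember_distance(A))                  # distance of each column to (1, 0)
print(project_to_simplex([0.5, 0.5, 2.0]))    # nonnegative, sums to one
```

In an unmixing loop, `project_to_simplex` would be applied to each abundance vector after a gradient step, so both abundance constraints hold exactly at every iteration.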
Functional connectomes constructed from neuroimaging data have emerged as a powerful tool for identifying brain disorders. If a brain disease manifests only as some cognitive dysfunction, the disease may affect some local connectivity in the brain functional network; that is, there are functional abnormalities in a sub-network. It is therefore crucial to identify such sub-networks accurately in pathological diagnosis. To this end, we propose a sub-network extraction method based on graph regularized nonnegative matrix factorization (GNMF). The dynamic functional networks of normal subjects and early mild cognitive impairment (eMCI) subjects were vectorized, and the functional connection vectors (FCVs) were assembled into aggregation matrices. GNMF was then applied to factorize the aggregation matrix to obtain the basis matrix, whose column vectors were restored to a common sub-network and a distinctive sub-network; visualization and statistical analysis were conducted on the two sub-networks, respectively. Experimental results demonstrate that, compared with other matrix factorization methods, the proposed method more clearly reflects the similarity between the common sub-networks of eMCI subjects and normal subjects, as well as the difference between their distinctive sub-networks. Therefore, the high-dimensional features in brain functional networks can be best represented locally in the low-dimensional space, which provides a new idea for studying brain functional connectomes.
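The vectorize, assemble, and restore steps described in the GNMF abstract can be illustrated as follows. This is a minimal sketch, assuming each functional network is a symmetric connectivity matrix with a zero diagonal and that only the strict upper triangle is kept as the functional connection vector; the abstract does not state the exact vectorization, and the function names are hypothetical.

```python
import numpy as np

def vectorize_network(C):
    """Flatten the strict upper triangle of a symmetric n x n network into an FCV."""
    rows, cols = np.triu_indices(C.shape[0], k=1)
    return C[rows, cols]

def aggregation_matrix(networks):
    """Stack one FCV per network as the columns of an aggregation matrix."""
    return np.column_stack([vectorize_network(C) for C in networks])

def restore_network(v, n):
    """Inverse of vectorize_network: rebuild a symmetric n x n matrix (zero diagonal)."""
    C = np.zeros((n, n))
    rows, cols = np.triu_indices(n, k=1)
    C[rows, cols] = v
    return C + C.T

# Example with two tiny 3-node networks.
C1 = np.array([[0.0, 1.0, 2.0],
               [1.0, 0.0, 3.0],
               [2.0, 3.0, 0.0]])
C2 = 0.5 * C1
X = aggregation_matrix([C1, C2])   # shape: (number of edges, number of networks)
```

A basis column produced by factorizing `X` has the same length as an FCV, so `restore_network` maps it back to a sub-network that can be visualized.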
Traditional data-driven fault detection methods assume that the process operates in a single mode, so they cannot perform well in processes with multiple operating modes. To monitor multimode processes effectively, this paper proposes a novel process monitoring scheme based on orthogonal nonnegative matrix factorization (ONMF) and the hidden Markov model (HMM). ONMF is employed as a clustering technique to separate data from different process modes. Building a separate HMM for each operating mode leads to higher modeling accuracy. The proposed approach does not presume the distribution of data in each mode, because the process uncertainty and dynamics can be well interpreted through hidden Markov estimation. An HMM-based monitoring index, the negative log-likelihood probability, is used for fault detection. The proposed monitoring strategy is assessed on a numerical example and the Tennessee Eastman process. The results demonstrate that the method provides efficient fault detection performance.
An image fusion method combining the complex contourlet transform (CCT) with nonnegative matrix factorization (NMF) is proposed in this paper. After two images are decomposed by CCT, NMF is applied to their high- and low-frequency components, respectively, and finally a fused image is synthesized. The subjective visual quality of the fusion result is compared with that of fusion methods based on NMF alone and on combinations of the wavelet, contourlet, and nonsubsampled contourlet transforms with NMF. The experimental results are also evaluated quantitatively, and the running times are compared. It is shown that the proposed method achieves larger information entropy, standard deviation, and mean gradient, which means that it better integrates feature information from all source images, avoids background noise, and effectively improves spatial clarity in the fused image.
Link prediction has attracted wide attention among interdisciplinary researchers as an important issue in complex networks. It aims to predict the missing links in current networks and the new links that will appear in future networks. Despite the presence of missing links in the target network of link prediction studies, the network being processed remains, macroscopically, a large connected graph. However, the complexity of the real world means that complex networks abstracted from real systems often contain many isolated nodes. As a result, existing link prediction methods cannot efficiently predict missing edges on isolated nodes. Cold-start link prediction is therefore regarded as one of the most valuable subproblems of traditional link prediction. However, because many links are lost in the observed network, the topological information available for the link prediction task is extremely scarce, which presents a severe challenge for the study of cold-start link prediction. How to mine and fuse more of the available non-topological information from the observed network thus becomes the key to solving the cold-start link prediction problem. In this paper, we propose a framework for solving the cold-start link prediction problem: a joint-weighted symmetric nonnegative matrix factorization model fusing graph regularization information, based on low-rank approximation algorithms from machine learning. First, the nonlinear features of node attributes in high-dimensional space are captured by the designed graph regularization term. Second, using a weighted matrix, we associate the attribute similarity and first-order structure information of nodes so that they constrain each other. Finally, a unified framework for cold-start link prediction is constructed by using a symmetric nonnegative matrix factorization model to integrate the multiple kinds of extracted information. Extensive experimental validation on five real networks with attributes shows that the proposed model has very good predictive performance when predicting missing edges of isolated nodes.
Orthogonal nonnegative matrix factorization (ONMF) is widely used in blind image separation, document classification, and human face recognition. The ONMF model can be solved efficiently by the alternating direction method of multipliers and the hierarchical alternating least squares method. When the given matrix is huge, however, the cost of computation and communication is too high, so ONMF becomes challenging in the large-scale setting. Random projection is an efficient method of dimensionality reduction. In this paper, we apply random projection to ONMF and propose two randomized algorithms. Numerical experiments show that the proposed algorithms perform well on both simulated and real data.
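To make the role of random projection concrete, the following sketch compresses a large matrix with a Gaussian test matrix and an orthonormal basis of the sketched range (the standard randomized range finder). It is not one of the paper's two algorithms, which the abstract does not describe; the function name `compress_rows` and the parameter `d` are hypothetical.

```python
import numpy as np

def compress_rows(M, d, seed=0):
    """Compress an m x n matrix M to d x n with a Gaussian random projection.

    Returns (Q, B) with Q an m x d matrix with orthonormal columns and
    B = Q.T @ M, so that M is approximated by Q @ B. If M has rank r <= d,
    the approximation is exact up to rounding error.
    """
    rng = np.random.default_rng(seed)
    omega = rng.standard_normal((M.shape[1], d))   # n x d Gaussian test matrix
    Q, _ = np.linalg.qr(M @ omega)                 # orthonormal basis of the sketch
    return Q, Q.T @ M

# Example: a nonnegative 200 x 300 matrix of rank 4, compressed to 9 rows.
rng = np.random.default_rng(1)
M = rng.random((200, 4)) @ rng.random((4, 300))
Q, B = compress_rows(M, d=9)
rel_err = np.linalg.norm(M - Q @ B) / np.linalg.norm(M)
```

Note that `B` generally contains negative entries even when `M` is nonnegative, so a randomized NMF or ONMF solver still has to enforce nonnegativity and orthogonality on the factors of the original problem; the projection only reduces the size of the least-squares subproblems.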
Hyperspectral imagery generally contains a very large amount of data because of its hundreds of spectral bands. Band selection is often applied first to reduce computational cost and facilitate subsequent tasks such as land-cover classification and higher-level image analysis. In this paper, we propose a new band selection algorithm using sparse nonnegative matrix factorization (sparse NMF). Although it acts as a clustering method for band selection, sparse NMF need not consider the distance metric between different spectral bands, which is often the key step in most common clustering-based band selection methods. By imposing sparsity on the coefficient matrix, the bands' clustering assignments can be read off easily from the largest entry in each column of the matrix. Experimental results show that sparse NMF provides considerable insight into the clustering-based band selection problem and that the selected bands are good for land-cover classification.
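Reading cluster assignments from the coefficient matrix amounts to a column-wise argmax. The sketch below assumes a factorization V ≈ WH in which each column of `H` corresponds to one spectral band and each row to one cluster. The rule for picking one representative band per cluster (the band with the largest coefficient) is an illustrative assumption, not taken from the abstract.

```python
import numpy as np

def band_clusters(H):
    """Assign each band (column of H) to the cluster with the largest coefficient."""
    return np.argmax(H, axis=0)

def representative_bands(H):
    """For each non-empty cluster, pick the member band with the largest coefficient.

    This selection rule is an assumption made for illustration only.
    """
    labels = band_clusters(H)
    reps = {}
    for c in np.unique(labels):
        members = np.flatnonzero(labels == c)
        reps[int(c)] = int(members[np.argmax(H[c, members])])
    return reps

# Example: 2 clusters, 4 bands.
H = np.array([[0.9, 0.1, 0.0, 0.7],
              [0.0, 0.8, 0.6, 0.2]])
labels = band_clusters(H)        # bands 0 and 3 go to cluster 0, bands 1 and 2 to cluster 1
reps = representative_bands(H)   # one band index per cluster
```

Because the assignment depends only on `H`, no pairwise distance between bands has to be computed, which is the point the abstract makes about sparse NMF.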
In the past decades, advances in high-throughput technologies have led to the generation of huge amounts of biological data that require analysis and interpretation. Recently, nonnegative matrix factorization (NMF) has been introduced as an efficient way to reduce the complexity of data as well as to interpret them, and it has been applied to various fields of biological research. In this paper, we present CloudNMF, a distributed open-source implementation of NMF on a MapReduce framework. Experimental evaluation demonstrates that CloudNMF is scalable and can be used to deal with huge amounts of data, which may enable various kinds of high-throughput biological data analysis in the cloud. CloudNMF is freely accessible at http://admis.fudan.edu.cn/projects/CloudNMF.html.
The identification and analysis of spatiotemporal traffic patterns in road networks constitute a crucial process for sophisticated traffic management and control. Traditional methods based on mathematical equations and statistical models are hardly applicable to large-scale urban road networks, where traffic states exhibit high degrees of dynamics and complexity. Recently, advances in data collection and processing have provided new opportunities to understand spatiotemporal traffic patterns in large-scale road networks effectively using data-driven methods. However, limited effort has been devoted to exploring the essential structure of the networks when conducting a spatiotemporal analysis of traffic characteristics. To this end, this study proposes a modified nonnegative matrix factorization algorithm that processes high-dimensional traffic data and provides an improved representation of the global traffic state. After matrix factorization, cluster analysis is conducted on the obtained low-dimensional representative matrices, which contain different traffic patterns and serve as the basis for exploring the temporal dynamics and spatial structure of network congestion. The applicability and effectiveness of the proposed approach are examined on a road network in Beijing, China. Results show that the method exhibits considerable potential for identifying and interpreting spatiotemporal traffic patterns over the entire network and provides a systematic and efficient approach for analyzing the network-level traffic state.
In this paper, we study a band constrained nonnegative matrix factorization (band NMF) problem: for a given nonnegative matrix Y, decompose it as Y ≈ AX with A a nonnegative matrix and X a nonnegative block band matrix. This factorization model extends a single low-rank subspace model to a mixture of several overlapping low-rank subspaces, which not only provides a sparse representation but also captures significant grouping structure in a dataset. Based on overlapping subspace clustering and on capturing the level of overlap between neighbouring subspaces, two simple and practical algorithms are presented to solve the band NMF problem. Numerical experiments on both synthetic data and real image data show that band NMF enhances the performance of NMF in data representation and processing.
Orthogonal nonnegative matrix factorization (ONMF) has many applications in areas such as data mining, information processing, and pattern recognition. In this paper, we propose a novel initialization method for ONMF based on Lanczos bidiagonalization and the nonnegative approximation of rank-one matrices. Numerical experiments show that our initialization strategy is effective and efficient.
Hyperspectral unmixing is a powerful tool for remote sensing image mining. Nonnegative matrix factorization (NMF) has been adopted for this task, but the precision of unmixing is closely related to the local minimizers of NMF. We present two novel initialization strategies based on the CUR decomposition, which is physically meaningful. In the experiments, NMF with the new initialization methods is used to unmix an urban scene captured by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) in 1997; the numerical results show that the initialization methods work well.
This paper introduces an algorithm for the nonnegative matrix factorization-and-completion problem, which aims to find nonnegative low-rank matrices X and Y so that the product XY approximates a nonnegative data matrix M whose elements are partially known (to a certain accuracy). This problem aggregates two existing problems: (i) nonnegative matrix factorization, where all entries of M are given, and (ii) low-rank matrix completion, where nonnegativity is not required. By taking advantage of both nonnegativity and low rank, one can generally obtain better results than by using only one of the two properties. We propose to solve the non-convex constrained least-squares problem using an algorithm based on the classical alternating direction augmented Lagrangian method. Preliminary convergence properties of the algorithm and numerical simulation results are presented. Compared with a recent algorithm for nonnegative matrix factorization, the proposed algorithm produces factorizations of similar quality using only about half of the matrix entries. On tasks of recovering incomplete grayscale and hyperspectral images, the proposed algorithm yields overall better quality than two recent matrix-completion algorithms that do not exploit nonnegativity.
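The factorization-and-completion problem described in this abstract can be written as a constrained least-squares problem. The formulation below is a sketch consistent with the abstract's wording; the index set $\Omega$ of known entries, the sampling operator $\mathcal{P}_\Omega$, the dimensions $m$, $n$, and the rank $r$ are notation introduced here for illustration and may differ from the paper's.

```latex
\[
\min_{X \in \mathbb{R}^{m \times r},\; Y \in \mathbb{R}^{r \times n}}
\; \frac{1}{2}\,\bigl\| \mathcal{P}_{\Omega}(XY - M) \bigr\|_F^2
\quad \text{subject to} \quad X \ge 0,\; Y \ge 0,
\]
\[
\bigl[\mathcal{P}_{\Omega}(Z)\bigr]_{ij} =
\begin{cases}
Z_{ij}, & (i,j) \in \Omega,\\
0, & \text{otherwise.}
\end{cases}
\]
```

When $\Omega$ contains every index, the problem reduces to ordinary NMF; dropping the constraints $X \ge 0$, $Y \ge 0$ gives a factored form of low-rank matrix completion. These are the two special cases (i) and (ii) named in the abstract.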
Nonnegative tensor (matrix) factorization finds more and more applications in disciplines including machine learning, data mining, and blind source separation. Computationally, the optimization problem involved is solved by alternately minimizing over one factor while the others are fixed. To solve each subproblem efficiently, we first introduce a variable regularization term that keeps the subproblem from being ill-conditioned. Second, an augmented Lagrangian alternating direction method is employed to solve this convex and well-conditioned regularized subproblem, and two acceleration techniques are also implemented. Preliminary numerical experiments are performed to show the improvements of the new method.
Real-world data can often be represented in multiple forms and views, and analyzing data from different perspectives allows for more comprehensive learning of the data, resulting in better clustering results. Non-negative matrix factorization (NMF) is used in clustering to extract uniform, discriminative low-dimensional features from multi-view data. Many clustering methods based on graph regularization have been proposed and proven effective, but ordinary graphs consider only pairwise relationships between samples. To learn the higher-order relationships that exist in the sample manifold and the feature manifold of multi-view data, we propose a new semi-supervised multi-view clustering method called dual hypergraph regularized partially shared non-negative matrix factorization (DHPS-NMF). The complex manifold structure of samples and features is learned by constructing sample and feature hypergraphs. To improve the discriminative power of the obtained low-dimensional features, semi-supervised regression terms are incorporated into the model so that label information is used effectively when capturing the complex manifold structure of the data. Finally, we conduct experiments on six real data sets, and the results show that our algorithm achieves encouraging results in comparison with several other methods.
Most existing algorithms for blind source separation are limited by the assumption that the sources are statistically independent. However, in many practical applications, the source signals are nonnegative and mutually statistically dependent. When the observations are nonnegative linear combinations of nonnegative sources, the correlation coefficients of the observations are larger than those of the source signals. In this letter, a novel nonnegative matrix factorization (NMF) algorithm with least-correlated-component constraints is proposed for the blind separation of convolutive mixed sources. The algorithm relaxes the source independence assumption and requires only low-complexity algebraic computations. Simulation results on blind source separation, including real face image data, indicate that the sources can be successfully recovered with the algorithm.
Purpose: The purpose of this paper is to analyze topics as alternative features for sentiment analysis in Indonesian tweets. Design/methodology/approach: Given Indonesian tweets, the sentiment analysis process starts by extracting features from the tweets. The features are words or topics. The authors use non-negative matrix factorization to extract the topics and apply a support vector machine to classify the tweets into their sentiment classes. Findings: The authors analyze the accuracy using two-class and three-class sentiment analysis data sets. Both data sets concern sentiments about candidates in the Indonesian presidential election. The experiments show that the standard word features give better accuracy than the topic features for the two-class sentiment analysis. Moreover, the topic features can slightly improve the accuracy of the standard word features. The topic features can also improve the accuracy of the standard word features for the three-class sentiment analysis. Originality/value: The standard textual data representation for sentiment analysis using machine learning is the bag of words and its extensions, mainly created by natural language processing. This paper applies topics as novel features for machine learning-based sentiment analysis in Indonesian tweets.
Background: Single-cell RNA sequencing (scRNA-seq) data provide a whole new view for studying disease and cell differentiation development. With the explosive growth of scRNA-seq data, effective models are needed for mining the intrinsic biological information. Methods: This paper proposes a novel non-negative matrix factorization (NMF) method for clustering and gene co-expression network analysis, termed Adaptive Total Variation Constraint Hypergraph Regularized NMF (ATV-HNMF). Based on the gradient information, ATV-HNMF adaptively selects different schemes either to denoise within a cluster or to preserve the boundary information between clusters. In addition, ATV-HNMF incorporates hypergraph regularization, which considers high-order relationships between cells to preserve the intrinsic structure of the space. Results: Experiments show that the clustering performance of ATV-HNMF exceeds that of the compared methods, and the network construction results are consistent with previous studies, which illustrates that the model is effective and useful. Conclusion: The clustering results show that ATV-HNMF outperforms other methods, which can help in understanding cell heterogeneity. Many disease-related genes can be discovered from the constructed network, and some are worthy of further clinical exploration.
Funding (VCJG vertex centrality paper): Project supported by the National Natural Science Foundation of China (Grant Nos. 62162040 and 11861045).
Funding (MDC-NMF hyperspectral unmixing paper): Supported by the National Natural Science Foundation of China (No. 60872083) and the National High Technology Research and Development Program of China (No. 2007AA12Z149).
Funding (GNMF brain sub-network paper): Supported by the National Natural Science Foundation of China (No. 51877013), (ZJ), (http://www.nsfc.gov.cn/); the Natural Science Foundation of Jiangsu Province (No. BK20181463), (ZJ), (http://kxjst.jiangsu.gov.cn/); and sponsored by the Qing Lan Project of Jiangsu Province (no specific grant number), (ZJ), (http://jyt.jiangsu.gov.cn/).
Funding (ONMF and HMM multimode process monitoring paper): Supported by the National Natural Science Foundation of China (61374140, 61403072).
Funding (CCT and NMF image fusion paper): Supported by the National Natural Science Foundation of China (No. 60872065).
Funding: Supported by the Teaching Reform Research Project of Qinghai Minzu University, China (2021-JYYB-009) and the “Chunhui Plan” Cooperative Scientific Research Project of the Ministry of Education of China (2018).
Abstract: Link prediction has attracted wide attention among interdisciplinary researchers as an important issue in complex networks. It aims to predict the missing links in current networks and new links that will appear in future networks. Despite the presence of missing links in the target network of link prediction studies, the network it processes remains macroscopically a large connected graph. However, the complexity of the real world means that complex networks abstracted from real systems often contain many isolated nodes. As a result, existing link prediction methods cannot efficiently predict missing edges on isolated nodes. Therefore, cold-start link prediction is favored as one of the most valuable subproblems of traditional link prediction. However, due to the loss of many links in the observed network, the topological information available for completing the link prediction task is extremely scarce. This presents a severe challenge for the study of cold-start link prediction. Therefore, how to mine and fuse more of the available non-topological information from the observed network becomes the key to solving the problem of cold-start link prediction. In this paper, we propose a framework for solving the cold-start link prediction problem, a joint-weighted symmetric nonnegative matrix factorization model fusing graph regularization information, based on low-rank approximation algorithms in the field of machine learning. First, the nonlinear features of node attributes in high-dimensional space are captured by the designed graph regularization term. Second, using a weighted matrix, we associate the attribute similarity and first-order structure information of nodes so that they constrain each other. Finally, a unified framework for implementing cold-start link prediction is constructed by using a symmetric nonnegative matrix factorization model to integrate the multiple kinds of extracted information. Extensive experimental validation on five real networks with attributes shows that the proposed model has very good predictive performance when predicting missing edges of isolated nodes.
Funding: Supported by the National Natural Science Foundation of China (No. 11901359) and the Shandong Provincial Natural Science Foundation (No. ZR2019QA017).
Abstract: Orthogonal nonnegative matrix factorization (ONMF) is widely used in blind image separation, document classification, and human face recognition. The ONMF model can be efficiently solved by the alternating direction method of multipliers and the hierarchical alternating least squares method. When the given matrix is huge, the cost of computation and communication is too high. Therefore, ONMF becomes challenging in the large-scale setting. Random projection is an efficient method of dimensionality reduction. In this paper, we apply random projection to ONMF and propose two randomized algorithms. Numerical experiments show that our proposed algorithms perform well on both simulated and real data.
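Illustrative note (not from the paper): the random projection step named in the abstract above can be sketched as a plain Gaussian projection that shrinks the feature dimension while roughly preserving pairwise distances. The function name `gaussian_random_projection`, the matrix sizes, and the seed are assumptions made for this sketch; the paper's two randomized ONMF algorithms are not reproduced here.

```python
import numpy as np


def gaussian_random_projection(X, target_dim, seed=0):
    """Project the rows of X (n_samples x n_features) onto target_dim dimensions."""
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    # Entries drawn from N(0, 1/target_dim), so squared lengths are
    # preserved in expectation.
    R = rng.normal(0.0, 1.0 / np.sqrt(target_dim), size=(n_features, target_dim))
    return X @ R


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.random((20, 2000))          # nonnegative toy data
    Z = gaussian_random_projection(X, target_dim=400)
    # Compare one pairwise distance before and after the projection.
    ratio = np.linalg.norm(Z[0] - Z[1]) / np.linalg.norm(X[0] - X[1])
    print(Z.shape, float(ratio))
```

Note that the projected matrix generally contains negative entries, so it cannot be passed to a nonnegative solver unchanged; how the paper handles this is not stated in the abstract.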
Funding: Project (No. 60872071) supported by the National Natural Science Foundation of China.
Abstract: Hyperspectral imagery generally contains a very large amount of data due to hundreds of spectral bands. Band selection is often applied first to reduce computational cost and facilitate subsequent tasks such as land-cover classification and higher-level image analysis. In this paper, we propose a new band selection algorithm using sparse nonnegative matrix factorization (sparse NMF). Though acting as a clustering method for band selection, sparse NMF need not consider the distance metric between different spectral bands, which is often the key step for most common clustering-based band selection methods. By imposing sparsity on the coefficient matrix, the bands' clustering assignments can be easily indicated through the largest entry in each column of the matrix. Experimental results showed that sparse NMF provides considerable insight into the clustering-based band selection problem and the selected bands are good for land-cover classification.
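Illustrative note (not from the paper): the assignment rule described above, reading each band's cluster from the largest entry in its column of the coefficient matrix, can be sketched as follows. The L1-penalised multiplicative updates in `sparse_nmf` are a common generic choice and an assumption here, not necessarily the authors' solver; `pick_representatives` is a hypothetical rule for choosing one band per cluster.

```python
import numpy as np


def sparse_nmf(V, rank, sparsity=0.1, n_iter=200, seed=0, eps=1e-9):
    """Approximate V (pixels x bands) by W @ H with an L1 penalty on H."""
    rng = np.random.default_rng(seed)
    n_pixels, n_bands = V.shape
    W = rng.random((n_pixels, rank)) + 0.1
    H = rng.random((rank, n_bands)) + 0.1
    for _ in range(n_iter):
        # The L1 penalty shows up as a constant added to the denominator.
        H *= (W.T @ V) / (W.T @ W @ H + sparsity + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        # Rescale so columns of W have unit norm while W @ H is unchanged.
        norms = np.linalg.norm(W, axis=0) + eps
        W /= norms
        H *= norms[:, None]
    return W, H


def assign_bands(H):
    """Cluster label of each band = row index of the largest entry in its column."""
    return H.argmax(axis=0)


def pick_representatives(H):
    """Hypothetical rule: per cluster, keep the band with the largest coefficient."""
    labels = assign_bands(H)
    chosen = []
    for c in np.unique(labels):
        members = np.flatnonzero(labels == c)
        chosen.append(int(members[H[c, members].argmax()]))
    return chosen


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    V = rng.random((50, 12))            # toy cube: 50 pixels, 12 bands
    W, H = sparse_nmf(V, rank=3)
    print(assign_bands(H), pick_representatives(H))
```

No distance metric between bands is computed anywhere in the sketch, which is the property the abstract emphasises.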
Funding: Financially supported by the National High Technology Research and Development Program of China (863 Program, Grant No. 2012AA020403) and the National Natural Science Foundation of China (Grant Nos. 61173118 and 61272380).
Abstract: In the past decades, advances in high-throughput technologies have led to the generation of huge amounts of biological data that require analysis and interpretation. Recently, nonnegative matrix factorization (NMF) has been introduced as an efficient way to reduce the complexity of data as well as to interpret them, and has been applied to various fields of biological research. In this paper, we present CloudNMF, a distributed open-source implementation of NMF on a MapReduce framework. Experimental evaluation demonstrated that CloudNMF is scalable and can be used to deal with huge amounts of data, which may enable various kinds of high-throughput biological data analysis in the cloud. CloudNMF is freely accessible at http://admis.fudan.edu.cn/projects/CloudNMF.html.
Funding: Supported by the National Natural Science Foundation of China (U1564212, 61773036, 51508014), the Beijing Natural Science Foundation (9172011), and the Young Elite Scientist Sponsorship Program of the China Association for Science and Technology (2016QNRC001).
Abstract: The identification and analysis of spatiotemporal traffic patterns in road networks constitute a crucial process for sophisticated traffic management and control. Traditional methods based on mathematical equations and statistical models are hardly applicable to large-scale urban road networks, where traffic states exhibit high degrees of dynamics and complexity. Recently, advances in data collection and processing have provided new opportunities to effectively understand spatiotemporal traffic patterns in large-scale road networks using data-driven methods. However, limited efforts have been made to explore the essential structure of the networks when conducting a spatiotemporal analysis of traffic characteristics. To this end, this study proposes a modified nonnegative matrix factorization algorithm that processes high-dimensional traffic data and provides an improved representation of the global traffic state. After matrix factorization, cluster analysis is conducted based on the obtained low-dimensional representative matrices, which contain different traffic patterns and serve as the basis for exploring the temporal dynamics and spatial structure of network congestion. The applicability and effectiveness of the proposed approach are examined in a road network of Beijing, China. Results show that the methods exhibit considerable potential for identifying and interpreting the spatiotemporal traffic patterns over the entire network and provide a systematic and efficient approach for analyzing the network-level traffic state.
Abstract: In this paper, we study a band constrained nonnegative matrix factorization (band NMF) problem: for a given nonnegative matrix Y, decompose it as Y ≈ AX with A a nonnegative matrix and X a nonnegative block band matrix. This factorization model extends a single low-rank subspace model to a mixture of several overlapping low-rank subspaces, which not only can provide a sparse representation, but also can capture significant grouping structure from a dataset. Based on overlapping subspace clustering and the capture of the level of overlap between neighbouring subspaces, two simple and practical algorithms are presented to solve the band NMF problem. Numerical experiments on both synthetic data and real image data show that band NMF enhances the performance of NMF in data representation and processing.
Funding: The work is supported by the National Natural Science Foundation of China (No. 10961010).
Abstract: The orthogonal nonnegative matrix factorization (ONMF) has many applications in a variety of areas such as data mining, information processing and pattern recognition. In this paper, we propose a novel initialization method for the ONMF based on the Lanczos bidiagonalization and the nonnegative approximation of rank-one matrices. Numerical experiments are given to show that our initialization strategy is effective and efficient.
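Illustrative note (not from the paper): a minimal sketch of initializing NMF factors from nonnegative approximations of rank-one terms. Two substitutions are made and should be read as assumptions: a full SVD via `numpy.linalg.svd` stands in for the Lanczos bidiagonalization, and the sign-splitting rule follows the well-known NNDSVD idea rather than the authors' exact procedure. The function name `svd_nonnegative_init` is hypothetical, and the sketch does not enforce the orthogonality constraint of ONMF.

```python
import numpy as np


def svd_nonnegative_init(X, rank):
    """Build nonnegative W (m x rank), H (rank x n) from leading singular triplets."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    m, n = X.shape
    W = np.zeros((m, rank))
    H = np.zeros((rank, n))
    for j in range(rank):
        u, v = U[:, j], Vt[j]
        # Split each singular vector into its positive and negative parts.
        up, un = np.maximum(u, 0), np.maximum(-u, 0)
        vp, vn = np.maximum(v, 0), np.maximum(-v, 0)
        pos = np.linalg.norm(up) * np.linalg.norm(vp)
        neg = np.linalg.norm(un) * np.linalg.norm(vn)
        # Keep the dominant nonnegative rank-one piece of u v^T.
        if pos >= neg:
            a, b, scale = up, vp, pos
        else:
            a, b, scale = un, vn, neg
        if scale == 0:
            continue  # nothing nonnegative to keep for this term
        W[:, j] = np.sqrt(s[j] * scale) * a / np.linalg.norm(a)
        H[j] = np.sqrt(s[j] * scale) * b / np.linalg.norm(b)
    return W, H


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.random((6, 5)) + 0.1
    W, H = svd_nonnegative_init(X, rank=3)
    print(np.linalg.norm(X - W @ H))
```

For a strictly positive matrix the leading singular vectors share one sign, so the first term of the initialization coincides with the best rank-one approximation.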
Abstract: Hyperspectral unmixing is a powerful tool for remote sensing image mining. Nonnegative matrix factorization (NMF) has been adopted to deal with this issue, while the precision of unmixing is closely related to the local minimizers of NMF. We present two novel initialization strategies that are based on the CUR decomposition, which is physically meaningful. In the experimental test, NMF with the new initialization methods is used to unmix an urban scene captured by the airborne visible/infrared imaging spectrometer (AVIRIS) in 1997; numerical results show that the initialization methods work well.
Abstract: This paper introduces an algorithm for the nonnegative matrix factorization-and-completion problem, which aims to find nonnegative low-rank matrices X and Y so that the product XY approximates a nonnegative data matrix M whose elements are partially known (to a certain accuracy). This problem aggregates two existing problems: (i) nonnegative matrix factorization where all entries of M are given, and (ii) low-rank matrix completion where nonnegativity is not required. By taking advantage of both nonnegativity and low-rankness, one can generally obtain better results than by using just one of the two properties. We propose to solve the non-convex constrained least-squares problem using an algorithm based on the classical alternating direction augmented Lagrangian method. Preliminary convergence properties of the algorithm and numerical simulation results are presented. Compared to a recent algorithm for nonnegative matrix factorization, the proposed algorithm produces factorizations of similar quality using only about half of the matrix entries. On tasks of recovering incomplete grayscale and hyperspectral images, the proposed algorithm yields overall better quality than that produced by two recent matrix-completion algorithms that do not exploit nonnegativity.
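Illustrative note (not from the paper): the factorization-and-completion problem stated above, fitting nonnegative X and Y to only the known entries of M, can be sketched with masked multiplicative updates. This is a different and simpler solver than the alternating direction augmented Lagrangian method the abstract proposes; it is used here only to make the problem statement concrete. The name `masked_nmf` and the 0/1 array `observed` are assumptions of the sketch.

```python
import numpy as np


def masked_nmf(M, observed, rank, n_iter=300, seed=0, eps=1e-9):
    """Fit M ~ X @ Y on entries where observed == 1, keeping X, Y nonnegative."""
    rng = np.random.default_rng(seed)
    m, n = M.shape
    X = rng.random((m, rank)) + 0.1
    Y = rng.random((rank, n)) + 0.1
    MA = observed * M                    # unknown entries contribute nothing
    for _ in range(n_iter):
        R = observed * (X @ Y)
        X *= (MA @ Y.T) / (R @ Y.T + eps)
        R = observed * (X @ Y)
        Y *= (X.T @ MA) / (X.T @ R + eps)
    return X, Y


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    M = rng.random((8, 2)) @ rng.random((2, 7))      # exact rank-2 nonnegative data
    observed = (rng.random(M.shape) < 0.7).astype(float)
    X, Y = masked_nmf(M, observed, rank=2)
    missing = 1.0 - observed
    # Error on the entries that were never shown to the solver.
    print(np.linalg.norm(missing * (M - X @ Y)))
```

The product X @ Y supplies estimates for the entries marked as unobserved, which is the completion half of the problem.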
Abstract: Nonnegative tensor (matrix) factorization finds more and more applications in various disciplines, including machine learning, data mining, and blind source separation. In computation, the optimization problem involved is solved by alternately minimizing over one factor while the others are fixed. To solve the subproblem efficiently, we first exploit a variable regularization term which keeps the subproblem far from ill-conditioned. Second, an augmented Lagrangian alternating direction method is employed to solve this convex and well-conditioned regularized subproblem, and two acceleration techniques are also implemented. Some preliminary numerical experiments are performed to show the improvements of the new method.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 62073087, U1911401, 62071132, and 61973090) and the Guangdong Key R&D Project of China (Grant No. 2019B010121001).
Abstract: Real-world data can often be represented in multiple forms and views, and analyzing data from different perspectives allows for more comprehensive learning of the data, resulting in better data clustering results. Non-negative matrix factorization (NMF) is used to solve the clustering problem to extract uniform discriminative low-dimensional features from multi-view data. Many clustering methods based on graph regularization have been proposed and proven to be effective, but ordinary graphs only consider pairwise relationships between samples. In order to learn the higher-order relationships that exist in the sample manifold and feature manifold of multi-view data, we propose a new semi-supervised multi-view clustering method called dual hypergraph regularized partially shared non-negative matrix factorization (DHPS-NMF). The complex manifold structure of samples and features is learned by constructing sample and feature hypergraphs. To improve the discrimination power of the obtained low-dimensional features, semi-supervised regression terms are incorporated into the model to effectively use the label information when capturing the complex manifold structure of the data. Finally, we conduct experiments on six real data sets, and the results show that our algorithm achieves encouraging results in comparison with some existing methods.
Funding: Supported by the Specialized Research Fund for the Doctoral Program of Higher Education of China (No. 20060280003) and the Shanghai Leading Academic Discipline Project (T0102).
Abstract: Most of the existing algorithms for blind source separation are limited by the assumption that the sources are statistically independent. However, in many practical applications, the source signals are nonnegative and mutually statistically dependent. When the observations are nonnegative linear combinations of nonnegative sources, the correlation coefficients of the observations are larger than those of the source signals. In this letter, a novel Nonnegative Matrix Factorization (NMF) algorithm with least correlated component constraints for blind separation of convolutive mixed sources is proposed. The algorithm relaxes the source independence assumption and has low-complexity algebraic computations. Simulation results on blind source separation, including real face image data, indicate that the sources can be successfully recovered with the algorithm.
Abstract: Purpose: The purpose of this paper is to analyze topics as alternative features for sentiment analysis in Indonesian tweets. Design/methodology/approach: Given Indonesian tweets, the process of sentiment analysis starts by extracting features from the tweets. The features are words or topics. The authors use non-negative matrix factorization to extract the topics and apply a support vector machine to classify the tweets into their sentiment classes. Findings: The authors analyze the accuracy using two-class and three-class sentiment analysis data sets. Both data sets are about sentiments toward candidates in the Indonesian presidential election. The experiments show that the standard word features give better accuracies than the topic features for the two-class sentiment analysis. Moreover, the topic features can slightly improve the accuracy of the standard word features. The topic features can also improve the accuracy of the standard word features for the three-class sentiment analysis. Originality/value: The standard textual data representation for sentiment analysis using machine learning is the bag of words and its extensions, mainly created by natural language processing. This paper applies topics as novel features for machine learning-based sentiment analysis of Indonesian tweets.
Funding: Supported in part by grants provided by the National Natural Science Foundation of China (No. 61872220).
Abstract: Background: Single-cell RNA sequencing (scRNA-seq) data provide a whole new view for studying disease and cell differentiation development. With the explosive growth of scRNA-seq data, effective models are needed for mining the intrinsic biological information. Methods: This paper proposes a novel non-negative matrix factorization (NMF) method for clustering and gene co-expression network analysis, termed Adaptive Total Variation Constraint Hypergraph Regularized NMF (ATV-HNMF). ATV-HNMF can adaptively select different schemes to denoise the clusters or preserve the boundary information between clusters based on the gradient information. Besides, ATV-HNMF incorporates hypergraph regularization, which can consider high-order relationships between cells to preserve the intrinsic structure of the space. Results: Experiments show that the clustering performance exceeds that of the other compared methods, and the network construction results are consistent with previous studies, which illustrates that our model is effective and useful. Conclusion: From the clustering results, we can see that ATV-HNMF outperforms other methods, which can help us to understand the heterogeneity. We can discover many disease-related genes from the constructed network, and some are worthy of further clinical exploration.