Contrastive learning is a significant research direction in deep learning. However, existing data augmentation methods often cause problems such as semantic drift in the generated views, while the complexity of model pre-training limits further performance gains. To address these challenges, we propose the Efficient Clustering Network based on Matrix Factorization (ECN-MF). Specifically, we design a batched low-rank Singular Value Decomposition (SVD) algorithm for data augmentation that eliminates redundant information and uncovers the major patterns of variation and key information in the data. Additionally, we design a Mutual Information-Enhanced Clustering Module (MI-ECM) that accelerates training by using a simple architecture to bring samples from the same cluster closer together while pushing samples from other clusters apart. Extensive experiments on six datasets demonstrate that ECN-MF performs more effectively than state-of-the-art algorithms.
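As a rough illustration of the low-rank SVD augmentation idea in this abstract, the sketch below truncates every matrix in a batch to its top-k singular components. The function name `low_rank_views`, the rank `k`, the random input, and the use of NumPy are assumptions made for illustration; the abstract does not specify how ECN-MF implements its batched algorithm.

```python
import numpy as np

def low_rank_views(batch, k):
    """Return the rank-k SVD approximation of every matrix in `batch`.

    batch: array of shape (B, m, n); k: number of singular values kept.
    Hypothetical helper for illustration, not the ECN-MF implementation.
    """
    # np.linalg.svd broadcasts over the leading batch dimension.
    u, s, vt = np.linalg.svd(batch, full_matrices=False)
    # Singular values are sorted in descending order, so the first k
    # triplets carry the dominant patterns of variation.
    return (u[..., :, :k] * s[..., None, :k]) @ vt[..., :k, :]

# Toy batch of 4 random 8x6 matrices (made-up data).
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8, 6))
views = low_rank_views(x, k=2)
```

Keeping all six singular values reproduces the input exactly, so `k` directly controls how much of the lower-variance content is discarded.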
This study aimed to investigate the pollution characteristics, source apportionment, and health risks associated with trace metal(loid)s (TMs) in the major agricultural producing areas in Chongqing, China. We analyzed the source apportionment and assessed the health risk of TMs in agricultural soils by using a positive matrix factorization (PMF) model and a health risk assessment (HRA) model based on Monte Carlo simulation. We also combined the PMF and HRA models to explore the health risks of TMs in agricultural soils from different pollution sources and so determine the priority control factors. Results showed that the average contents of cadmium (Cd), arsenic (As), lead (Pb), chromium (Cr), copper (Cu), nickel (Ni), and zinc (Zn) in the soil were 0.26, 5.93, 27.14, 61.32, 23.81, 32.45, and 78.65 mg/kg, respectively. Spatial analysis and source apportionment analysis revealed that urban and industrial sources, agricultural sources, and natural sources accounted for 33.0%, 27.7%, and 39.3% of TM accumulation in the soil, respectively. In the HRA model based on Monte Carlo simulation, non-carcinogenic risks were deemed negligible (hazard index < 1) and carcinogenic risks were at an acceptable level (10^(-6) < total carcinogenic risk ≤ 10^(-4)), with higher risks observed for children than for adults. The relationship between TMs, their sources, and health risks indicated that urban and industrial sources were primarily associated with As, contributing 75.1% of carcinogenic risks and 55.7% of non-carcinogenic risks, making them the primary control factors. Agricultural sources were primarily linked to Cd and Pb, contributing 13.1% of carcinogenic risks and 21.8% of non-carcinogenic risks, designating them as secondary control factors.
Finding crucial vertices is a key problem for improving the reliability and ensuring the effective operation of networks. Existing approaches based on multiple-attribute decision making ignore either the correlation among attributes or the heterogeneity between attributes and structure. To overcome these problems, a novel vertex centrality approach, called VCJG, is proposed based on joint nonnegative matrix factorization and graph embedding. Linearly independent potential attributes and structure information are captured automatically by applying nonnegative matrix factorization to the weighted adjacency matrix and to the structure matrix, which is generated by graph embedding. A smoothness strategy is applied to eliminate the heterogeneity between attributes and structure through joint nonnegative matrix factorization. VCJG then integrates these steps into an overall objective function and, by optimizing it, obtains the final potential attributes fused with the structure information of the network. Finally, the attributes are combined with neighborhood rules to evaluate each vertex's importance. Comparative experiments on nine real-world networks demonstrate that the proposed approach outperforms nine state-of-the-art algorithms for identifying vital vertices with respect to correlation, monotonicity, and accuracy of the top-10 vertex ranking.
Background: Establishing an appropriate prognostic model for PCa is essential for its effective treatment. Glycolysis is a vital energy-harvesting mechanism for tumors. Developing a prognostic model for PCa based on glycolysis-related genes is novel and has great potential. Methods: First, gene expression and clinical data of PCa patients were downloaded from The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO), and glycolysis-related genes were obtained from the Molecular Signatures Database (MSigDB). Gene enrichment analysis was performed to verify that glycolysis functions were enriched in the genes we obtained, which were used in nonnegative matrix factorization (NMF) to identify clusters. The correlation between clusters and clinical features was discussed, and the differentially expressed genes (DEGs) between the two clusters were investigated. Based on the DEGs, we investigated the biological differences between clusters, including immune cell infiltration, mutation, tumor immune dysfunction and exclusion, immune function, and checkpoint genes. To establish the prognostic model, the genes were filtered based on univariable Cox regression, LASSO, and multivariable Cox regression. Kaplan–Meier analysis and receiver operating characteristic analysis validated the prognostic value of the model. A nomogram of the risk score calculated by the prognostic model and clinical characteristics was constructed to quantitatively estimate the survival probability for PCa patients in the clinical setting. Result: The genes obtained from MSigDB were enriched in glycolysis functions. Two clusters were identified by NMF analysis based on 272 glycolysis-related genes, and a prognostic model based on DEGs between the two clusters was finally established. The prognostic model consisted of LAMPS, SPRN, ATOH1, TANC1, ETV1, TDRD1, KLK14, MESP2, POSTN, CRIP2, NAT1, AKR7A3, PODXL, CARTPT, and PCDHGB2. The all-sample, training, and test cohorts from TCGA and the external validation cohort from GEO showed significant differences between the high-risk and low-risk groups. The area under the ROC curve showed great performance of this prognostic model. Conclusion: A prognostic model based on glycolysis-related genes was established, with great performance and potential significance for clinical application.
Deep matrix factorization (DMF) has been demonstrated to be a powerful tool for capturing the complex hierarchical information in multi-view data representation (MDR). However, existing multi-view DMF methods mainly explore the consistency of multi-view data while neglecting the diversity among different views as well as the high-order relationships of the data, resulting in the loss of valuable complementary information. In this paper, we design a hypergraph regularized diverse deep matrix factorization (HDDMF) model for multi-view data representation that jointly utilizes multi-view diversity and a high-order manifold in a multilayer factorization framework. A novel diversity enhancement term is designed to exploit the structural complementarity between different views of the data. Hypergraph regularization is utilized to preserve the high-order geometric structure of the data in each view. An efficient iterative optimization algorithm is developed to solve the proposed model, with theoretical convergence analysis. Experimental results on five real-world data sets demonstrate that the proposed method significantly outperforms state-of-the-art multi-view learning approaches.
Data volumes are enormous today because of the extensive use of the World Wide Web, social media, and intelligent systems. This data can be very important and useful if it is harnessed carefully and correctly. Useful information can be extracted from this massive data using the data mining process, and the extracted information can be used to make vital decisions in various industries. Clustering is a very popular data mining method that divides data points into groups such that similar data points belong to the same group. Clustering methods are of various types, and many parameters and indexes exist for evaluating and comparing them. In this paper, we compare the partitioning-based methods K-Means, Fuzzy C-Means (FCM), Partitioning Around Medoids (PAM), and Clustering Large Application (CLARA) on secure perturbed data. The method that performs better for analyzing data perturbed using Extended NMF is identified on the basis of the values of indexes such as the Dunn Index, Silhouette Index, Xie-Beni Index, and Davies-Bouldin Index.
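To make one of the evaluation indexes named in this abstract concrete, here is a minimal sketch of the Dunn Index in its most common form: the smallest distance between points in different clusters divided by the largest distance between points in the same cluster. The function name `dunn_index` and the toy data are assumptions for illustration; several variants of the index exist, and the abstract does not say which one the paper uses. The sketch assumes at least two clusters and at least one cluster with two distinct points.

```python
import numpy as np

def dunn_index(points, labels):
    """Dunn Index: min between-cluster distance / max within-cluster distance.

    Illustrative variant only; higher values indicate compact,
    well-separated clusters.
    """
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    # Pairwise Euclidean distances between all points.
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    same = labels[:, None] == labels[None, :]
    max_diameter = dist[same].max()      # largest within-cluster distance
    min_separation = dist[~same].min()   # smallest between-cluster distance
    return min_separation / max_diameter

# Two tight clusters, five units apart (made-up data).
pts = [[0.0, 0.0], [0.0, 1.0], [5.0, 0.0], [5.0, 1.0]]
score = dunn_index(pts, [0, 0, 1, 1])  # separation 5, diameter 1 -> 5.0
```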
With massive data, matrix operations are very computationally intensive, and the memory limitation of standalone mode makes the system inefficient. At the same time, it is difficult for matrix operations implemented in hardware to switch flexibly between different requirements. To address this problem, this paper proposes a matrix operation accelerator based on reconfigurable arrays for recommender system (RS) applications. Based on the reconfigurable array processor (APR-16), a flexible parallelized design of matrix operations on the processing element (PE) array is realized. The experimental results show that, compared with a previously proposed central processing unit (CPU) and graphics processing unit (GPU) hybrid matrix multiplication framework, the energy efficiency ratio of the proposed accelerator is improved by about 35×. Compared with blocked alternating least squares (BALS), its energy efficiency ratio is improved by about 1×, and switching between matrix factorization (MF) schemes suited to different sparsity levels can be realized.
Because vibration signals acquired from faulty rolling element bearings are non-stationary, time-frequency analysis is often applied to describe the local information of these unstable signals. However, the resulting high-dimensional feature matrix is too large for many classifiers to classify directly. This paper combines time-frequency distribution (TFD) with non-negative matrix factorization (NMF) and proposes a novel TFD matrix factorization method to enhance the representation and identification of bearing faults. In this method, the TFD of a vibration signal is first computed with the short-time Fourier transform (STFT) to describe the localized faults. Then, a supervised NMF mapping is adopted to extract the fault features from the TFD. Meanwhile, the fault samples can be clustered and recognized automatically by using the clustering property of NMF. The proposed method takes advantage of the parts-based representation and adaptive clustering of NMF, and the localized fault features of interest can be extracted as well. To evaluate the performance of the proposed method, experiments on nine kinds of bearing fault were performed on a test bench. The proposed method can effectively identify the fault severity and different fault types. Moreover, in comparison with an artificial neural network (ANN), NMF yields 99.3% mean accuracy, which is much superior to the ANN. This research presents a simple and practical solution to the fault diagnosis problem of rolling element bearings in high-dimensional feature space.
This paper presents a novel medical image registration algorithm named total variation constrained graph regularization for non-negative matrix factorization (TV-GNMF). The method combines non-negative matrix factorization with a total variation constraint and graph regularization. The main contributions of our work are the following. First, total variation is incorporated into NMF to control the diffusion speed. The purpose is to denoise in smooth regions and preserve features or details of the data in edge regions by using a diffusion coefficient based on gradient information. Second, we add graph regularization to NMF to reveal the intrinsic geometry and structure information of features and so enhance the discrimination power. Third, the multiplicative update rules and a proof of convergence of the TV-GNMF algorithm are given. Experiments conducted on datasets show that the proposed TV-GNMF method outperforms other state-of-the-art algorithms.
Nonnegative matrix factorization (NMF) is a method for obtaining parts-based features of information and forming typical profiles. However, the basis vectors that NMF produces are not orthogonal, so the parts-based features are usually redundant. In this paper, we propose two different approaches based on localized non-negative matrix factorization (LNMF) to obtain the typical user session profiles and typical semantic profiles of junk mail. LNMF makes the basis vectors as orthogonal as possible so that it can obtain accurate profiles. The experiments show that the approach based on LNMF obtains better profiles than the approach based on NMF. Key words: localized non-negative matrix factorization; profile; log mining; mail filtering. CLC number: TP 391. Foundation item: Supported by the National Natural Science Foundation of China (60373066, 60303024), the National Grand Fundamental Research 973 Program of China (2002CB312000), and the National Research Foundation for the Doctoral Program of Higher Education of China (20020286004). Biography: Jiang Ji-xiang (1980-), male, Master candidate, research direction: data mining, knowledge representation on the Web.
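For readers unfamiliar with plain NMF, the baseline that several of these abstracts extend, the following is a minimal sketch using the classic multiplicative update rules for the squared Frobenius error. It is not the LNMF variant described above; the function name `nmf`, the iteration count, the small constant `eps`, and the toy matrix are all assumptions for illustration.

```python
import numpy as np

def nmf(v, rank, n_iter=200, eps=1e-9, seed=0):
    """Factor a nonnegative matrix v (m x n) as w @ h with w, h >= 0.

    Plain multiplicative updates for the Frobenius error; illustrative only.
    """
    rng = np.random.default_rng(seed)
    m, n = v.shape
    w = rng.random((m, rank)) + eps   # basis vectors (columns of w)
    h = rng.random((rank, n)) + eps   # nonnegative encoding coefficients
    for _ in range(n_iter):
        # Each factor is rescaled elementwise by a nonnegative ratio,
        # so nonnegativity is preserved without any explicit projection.
        h *= (w.T @ v) / (w.T @ w @ h + eps)
        w *= (v @ h.T) / (w @ h @ h.T + eps)
    return w, h

# Toy rank-2 nonnegative matrix (made-up data).
rng = np.random.default_rng(1)
v = rng.random((6, 2)) @ rng.random((2, 5))
w, h = nmf(v, rank=2)
```

Nothing in these updates encourages the columns of `w` to be orthogonal, which is the redundancy that the LNMF approach in the abstract is meant to reduce.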
A current problem in diet recommendation systems is matching food preferences with nutritional requirements while taking into account individual characteristics, such as body weight, and individual health conditions, such as diabetes. Current dietary recommendation methods employ association rules, content-based collaborative filtering, and constraint-based methods, which have several limitations. These limitations are due to the existence of a special user group and an imbalance of non-simple attributes. Building on research into traditional dietary recommendation algorithms, we combine the Adaboost classifier with probabilistic matrix factorization and present a personalized diet recommendation algorithm based on probabilistic matrix factorization via Adaboost. The probabilistic matrix factorization method extracts the implicit factors between individual food preferences and nutritional characteristics. From these, we can make use of the features with strong influence while discarding those with little influence. After incorporating these changes into our approach, we evaluated the algorithm's performance. Our results show that our method performed better than others at matching preferred foods with dietary requirements, benefiting user health as a result. The algorithm fully considers the constraint relationship between users' attributes and the nutritional characteristics of foods. Because the algorithm considers many complex factors, the recommended food result set meets both health standards and users' dietary preferences. A comparison of our algorithm with others demonstrated that our method offers high accuracy and interpretability.
This paper considers the problem of unsupervised spectral unmixing of hyperspectral data. Based on the Linear Mixing Model (LMM), a new method under the framework of nonnegative matrix factorization (NMF) is proposed, namely minimum distance constrained nonnegative matrix factorization (MDC-NMF). First, a new regularization term called endmember distance (ED) is considered, which is defined as the sum of the squared Euclidean distances from each endmember to their geometric center. Compared with the simplex volume, ED has better optimization properties and is conceptually intuitive. Second, a projected gradient (PG) scheme is adopted; by virtue of ED, the optimal step size along the feasible descent direction can be calculated easily at each iteration of this scheme. Third, an algorithm that terminates in a finite number of steps (no more than the number of endmembers) is used to project a point onto the canonical simplex, so that the abundance nonnegativity constraint and the abundance sum-to-one constraint can be satisfied exactly with little computation. The experimental results, based on a set of synthetic data and real data, demonstrate that, for the same running time, MDC-NMF outperforms several other similar methods proposed recently.
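Two ingredients of this abstract can be sketched directly: the endmember distance (ED) as defined there, and a projection onto the canonical simplex, which enforces nonnegative abundances that sum to one. The abstract does not spell out the paper's finite-step projection procedure, so a standard sort-based projection is swapped in here. The function names, the column-wise layout of the endmember matrix, and the toy inputs are assumptions for illustration.

```python
import numpy as np

def endmember_distance(endmembers):
    """Sum of squared Euclidean distances from each endmember to their mean.

    endmembers: array of shape (bands, p), one endmember spectrum per column.
    """
    center = endmembers.mean(axis=1, keepdims=True)  # geometric center
    return float(((endmembers - center) ** 2).sum())

def project_to_simplex(y):
    """Euclidean projection of y onto {x : x >= 0, sum(x) = 1}.

    Standard sort-based method (an assumption; not necessarily the
    procedure used in the paper).
    """
    y = np.asarray(y, dtype=float)
    u = np.sort(y)[::-1]                  # entries in descending order
    css = np.cumsum(u) - 1.0
    ks = np.arange(1, y.size + 1)
    rho = np.nonzero(u - css / ks > 0)[0][-1]  # last index kept positive
    tau = css[rho] / (rho + 1)                 # shift that makes the sum 1
    return np.maximum(y - tau, 0.0)

# Two endmembers at (0, 0) and (2, 0): center (1, 0), so ED = 1 + 1 = 2.
ed = endmember_distance(np.array([[0.0, 2.0], [0.0, 0.0]]))
abundances = project_to_simplex([2.0, 0.0])  # -> [1.0, 0.0]
```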
Traditional data-driven fault detection methods assume that the process operates in a single mode, so they cannot perform well in processes with multiple operating modes. To monitor multimode processes effectively, this paper proposes a novel process monitoring scheme based on orthogonal nonnegative matrix factorization (ONMF) and the hidden Markov model (HMM). The new clustering technique ONMF is employed to separate data from different process modes. The multiple HMMs for the various operating modes lead to higher modeling accuracy. The proposed approach does not presume the distribution of data in each mode, because the process uncertainty and dynamics can be well interpreted through the hidden Markov estimation. An HMM-based monitoring index, the negative log likelihood probability, is utilized for fault detection. To assess the proposed monitoring strategy, a numerical example and the Tennessee Eastman process are used. The results demonstrate that this method provides efficient fault detection performance.
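The monitoring index mentioned in this abstract, the negative log likelihood of observations under an HMM, can be computed with the scaled forward algorithm. The sketch below assumes a discrete-emission HMM for simplicity; process data are usually continuous, and the abstract does not state which emission model the paper uses. The function name, parameter names, and the toy model are hypothetical.

```python
import numpy as np

def hmm_neg_log_likelihood(obs, start, trans, emit):
    """Negative log likelihood of a discrete observation sequence.

    start: (S,) initial state probabilities
    trans: (S, S) with trans[i, j] = P(next state j | current state i)
    emit:  (S, K) with emit[i, k] = P(symbol k | state i)
    Illustrative discrete-emission version only.
    """
    alpha = start * emit[:, obs[0]]
    nll = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate one step, then weight by the emission probability.
            alpha = (alpha @ trans) * emit[:, obs[t]]
        scale = alpha.sum()      # P(obs[t] | obs[:t])
        nll -= np.log(scale)
        alpha = alpha / scale    # rescale to avoid numerical underflow
    return nll

# One-state toy model emitting two symbols with equal probability.
nll = hmm_neg_log_likelihood(
    [0, 1, 0],
    start=np.array([1.0]),
    trans=np.array([[1.0]]),
    emit=np.array([[0.5, 0.5]]),
)
```

In a monitoring setting, a new sample window would be flagged when this value exceeds a threshold estimated from normal operating data; how the paper sets that threshold is not stated in the abstract.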
Functional connectomes constructed from neuroimaging data have emerged as a powerful tool for identifying brain disorders. If a brain disease manifests only as some cognitive dysfunction, the disease may affect some local connectivity in the brain functional network; that is, there are functional abnormalities in a sub-network. It is therefore crucial to identify such sub-networks accurately in pathological diagnosis. To solve this problem, we propose a sub-network extraction method based on graph-regularized nonnegative matrix factorization (GNMF). The dynamic functional networks of normal subjects and early mild cognitive impairment (eMCI) subjects were vectorized, and the functional connection vectors (FCVs) were assembled into aggregation matrices. GNMF was then applied to factorize the aggregation matrix to obtain the base matrix, whose column vectors were restored to a common sub-network and a distinctive sub-network, and visualization and statistical analysis were conducted on the two sub-networks, respectively. Experimental results demonstrated that, compared with other matrix factorization methods, the proposed method more clearly reflects the similarity between the common sub-networks of eMCI subjects and normal subjects, as well as the difference between the distinctive sub-networks of eMCI subjects and normal subjects. Therefore, the high-dimensional features in brain functional networks can be best represented locally in the low-dimensional space, which provides a new idea for studying brain functional connectomes.
An image fusion method combining the complex contourlet transform (CCT) with nonnegative matrix factorization (NMF) is proposed in this paper. After two images are decomposed by CCT, NMF is applied to their high- and low-frequency components, respectively, and finally a fused image is synthesized. The subjective visual quality of the fusion result is compared with that of fusion methods based on NMF alone and on the combination of wavelet, contourlet, or nonsubsampled contourlet transforms with NMF. The experimental results are evaluated quantitatively, and the running times are also compared. It is shown that the proposed image fusion method achieves larger information entropy, standard deviation, and mean gradient, which means that it can better integrate featured information from all source images, avoid background noise, and effectively improve spatial clarity in the fused image.
Non-negative matrix factorization (NMF) is a technique for dimensionality reduction that places non-negativity constraints on the matrix. Based on the PARAFAC model, NMF is extended to three-dimensional data decomposition. This paper gives the three-dimensional non-negative matrix factorization (NMF3) algorithm, which is concise and easy to implement. The NMF3 implementation is based on elements rather than vectors. Unlike traditional algorithms, it can decompose a data array directly without unfolding. It has been applied to the decomposition of a simulated data array and obtained reasonable results, showing that NMF3 could be introduced for curve resolution in chemometrics.
A novel framework is proposed to obtain physiologically meaningful features for Alzheimer's disease (AD) classification based on sparse functional connectivity and non-negative matrix factorization. Specifically, the non-negative adaptive sparse representation (NASR) method is applied to compute the sparse functional connectivity among brain regions based on functional magnetic resonance imaging (fMRI) data for feature extraction. Afterwards, the sparse non-negative matrix factorization (sNMF) method is adopted for dimensionality reduction to obtain low-dimensional features with straightforward physical meaning. The experimental results show that the proposed framework outperforms the competing frameworks in terms of classification accuracy, sensitivity, and specificity. Furthermore, three sub-networks, including the default mode network, the basal ganglia-thalamus-limbic network, and the temporal-insular network, are found to have notable differences between the AD patients and the healthy subjects. The proposed framework can effectively identify AD patients and has potential for extending the understanding of the pathological changes of AD.
The concept of Fiedler matrices was introduced in [1] by L. Stuart and R. Weaver. In [1], they investigated the factorization of a Fiedler matrix into Fiedler matrices and presented some open questions, i.e.: When is a Fiedler matrix factorizable as a product of Fiedler matrices? Are there useful sufficient conditions? If a Fiedler matrix is factorizable, are the factors unique? If not, are the dimensions of the factors unique? In this paper, ...
In this paper we compute Karmarkar's projections quickly using the Moore-Penrose g-inverse and matrix factorization, so that the computational work for $(A^{T}D^{2}A)^{-1}$ is decreased.
Link prediction has attracted wide attention among interdisciplinary researchers as an important issue in complex networks. It aims to predict the missing links in current networks and the new links that will appear in future networks. Despite the presence of missing links in the target network of link prediction studies, the network being processed remains, macroscopically, a large connected graph. However, the complexity of the real world means that complex networks abstracted from real systems often contain many isolated nodes. As a result, existing link prediction methods cannot efficiently predict missing edges on isolated nodes. Cold-start link prediction is therefore regarded as one of the most valuable subproblems of traditional link prediction. However, because many links are lost in the observed network, the topological information available for completing the link prediction task is extremely scarce, which presents a severe challenge for the study of cold-start link prediction. How to mine and fuse more of the available non-topological information from the observed network therefore becomes the key to solving the cold-start link prediction problem. In this paper, we propose a framework for solving the cold-start link prediction problem: a joint-weighted symmetric nonnegative matrix factorization model fusing graph regularization information, based on low-rank approximation algorithms from machine learning. First, the nonlinear features of node attributes in high-dimensional space are captured by the designed graph regularization term. Second, using a weighted matrix, we associate the attribute similarity and first-order structure information of nodes so that they constrain each other. Finally, a unified framework for cold-start link prediction is constructed by using a symmetric nonnegative matrix factorization model to integrate the multiple kinds of extracted information. Extensive experimental validation on five real networks with attributes shows that the proposed model has very good predictive performance when predicting missing edges of isolated nodes.
Funding (ECN-MF study): supported by the Key Research and Development Program of Hainan Province (Grant Nos. ZDYF2023GXJS163, ZDYF2024GXJS014); the National Natural Science Foundation of China (NSFC) (Grant Nos. 62162022, 62162024); the Major Science and Technology Project of Hainan Province (Grant No. ZDKJ2020012); the Hainan Provincial Natural Science Foundation of China (Grant No. 620MS021); the Youth Foundation Project of Hainan Natural Science Foundation (621QN211); and the Innovative Research Project for Graduate Students in Hainan Province (Grant Nos. Qhys2023-96, Qhys2023-95).
Funding (Chongqing agricultural soil TM study): supported by the Project of Chongqing Science and Technology Bureau (cstc2022jxjl0005).
Funding: Project supported by the National Natural Science Foundation of China (Grant Nos. 62162040 and 11861045).
Abstract: Finding crucial vertices is a key problem for improving the reliability and ensuring the effective operation of networks. Existing approaches based on multiple-attribute decision making suffer from ignoring the correlation among attributes or the heterogeneity between attributes and structure. To overcome these problems, a novel vertex centrality approach, called VCJG, is proposed based on joint nonnegative matrix factorization and graph embedding. Linearly independent latent attributes and structure information are captured automatically by using nonnegative matrix factorization to factorize the weighted adjacency matrix and the structure matrix, the latter generated by graph embedding. A smoothness strategy is applied to eliminate the heterogeneity between attributes and structure through joint nonnegative matrix factorization. VCJG then integrates these steps into an overall objective function and, by optimizing it, obtains the final latent attributes fused with the structure information of the network. Finally, the attributes are combined with neighborhood rules to evaluate each vertex's importance. Comparative experiments on nine real-world networks demonstrate that the proposed approach outperforms nine state-of-the-art algorithms for identifying vital vertices with respect to correlation, monotonicity, and accuracy of the top-10 vertex ranking.
Funding: Supported by the Public Health Research Project in Futian District, Shenzhen (Grant Nos. FTWS2020026, FTWS2021073).
Abstract: Background: Establishing an appropriate prognostic model for PCa is essential for its effective treatment. Glycolysis is a vital energy-harvesting mechanism for tumors. Developing a prognostic model for PCa based on glycolysis-related genes is novel and has great potential. Methods: First, gene expression and clinical data of PCa patients were downloaded from The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO), and glycolysis-related genes were obtained from the Molecular Signatures Database (MSigDB). Gene enrichment analysis was performed to verify that glycolysis functions were enriched in the genes obtained, which were then used in nonnegative matrix factorization (NMF) to identify clusters. The correlation between clusters and clinical features was examined, and the differentially expressed genes (DEGs) between the two clusters were investigated. Based on the DEGs, we investigated the biological differences between clusters, including immune cell infiltration, mutation, tumor immune dysfunction and exclusion, immune function, and checkpoint genes. To establish the prognostic model, the genes were filtered using univariable Cox regression, LASSO, and multivariable Cox regression. Kaplan–Meier analysis and receiver operating characteristic (ROC) analysis validated the prognostic value of the model. A nomogram combining the risk score calculated by the prognostic model with clinical characteristics was constructed to quantitatively estimate the survival probability of PCa patients in the clinical setting. Results: The genes obtained from MSigDB were enriched in glycolysis functions. Two clusters were identified by NMF analysis based on 272 glycolysis-related genes, and a prognostic model based on the DEGs between the two clusters was finally established. The prognostic model consisted of LAMPS, SPRN, ATOH1, TANC1, ETV1, TDRD1, KLK14, MESP2, POSTN, CRIP2, NAT1, AKR7A3, PODXL, CARTPT, and PCDHGB2. The all-sample, training, and test cohorts from TCGA and the external validation cohort from GEO showed significant differences between the high-risk and low-risk groups. The area under the ROC curve indicated strong performance of the prognostic model. Conclusion: A prognostic model based on glycolysis-related genes was established, with strong performance and potential significance for clinical application.
Funding: This work was supported by the National Natural Science Foundation of China (62073087, 62071132, 61973090).
Abstract: Deep matrix factorization (DMF) has been demonstrated to be a powerful tool for capturing the complex hierarchical information of multi-view data (MDR). However, existing multi-view DMF methods mainly explore the consistency of multi-view data while neglecting the diversity among different views as well as the high-order relationships of the data, resulting in the loss of valuable complementary information. In this paper, we design a hypergraph regularized diverse deep matrix factorization (HDDMF) model for multi-view data representation that jointly utilizes multi-view diversity and a high-order manifold in a multilayer factorization framework. A novel diversity enhancement term is designed to exploit the structural complementarity between different views of the data. Hypergraph regularization is utilized to preserve the high-order geometric structure of the data in each view. An efficient iterative optimization algorithm is developed to solve the proposed model, with a theoretical convergence analysis. Experimental results on five real-world data sets demonstrate that the proposed method significantly outperforms state-of-the-art multi-view learning approaches.
Abstract: Data volumes are enormous today because of the extensive use of the World Wide Web, social media, and intelligent systems. This data can be very important and useful if it is harnessed carefully and correctly. Useful information can be extracted from this massive data using the data mining process, and the extracted information can be used to make vital decisions in various industries. Clustering is a very popular data mining method that divides data points into groups such that all similar data points belong to the same group. Clustering methods are of various types, and many parameters and indexes exist for evaluating and comparing them. In this paper, we compare the partitioning-based methods K-Means, Fuzzy C-Means (FCM), Partitioning Around Medoids (PAM), and Clustering Large Application (CLARA) on secure perturbed data. We identify which method performs better for analyzing data perturbed using Extended NMF, on the basis of the values of various indexes such as the Dunn Index, Silhouette Index, Xie-Beni Index, and Davies-Bouldin Index.
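As an illustration of how one of the validity indexes named above is computed, the following is a minimal sketch of the Dunn Index in its most common form (smallest between-cluster distance divided by largest cluster diameter). The toy points and labels are hypothetical, and the paper may use a different variant of the index.

```python
from itertools import combinations
from math import dist

def dunn_index(points, labels):
    """Smallest between-cluster distance divided by largest cluster diameter."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    # Diameter: largest pairwise distance inside any single cluster.
    max_diameter = max(
        (dist(a, b) for pts in clusters.values() for a, b in combinations(pts, 2)),
        default=0.0,
    )
    # Separation: smallest distance between points of two different clusters.
    min_separation = min(
        dist(a, b)
        for c1, c2 in combinations(clusters.values(), 2)
        for a in c1
        for b in c2
    )
    # Note: undefined (division by zero) if every cluster is a single point.
    return min_separation / max_diameter

points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]  # hypothetical data
good = dunn_index(points, [0, 0, 1, 1])   # compact, well-separated clusters
bad = dunn_index(points, [0, 1, 0, 1])    # stretched, overlapping clusters
```

A higher value indicates more compact and better-separated clusters, so `good` exceeds `bad` on this toy data.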
Funding: Supported by the National Key R&D Program of China (No. 2022ZD0119001); the National Natural Science Foundation of China (No. 61834005); the Shaanxi Province Key R&D Plan (No. 2022GY-027); the Key Scientific Research Project of Shaanxi Department of Education (No. 22JY060); the Education Research Project of Xi'an University of Posts and Telecommunications (No. JGA202108); and the Graduate Student Innovation Fund of Xi'an University of Posts and Telecommunications (No. CXJJYL2022035).
Abstract: With massive data, matrix operations are very computationally intensive, and the memory limitation in standalone mode leads to system inefficiency. At the same time, it is difficult for matrix operations implemented in hardware to switch flexibly between different requirements. To address this problem, this paper proposes a matrix operation accelerator based on reconfigurable arrays in the context of recommender systems (RS). Based on the reconfigurable array processor (APR-16), a flexible parallelized design of matrix operations on the processing element (PE) array is realized. The experimental results show that, compared with a previously proposed hybrid central processing unit (CPU) and graphics processing unit (GPU) matrix multiplication framework, the energy efficiency ratio of the accelerator proposed in this paper is improved by about 35×. Compared with blocked alternating least squares (BALS), its energy efficiency ratio is improved by about 1×, and switching between matrix factorization (MF) schemes suited to different sparsity levels can be realized.
Funding: Supported by the Shaanxi Provincial Overall Innovation Project of Science and Technology, China (Grant No. 2013KTCQ01-06).
Abstract: Because vibration signals acquired from faulty rolling element bearings are non-stationary, time-frequency analysis is often applied to describe the local information of these unstable signals. However, it is difficult to classify the high-dimensional feature matrix directly because its dimensions are too large for many classifiers. This paper combines the concepts of time-frequency distribution (TFD) and non-negative matrix factorization (NMF), and proposes a novel TFD matrix factorization method to enhance the representation and identification of bearing faults. In this method, the TFD of a vibration signal is first computed with the short-time Fourier transform (STFT) to describe the localized faults. Then, a supervised NMF mapping is adopted to extract the fault features from the TFD. Meanwhile, the fault samples can be clustered and recognized automatically by using the clustering property of NMF. The proposed method takes advantage of NMF's parts-based representation and adaptive clustering, and the localized fault features of interest can be extracted as well. To evaluate the performance of the proposed method, experiments on nine kinds of bearing faults are performed on a test bench. The proposed method can effectively identify fault severity and different fault types. Moreover, NMF yields 99.3% mean accuracy, which is much better than that of an artificial neural network (ANN). This research presents a simple and practical solution to the fault diagnosis problem of rolling element bearings in a high-dimensional feature space.
Funding: Supported by the National Natural Science Foundation of China (61702251, 41971424, 61701191, U1605254); the Natural Science Basic Research Plan in Shaanxi Province of China (2018JM6030); the Key Technical Project of Fujian Province (2017H6015); the Science and Technology Project of Xiamen (3502Z20183032); the Doctor Scientific Research Starting Foundation of Northwest University (338050050); the Youth Academic Talent Support Program of Northwest University (360051900151); and the Natural Sciences and Engineering Research Council of Canada, Canada.
Abstract: This paper presents a novel medical image registration algorithm named total variation constrained graph regularization for non-negative matrix factorization (TV-GNMF). The method combines non-negative matrix factorization with a total variation constraint and graph regularization. The main contributions of our work are the following. First, total variation is incorporated into NMF to control the diffusion speed. The purpose is to denoise in smooth regions while preserving features or details of the data in edge regions by using a diffusion coefficient based on gradient information. Second, we add graph regularization to NMF to reveal the intrinsic geometry and structure information of features and thereby enhance the discrimination power. Third, the multiplicative update rules and a proof of convergence of the TV-GNMF algorithm are given. Experiments conducted on datasets show that the proposed TV-GNMF method outperforms other state-of-the-art algorithms.
Abstract: Nonnegative matrix factorization (NMF) is a method for obtaining parts-based features of information and forming typical profiles. However, the basis vectors obtained by NMF are not orthogonal, so the parts-based features are usually redundant. In this paper, we propose two approaches based on localized non-negative matrix factorization (LNMF) to obtain typical user session profiles and typical semantic profiles of junk mail. LNMF makes the basis vectors as orthogonal as possible, so it can obtain accurate profiles. The experiments show that the approach based on LNMF obtains better profiles than the approach based on NMF. Key words: localized non-negative matrix factorization; profile; log mining; mail filtering. CLC number: TP 391. Foundation item: Supported by the National Natural Science Foundation of China (60373066, 60303024), the National Grand Fundamental Research 973 Program of China (2002CB312000), and the National Research Foundation for the Doctoral Program of Higher Education of China (20020286004). Biography: Jiang Ji-xiang (1980-), male, Master candidate; research direction: data mining, knowledge representation on the Web.
Funding: This work was supported in part by the National Natural Science Foundation of China (51679105, 51809112, 51939003, 61872160) and the “Thirteenth Five Plan” Science and Technology Project of the Education Department, Jilin Province (JJKH20200990KJ).
Abstract: A current problem in diet recommendation systems is matching food preferences with nutritional requirements while taking into account individual characteristics, such as body weight, and individual health conditions, such as diabetes. Current dietary recommendation approaches employ association rules, content-based collaborative filtering, and constraint-based methods, which have several limitations. These limitations are due to the existence of a special user group and an imbalance of non-simple attributes. Building on research into traditional dietary recommendation algorithms, we combine the Adaboost classifier with probabilistic matrix factorization and present a personalized diet recommendation algorithm based on probabilistic matrix factorization via Adaboost. The probabilistic matrix factorization method extracts the implicit factors between individual food preferences and nutritional characteristics. From these, we can make use of the features with strong influence while discarding those with little influence. After incorporating these changes into our approach, we evaluated our algorithm's performance. Our results show that our method performed better than others at matching preferred foods with dietary requirements, benefiting user health as a result. The algorithm fully considers the constraint relationship between users' attributes and the nutritional characteristics of foods. Because the algorithm considers many complex factors, the recommended food set meets both health standards and users' dietary preferences. A comparison of our algorithm with others demonstrated that our method offers high accuracy and interpretability.
Funding: Supported by the National Natural Science Foundation of China (No. 60872083) and the National High Technology Research and Development Program of China (No. 2007AA12Z149).
Abstract: This paper considers the problem of unsupervised spectral unmixing of hyperspectral data. Based on the Linear Mixing Model (LMM), a new method under the framework of nonnegative matrix factorization (NMF) is proposed, namely minimum distance constrained nonnegative matrix factorization (MDC-NMF). First, a new regularization term, called endmember distance (ED), is considered, which is defined as the sum of the squared Euclidean distances from each endmember to their geometric center. Compared with the simplex volume, ED has better optimization properties and is conceptually intuitive. Second, a projected gradient (PG) scheme is adopted, and by virtue of ED, the optimal step size along the feasible descent direction can be calculated easily at each iteration of this scheme. Third, a finite-step (no more than the number of endmembers) terminated algorithm is used to project a point onto the canonical simplex, by which the abundance nonnegativity constraint and the abundance sum-to-one constraint can be accurately satisfied with a small amount of computation. The experimental results, based on a set of synthetic data and real data, demonstrate that, in the same running time, MDC-NMF outperforms several other similar methods proposed recently.
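Two ingredients described above lend themselves to a short sketch: the endmember distance (ED) as defined in the abstract, and a projection onto the canonical simplex. The projection below is the widely used sort-based algorithm, substituted here for illustration; it is not claimed to be the finite-step algorithm used in MDC-NMF, and all inputs are hypothetical.

```python
def endmember_distance(endmembers):
    """Sum of squared Euclidean distances from each endmember to their centroid."""
    count = len(endmembers)
    dim = len(endmembers[0])
    center = [sum(e[j] for e in endmembers) / count for j in range(dim)]
    return sum((e[j] - center[j]) ** 2 for e in endmembers for j in range(dim))

def project_to_simplex(v):
    """Euclidean projection of v onto {x : x >= 0, sum(x) = 1} (sort-based)."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for k, u_k in enumerate(u, start=1):
        cumulative += u_k
        candidate = (cumulative - 1.0) / k
        # The condition holds for a prefix of the sorted values; the last
        # index where it holds determines the shift theta.
        if u_k - candidate > 0:
            theta = candidate
    # Shift every entry by theta and clip negatives to zero.
    return [max(x - theta, 0.0) for x in v]

ed = endmember_distance([(0.0, 0.0), (2.0, 0.0)])   # hypothetical endmembers
abundances = project_to_simplex([2.0, 0.0])         # hypothetical abundance vector
```

The projected vector is non-negative and sums to one, which corresponds to the abundance nonnegativity and sum-to-one constraints named in the abstract.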
Funding: Supported by the National Natural Science Foundation of China (61374140, 61403072).
Abstract: Traditional data-driven fault detection methods assume that the process operates in a single mode, so they cannot perform well in processes with multiple operating modes. To monitor multimode processes effectively, this paper proposes a novel process monitoring scheme based on orthogonal nonnegative matrix factorization (ONMF) and the hidden Markov model (HMM). The new clustering technique ONMF is employed to separate data from different process modes. Using multiple HMMs for the various operating modes leads to higher modeling accuracy. The proposed approach does not presume the distribution of data in each mode because the process uncertainty and dynamics can be well interpreted through the hidden Markov estimation. An HMM-based monitoring index, the negative log likelihood probability, is utilized for fault detection. To assess the proposed monitoring strategy, a numerical example and the Tennessee Eastman process are used. The results demonstrate that this method provides efficient fault detection performance.
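The negative log likelihood used as a monitoring index above can be sketched with the scaled forward algorithm for a discrete-emission HMM. This is a simplified illustration: the two-state model, its probabilities, and the observation sequences are hypothetical, and the paper's HMMs (and its detection threshold) may be defined differently.

```python
import numpy as np

def negative_log_likelihood(obs, start, trans, emit):
    """Negative log likelihood of a discrete observation sequence under an HMM."""
    nll = 0.0
    alpha = start * emit[:, obs[0]]          # forward variable at t = 0
    for t in range(len(obs)):
        if t > 0:
            # Propagate through the transition matrix, then weight by emission.
            alpha = (alpha @ trans) * emit[:, obs[t]]
        scale = alpha.sum()                  # P(obs[t] | obs[:t])
        nll -= np.log(scale)
        alpha = alpha / scale                # rescale to avoid underflow
    return nll

# Hypothetical two-state model with "sticky" states and two output symbols.
start = np.array([0.5, 0.5])
trans = np.array([[0.9, 0.1], [0.1, 0.9]])
emit = np.array([[0.9, 0.1], [0.2, 0.8]])

steady = negative_log_likelihood([0, 0, 0, 0], start, trans, emit)
erratic = negative_log_likelihood([0, 1, 0, 1], start, trans, emit)
```

A sequence that fits the model poorly receives a larger index, so `erratic` exceeds `steady` here; a fault would be flagged when the index crosses a threshold chosen from normal operating data.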
Funding: Supported by the National Natural Science Foundation of China (No. 51877013) (ZJ) (http://www.nsfc.gov.cn/) and the Natural Science Foundation of Jiangsu Province (No. BK20181463) (ZJ) (http://kxjst.jiangsu.gov.cn/); sponsored by the Qing Lan Project of Jiangsu Province (no specific grant number) (ZJ) (http://jyt.jiangsu.gov.cn/).
Abstract: Functional connectomes constructed from neuroimaging data have emerged as a powerful tool for identifying brain disorders. If a brain disease manifests only as some cognitive dysfunction, the disease may affect some local connectivity in the brain functional network; that is, there are functional abnormalities in a sub-network. It is therefore crucial to identify such sub-networks accurately in pathological diagnosis. To this end, we propose a sub-network extraction method based on graph regularization nonnegative matrix factorization (GNMF). The dynamic functional networks of normal subjects and early mild cognitive impairment (eMCI) subjects were vectorized, and the functional connection vectors (FCV) were assembled into aggregation matrices. GNMF was then applied to factorize each aggregation matrix to obtain the base matrix, whose column vectors were restored to a common sub-network and a distinctive sub-network, and visualization and statistical analysis were conducted on the two sub-networks. Experimental results demonstrated that, compared with other matrix factorization methods, the proposed method more clearly reflects the similarity between the common sub-networks of eMCI subjects and normal subjects, as well as the difference between their distinctive sub-networks. Therefore, the high-dimensional features in brain functional networks can be well represented locally in the low-dimensional space, which provides a new idea for studying brain functional connectomes.
Funding: Supported by the National Natural Science Foundation of China (No. 60872065).
Abstract: An image fusion method combining the complex contourlet transform (CCT) with nonnegative matrix factorization (NMF) is proposed in this paper. After two images are decomposed by CCT, NMF is applied to their high- and low-frequency components, respectively, and finally a fused image is synthesized. The subjective visual quality of the fusion result is compared with those of image fusion methods based on NMF alone and on combinations of the wavelet, contourlet, or nonsubsampled contourlet transform with NMF. The experimental results are evaluated quantitatively, and the running times are also compared. It is shown that the proposed image fusion method achieves larger information entropy, standard deviation, and mean gradient, which means that it better integrates feature information from all source images, avoids background noise, and effectively improves spatial clarity in the fused image.
Abstract: Non-negative matrix factorization (NMF) is a technique for dimensionality reduction that places non-negativity constraints on the matrix. Based on the PARAFAC model, NMF was extended to three-dimensional data decomposition. The three-dimensional non-negative matrix factorization (NMF3) algorithm, which is concise and easy to implement, is given in this paper. The NMF3 implementation is based on elements rather than vectors. Unlike traditional algorithms, it can decompose a data array directly without unfolding. It has been applied to the decomposition of simulated data arrays and obtained reasonable results, showing that NMF3 could be introduced for curve resolution in chemometrics.
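For orientation, the two-way base case that NMF3 extends can be sketched with the standard multiplicative update rules for the Frobenius-norm objective. This is plain matrix NMF on hypothetical synthetic data, assuming NumPy; it is not the element-wise three-way NMF3 algorithm of the paper.

```python
import numpy as np

def nmf(v, rank, iters=200, eps=1e-9, seed=0):
    """Factorize a non-negative matrix v (m x n) as w @ h with w, h >= 0."""
    rng = np.random.default_rng(seed)
    m, n = v.shape
    w = rng.random((m, rank)) + eps
    h = rng.random((rank, n)) + eps
    for _ in range(iters):
        # Multiplicative updates keep every entry non-negative;
        # eps guards against division by zero.
        h *= (w.T @ v) / (w.T @ w @ h + eps)
        w *= (v @ h.T) / (w @ h @ h.T + eps)
    return w, h

rng = np.random.default_rng(1)
v = rng.random((6, 3)) @ rng.random((3, 5))   # synthetic non-negative rank-3 data
w, h = nmf(v, rank=3)
error = np.linalg.norm(v - w @ h)             # Frobenius reconstruction error
```

Because each update multiplies by a non-negative ratio, non-negativity is preserved without any explicit projection step.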
Funding: Supported by the Foundation of Hygiene and Health of Jiangsu Province (No. H2018042), the National Natural Science Foundation of China (No. 61773114), and the Key Research and Development Plan (Industry Foresight and Common Key Technology) of Jiangsu Province (No. BE2017007-3).
Abstract: A novel framework is proposed to obtain physiologically meaningful features for Alzheimer's disease (AD) classification based on sparse functional connectivity and non-negative matrix factorization. Specifically, the non-negative adaptive sparse representation (NASR) method is applied to compute the sparse functional connectivity among brain regions from functional magnetic resonance imaging (fMRI) data for feature extraction. Afterwards, the sparse non-negative matrix factorization (sNMF) method is adopted for dimensionality reduction to obtain low-dimensional features with straightforward physical meaning. The experimental results show that the proposed framework outperforms the competing frameworks in terms of classification accuracy, sensitivity, and specificity. Furthermore, three sub-networks, namely the default mode network, the basal ganglia-thalamus-limbic network, and the temporal-insular network, are found to differ notably between AD patients and healthy subjects. The proposed framework can effectively identify AD patients and has potential for extending the understanding of the pathological changes of AD.
Funding: This work is supported by the Natural Scientific Research Foundation of Yunnan Province (200A0001--1M) and the Scientific Research Foundation of Education Commission of Yunnan Province (9911126).
Abstract: The concept of Fiedler matrices was introduced in [1] by L. Stuart and R. Weaver. In [1], they investigated the factorization of a Fiedler matrix into Fiedler matrices and presented some open questions, i.e.: When is a Fiedler matrix factorizable as a product of Fiedler matrices? Are there useful sufficient conditions? If a Fiedler matrix is factorizable, are the factors unique? If not, are the dimensions of the factors unique? In this paper,
Abstract: In this paper we compute Karmarkar's projections quickly using the Moore-Penrose g-inverse and matrix factorization, so that the computational work for (A^T D^2 A)^(-1) is decreased.
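The role of the Moore-Penrose inverse here can be sketched as follows. Assuming A is an n x m constraint matrix and D a positive diagonal scaling, set B = A^T D; the orthogonal projector onto the null space of B is I - B^+ B, which equals I - B^T (B B^T)^(-1) B with B B^T = A^T D^2 A when B has full row rank. The sketch uses NumPy, hypothetical data, and omits the extra normalization row of Karmarkar's algorithm; it is not the paper's factorization scheme.

```python
import numpy as np

def projected_direction(a, d, c):
    """Project D c onto the null space of B = A^T D using the pseudo-inverse."""
    b = a.T @ np.diag(d)                  # B has shape (m, n)
    dc = d * c                            # D c for a diagonal D stored as a vector
    # I - B^+ B is the orthogonal projector onto null(B).
    return dc - np.linalg.pinv(b) @ (b @ dc)

rng = np.random.default_rng(0)
a = rng.normal(size=(5, 2))               # hypothetical constraint matrix (n=5, m=2)
d = rng.random(5) + 0.5                   # positive diagonal entries of D
c = rng.normal(size=5)                    # hypothetical cost vector
p = projected_direction(a, d, c)
```

Using the pseudo-inverse avoids forming and inverting A^T D^2 A explicitly, which is the cost the abstract says is reduced.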
Funding: Supported by the Teaching Reform Research Project of Qinghai Minzu University, China (2021-JYYB-009) and the “Chunhui Plan” Cooperative Scientific Research Project of the Ministry of Education of China (2018).
Abstract: Link prediction has attracted wide attention among interdisciplinary researchers as an important issue in complex networks. It aims to predict the missing links in current networks and new links that will appear in future networks. Despite the presence of missing links in the target network of link prediction studies, the network being processed remains, macroscopically, a large connected graph. However, the complexity of the real world means that complex networks abstracted from real systems often contain many isolated nodes. As a result, existing link prediction methods cannot efficiently predict missing edges on isolated nodes. Cold-start link prediction is therefore regarded as one of the most valuable subproblems of traditional link prediction. However, because many links are lost in the observed network, the topological information available for the link prediction task is extremely scarce, which presents a severe challenge for the study of cold-start link prediction. How to mine and fuse more of the available non-topological information from the observed network thus becomes the key to solving the cold-start link prediction problem. In this paper, we propose a framework for solving the cold-start link prediction problem: a joint-weighted symmetric nonnegative matrix factorization model fusing graph regularization information, based on low-rank approximation algorithms from machine learning. First, the nonlinear features of node attributes in high-dimensional space are captured by the designed graph regularization term. Second, using a weighted matrix, we associate the attribute similarity and first-order structure information of nodes so that they constrain each other. Finally, a unified framework for cold-start link prediction is constructed by using a symmetric nonnegative matrix factorization model to integrate the multiple kinds of extracted information. Extensive experimental validation on five real networks with attributes shows that the proposed model has very good predictive performance when predicting missing edges of isolated nodes.