Neural stem cells, which are capable of multi-potential differentiation and self-renewal, have recently been shown to have clinical potential for repairing central nervous system tissue damage. However, the theme trends and knowledge structures for human neural stem cells have not yet been studied bibliometrically. In this study, we retrieved 2742 articles from the PubMed database from 2013 to 2018 using "Neural Stem Cells" as the retrieval word. Co-word analysis was conducted to statistically quantify the characteristics and popular themes of human neural stem cell-related studies. Bibliographic data matrices were generated with the Bibliographic Item Co-Occurrence Matrix Builder. We identified 78 high-frequency Medical Subject Heading (MeSH) terms. A visual matrix was built with the repeated bisection method in gCLUTO software. A social network analysis network was generated with Ucinet 6.0 software and GraphPad Prism 5 software. The analyses demonstrated that in the 6-year period, hot topics were clustered into five categories. As suggested by the constructed strategic diagram, studies related to cytology and physiology were well-developed, whereas those related to neural stem cell applications, tissue engineering, metabolism and cell signaling, and neural stem cell pathology and virology remained immature. Neural stem cell therapy for stroke and Parkinson’s disease, the genetics of microRNAs and brain neoplasms, as well as neuroprotective agents, Zika virus, Notch receptor, neural crest and embryonic stem cells were identified as emerging hot spots. These undeveloped themes and popular topics are potential points of focus for new studies on human neural stem cells.
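The co-word step described above can be illustrated with a minimal sketch. It assumes each article is represented as a list of MeSH terms and that "high-frequency" means appearing in at least `min_freq` records; the function name `cooccurrence`, the threshold, and the toy records are hypothetical and are not taken from the study or from the Bibliographic Item Co-Occurrence Matrix Builder.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(records, min_freq=2):
    """Count how often pairs of high-frequency terms share a record.

    records: list of term lists, one list per article (hypothetical input).
    min_freq: assumed cut-off for calling a term "high-frequency".
    """
    # Document frequency: number of records mentioning each term.
    freq = Counter(term for terms in records for term in set(terms))
    keep = {term for term, n in freq.items() if n >= min_freq}

    pairs = Counter()
    for terms in records:
        # Sort so that each unordered pair has one canonical key.
        selected = sorted(set(terms) & keep)
        for a, b in combinations(selected, 2):
            pairs[(a, b)] += 1
    return freq, pairs


# Toy records, invented for illustration only.
records = [
    ["Neural Stem Cells", "Stroke", "Cell Differentiation"],
    ["Neural Stem Cells", "Cell Differentiation"],
    ["Neural Stem Cells", "Zika Virus"],
]
freq, pairs = cooccurrence(records)
```

The resulting pair counts are the entries of a symmetric term-by-term matrix, which is the kind of input that clustering and network tools then operate on.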
Analysis of gene expression data can help to find time-lagged co-regulation of gene clusters. However, existing methods solve the problem only when the data are discrete. In this paper, we propose an efficient algorithm to identify time-lagged co-regulated gene clusters based on real-valued data.
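To make the notion of time-lagged co-regulation on real-valued data concrete, here is a minimal sketch that scans a range of lags and reports the one with the highest Pearson correlation between two expression profiles. This is a generic illustration, not the algorithm proposed in the paper; the names `pearson` and `best_lag` and the toy profiles are assumptions.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def best_lag(x, y, max_lag):
    """Return (lag, r) maximising the correlation of x[t] with y[t + lag]."""
    best = (0, pearson(x, y))
    for lag in range(1, max_lag + 1):
        # Align x[t] with y[t + lag] by trimming the unmatched ends.
        r = pearson(x[:-lag], y[lag:])
        if r > best[1]:
            best = (lag, r)
    return best


# Toy profiles: gene_b repeats gene_a two time points later.
gene_a = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 2.5, 1.5]
gene_b = [0.0, 0.5, 1.0, 3.0, 2.0, 5.0, 4.0, 6.0]
lag, r = best_lag(gene_a, gene_b, max_lag=3)
```

Working directly on real values avoids the information loss of discretising expression levels, which is the limitation the abstract attributes to earlier methods.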
Microarrays contain a large matrix of information and have been widely used by biologists and bio data scientists for monitoring combinations of genes in different organisms. This study investigates the coherent patterns in all continuous columns of gene microarray data matrices by developing a time series similarity measure for these patterns, as well as an evaluation function for verifying the proposed algorithm and the corresponding biclusters. Continuous time changes are taken into account in the coherent patterns, and co-expression patterns in time series are searched. In order to use all the common information between sequences, a similarity measure for the coherent patterns in continuous columns is defined. To validate the efficiency of the similarity measure in mining biological information at continuous time points, an evaluation function is defined to measure biclusters, and an effective algorithm is proposed to mine the biclusters. Simulation experiments on synthetic datasets and real gene microarray datasets are conducted to verify the biological significance of the biclusters. The performance of the algorithm is analyzed, and the results show that the algorithm is highly efficient.
Commercial aircraft crews have moved from five-person crews to dual-pilot crews. Arising from both technological and market demands, Single Pilot Operations (SPO) is considered an important development direction in modern aviation technology. In this paper, starting from Dual-Pilot Operations (DPO), the piloting process, decision-making process and decision-making mode of DPO for commercial aircraft are studied to obtain the operational requirements of SPO. Then, based on the above analysis, the operational mechanism of SPO is studied and the core technology of the SPO mode is proposed. Next, a new closed frequent bicluster mining algorithm named FsCluster is proposed for the optimization of the SPO model, and another efficient bicluster mining algorithm named TsCluster is proposed for the analysis and verification of the SPO model. Finally, a typical flight phase scenario is modelled by Magic System of System, and combined with the proposed algorithms for analysis and verification to determine whether the SPO mode can meet the DPO requirements.
The availability of large microarray data has led to a growing interest in biclustering methods in the past decade. Several algorithms have been proposed to identify subsets of genes and conditions according to different similarity measures and under varying constraints. In this paper we focus on the exclusive row biclustering problem (also known as projected clustering) for gene expression, in which each row can only be a member of a single bicluster while columns can participate in multiple clusters. This type of biclustering may be adequate, for example, for clustering groups of cancer patients where each patient (row) is expected to be carrying only a single type of cancer, while each cancer type is associated with multiple (and possibly overlapping) genes (columns). We present a novel method to identify these exclusive row biclusters in the spirit of the optimal set cover problem. We present our algorithmic solution as a combination of existing biclustering algorithms and combinatorial auction techniques. Furthermore, we devise an approach for tuning the threshold of our algorithm based on comparison with a null model, inspired by the Gap statistic approach. We demonstrate our approach on both synthetic and real-world gene expression data and show its power in identifying large-span submatrices with non-overlapping rows, while considering their unique nature.
With the continuous advancement of avionics systems, crew sizes have been correspondingly reduced, and Single Pilot Operations (SPO) has attracted widespread attention from scholars. To meet the flight requirements in SPO mode, it is necessary to further strengthen air-ground coordination system integration, but this also introduces safety issues caused by resource integration, function fusion, and task synthesis. Aimed at the safety problems caused by task synthesis, an efficient differential bicluster mining algorithm, DFCluster, is proposed in this paper to discover potential hazardous elements or propagation mechanisms by mining the resource-function matrices. To mine efficiently, several pruning techniques are designed for generating maximal biclusters without candidate maintenance. The experimental results show that the DFCluster algorithm is more efficient than the existing differential biclustering algorithms on artificial and public datasets of different scales. Then, a typical flight scenario is designed based on the SPO air-ground collaborative system architecture and combined with the proposed DFCluster algorithm for task synthesis safety analysis. Based on the mining results, the SPO air-ground collaborative system architecture is modified, which ultimately improves the safety of the SPO system.
Background: Developing appropriate computational tools to distill biological insights from large-scale gene expression data has been an important part of systems biology. Considering that gene relationships may change or only exist in a subset of collected samples, biclustering that involves clustering both genes and samples has become increasingly important, especially when the samples are pooled from a wide range of experimental conditions. Methods: In this paper, we introduce a new biclustering algorithm to find subsets of genomic expression features (EFs) (e.g., genes, isoforms, exon inclusion) that show strong "group interactions" under certain subsets of samples. Group interactions are defined by strong partial correlations, or equivalently, conditional dependencies between EFs after removing the influences of a set of other functionally related EFs. Our new biclustering method, named SCCA-BC, extends an existing method for group interaction inference, which is based on sparse canonical correlation analysis (SCCA) coupled with repeated random partitioning of the gene expression data set. Results: SCCA-BC gives sensible results on real data sets and outperforms most existing methods in simulations. Software is available at https://github.com/pimentel/scca-bc. Conclusions: SCCA-BC seems to work in numerous conditions and the results seem promising for future extensions. SCCA-BC has the ability to find different types of bicluster patterns, and it is especially advantageous in identifying a bicluster whose elements share the same progressive and multivariate normal distribution with a dense covariance matrix.
The massive growth of online commercial data has created demand for automatic recommender systems that benefit both users and merchants. One of the most frequently used recommendation methods is collaborative filtering, but its accuracy is limited by the sparsity of the rating dataset. Most existing collaborative filtering methods consider all features when calculating user/item similarity and ignore much local information. In collaborative filtering, selecting neighbors and determining users’ similarities are the most important parts. For the selection of better neighbors, this study proposes a novel biclustering method based on modified fuzzy adaptive resonance theory. To reflect the similarity between users, a new measure that considers the effect of the number of users’ common items is proposed. Specifically, the proposed biclustering method is first adopted to obtain local similarity and local prediction. Second, item-based collaborative filtering is used to generate global predictions. Finally, the two resultant predictions are fused to obtain a final one. Experimental results demonstrate that the proposed method outperforms state-of-the-art models in several respects on three benchmark datasets.
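The abstract does not give the form of its similarity measure, so the sketch below substitutes a well-known generic device, significance weighting, to show how the number of common items can temper a similarity score: cosine similarity over co-rated items is scaled by `min(n_common, gamma) / gamma`. The function name `weighted_similarity`, the parameter `gamma`, and the toy ratings are assumptions and should not be read as the paper's measure.

```python
from math import sqrt


def weighted_similarity(ratings_u, ratings_v, gamma=5):
    """Cosine similarity on co-rated items, damped when few items are shared.

    ratings_u, ratings_v: dicts mapping item id -> rating (hypothetical format).
    gamma: assumed number of common items at which the damping stops.
    """
    common = set(ratings_u) & set(ratings_v)
    if not common:
        return 0.0
    dot = sum(ratings_u[i] * ratings_v[i] for i in common)
    norm_u = sqrt(sum(ratings_u[i] ** 2 for i in common))
    norm_v = sqrt(sum(ratings_v[i] ** 2 for i in common))
    cosine = dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
    # Significance weighting: trust the score less when overlap is small.
    return cosine * min(len(common), gamma) / gamma


# Toy ratings, invented for illustration only.
alice = {"a": 5, "b": 3, "c": 4}
bob = {"a": 5, "b": 3, "c": 4, "d": 1}
carol = {"a": 5}
sim_ab = weighted_similarity(alice, bob)    # three common items
sim_ac = weighted_similarity(alice, carol)  # one common item
```

Both pairs agree perfectly on the items they share, yet the pair with more common items receives the higher score, which is the effect the abstract says the new measure is meant to capture.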
A biclustering algorithm extends conventional clustering techniques to extract all of the meaningful subgroups of genes and conditions in the expression matrix of a microarray dataset. However, such algorithms are very sensitive to input parameters and show poor scalability. This paper proposes a scalable unsupervised biclustering framework, SUBic, to find high-quality constant-row biclusters in an expression matrix effectively. A one-dimensional clustering algorithm is proposed to partition the attributes, that is, columns of an expression matrix, into disjoint groups based on the similarity of expression values. These groups form a set of short transactions and are used to discover a set of frequent itemsets, each of which corresponds to a bicluster. However, a bicluster may include any attribute whose expression value is not similar enough to others, so a bicluster refinement is used to enhance the quality of a bicluster by removing those attributes based on its distribution of expression values. The performance of the proposed method is comparatively analyzed through a series of experiments on synthetic and real datasets.
Unlike traditional clustering analysis, a biclustering algorithm works simultaneously on the two dimensions of samples (rows) and variables (columns). In recent years, biclustering methods have developed rapidly and have been widely applied in biological data analysis, text clustering, recommendation systems and other fields. Traditional clustering algorithms are not well suited to processing high-dimensional and/or large-scale data. At present, most biclustering algorithms are designed for differentially expressed big biological data. However, there is little discussion of biclustering for binary data such as miRNA-targeted gene data. Here, we propose a novel biclustering method for miRNA-targeted gene data based on a graph autoencoder, named GAEBic. GAEBic applies a graph autoencoder to capture the similarity of sample sets or variable sets, and adopts a new irregular clustering strategy to mine biclusters with excellent generalization. Based on the miRNA-targeted gene data of soybean, we benchmark several different types of biclustering algorithms and find that GAEBic performs better than Bimax, Bibit and the Spectral Biclustering algorithm in terms of target gene enrichment. This biclustering method achieves comparable performance on the high-throughput miRNA data of soybean, and it can also be used for other species.
Objective: To find an appropriate feature representation in the biclustering of symptom-herb relationships in Chinese medicine (CM). Methods: Four different representation schemes were tested in identifying the complex relationship between symptoms and herbs using a biclustering algorithm on an insomnia data set. These representation schemes were effective count, binary value, relative success ratio, and modified relative success ratio. The schemes were compared on the number and size of biclusters with respect to different threshold values. Results and Conclusions: The modified relative success ratio scheme was the most appropriate feature representation among the four tested. Some of the biclusters selected from this representation scheme were known to follow the therapeutic principles of CM, while others may offer clues for further clinical investigations.
Biclustering is a method of grouping objects and attributes simultaneously in order to find multiple hidden patterns. When dealing with a long time series, there is a low possibility of finding meaningful clusters over the whole time sequence. However, we may find more significant clusters containing partial time sequences by applying a biclustering method. This paper proposed a new biclustering algorithm for time series data following an autoregressive moving average (ARMA) model. We assumed the plaid model but modified the algorithm to incorporate the sequential nature of time series data. The maximum likelihood estimation (MLE) method was used to estimate the ARMA coefficients in each bicluster. We applied the proposed method to several synthetic data sets generated from different ARMA orders. Results from the experiments showed that the proposed method compares favorably with other biclustering methods for time series data.
The Cheng and Church algorithm is an important approach among biclustering algorithms. In this paper, the space-extension process in the second stage of the Cheng and Church algorithm is improved, and the selection of two important parameters is discussed. Results from applying the improved algorithm to gene expression profile analysis show that, compared with the original Cheng and Church algorithm, the quality of the clustering results is clearly enhanced, the mined expression patterns are better, and the data show strong consistency in their fluctuation across conditions, while the computational time does not increase significantly.
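For context, the Cheng and Church algorithm scores a candidate bicluster by its mean squared residue, which is zero for a perfectly additive submatrix and grows as the rows and columns stop fluctuating together. The sketch below computes only that score; it does not reproduce the improved extension stage described in the abstract, and the function name and toy matrices are illustrative.

```python
def mean_squared_residue(matrix, rows, cols):
    """Mean squared residue H(I, J) of the submatrix given by rows I and cols J.

    The residue of a cell is a_ij - a_iJ - a_Ij + a_IJ, where a_iJ is the row
    mean, a_Ij the column mean and a_IJ the overall mean of the submatrix.
    """
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n_rows, n_cols = len(rows), len(cols)
    row_mean = [sum(r) / n_cols for r in sub]
    col_mean = [sum(sub[i][j] for i in range(n_rows)) / n_rows
                for j in range(n_cols)]
    all_mean = sum(row_mean) / n_rows
    total = sum(
        (sub[i][j] - row_mean[i] - col_mean[j] + all_mean) ** 2
        for i in range(n_rows)
        for j in range(n_cols)
    )
    return total / (n_rows * n_cols)


# Toy data: rows differ only by a constant shift, so the residue is zero.
additive = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [4.0, 5.0, 6.0]]
# One perturbed cell breaks the additive pattern.
noisy = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [4.0, 5.0, 9.0]]
score_additive = mean_squared_residue(additive, [0, 1, 2], [0, 1, 2])
score_noisy = mean_squared_residue(noisy, [0, 1, 2], [0, 1, 2])
```

In the original algorithm, rows and columns are deleted while the score exceeds a threshold and then added back as long as it stays below it; that addition stage is the part the abstract reports improving.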
Genome-wide association studies have proved to be a powerful approach for identifying risk loci. However, the molecular regulatory mechanisms of complex diseases are still not clearly understood. It is therefore important to consider the interplay between genetic factors and biological networks in elucidating the mechanisms of complex disease pathogenesis. In this paper, we first conducted a genome-wide association analysis using the SNP genotype data and phenotype data provided by Genetic Analysis Workshop 17, in order to filter significant SNPs associated with the diseases. Second, we conducted a bioinformatics analysis of the gene-phenotype association matrix to identify gene modules (biclusters). Third, we performed a KEGG enrichment test of the genes involved in the biclusters to find evidence supporting their functional consensus. This method can be used to better understand complex diseases.
Background: Although clinical treatment for heart failure and sudden death has improved over the last few decades, the morbidity and mortality of dilated cardiomyopathy (DCM) have increased. A better understanding of the underlying molecular events leading to DCM is therefore urgently needed. Persistent viral infection (especially coxsackievirus group B3) of the myocardium in viral myocarditis and DCM has long received attention from experts. Recent data indicate that the up-regulation of coxsackievirus and adenovirus receptor (CAR) in viral cardiomyopathy contributes to viral infection as a key factor in the pathogenesis of this disease. This study aimed to investigate the role and regulatory mechanism of CAR in DCM by bioinformatic methods. Methods: We identified the clusters of genes co-expressed with CAR using a clustering algorithm on the publicly available microarray dataset of DCM (Kittleson et al., 2005). After confirming that the gene clusters partition the samples correctly, we mapped these genes onto protein-protein interaction networks to investigate their interactions at the protein level. Results: The gene cluster GENESET 11, containing 33 genes (including CAR) with similar expression patterns, was identified by the clustering algorithm; for 19 of these genes, interaction information on the encoded proteins was found in the current human protein interaction database. In particular, 12 genes that appear as critical nodes (called HUB nodes) at the protein level are involved in energy metabolism, signal transduction, viral infection, immuno-response, cell apoptosis, cell proliferation, tissue repair, etc.
Conclusions: The genes in GENESET 11, together with CAR, may play a pathogenic role in the development of DCM, mainly through mechanisms of energy metabolism, signal transduction, viral infection, immuno-response, cell apoptosis and tissue repair.
In complex multivariate data sets, different features usually include diverse associations with different variables, and different variables are associated within different regions. Therefore, exploring the associations between variables and voxels locally becomes necessary to better understand the underlying phenomena. In this paper, we propose a co-analysis framework based on biclusters, which are two subsets of variables and voxels with close scalar-value relationships, to guide the process of visually exploring multivariate data. We first automatically extract all meaningful biclusters, each of which only contains voxels with a similar scalar-value pattern over a subset of variables. These biclusters are organized according to their variable sets, and biclusters in each variable set are further grouped by a similarity metric to reduce redundancy and support diversity during visual exploration. Biclusters are visually represented in coordinated views to facilitate interactive exploration of multivariate data from the similarity between biclusters and the correlation of scalar values with different variables. Experiments on several representative multivariate scientific data sets demonstrate the effectiveness of our framework in exploring local relationships among variables, biclusters and scalar values in the data.
Funding (bibliometric study of human neural stem cells): supported by the National Natural Science Foundation of China, No. 81471308 (to JL); the Stem Cell Clinical Research Project in China, No. CMR-20161129-1003 (to JL); and the Innovation Technology Funding of Dalian in China, No. 2018J11CY025 (to JL).
Funding (coherent patterns in gene microarray time series): supported by the China Scholarship Council; the Guangdong Science and Technology Department under Grant Nos. 2016A010101020, 2016A010101021 and 2016A010101022; and the Guangzhou Science and Information Bureau under Grant No. 201802010033.
Funding (FsCluster and TsCluster for Single Pilot Operations): sponsored by the Natural Science Foundation of Shanghai (No. 20ZR1427800); the New Young Teachers Launch Program of Shanghai Jiaotong University, China (No. 20X100040036); the National Natural Science Foundation of China (No. 61971273); and the Development Program in Shaanxi Province of China (No. 2021GY-032).
Funding (exclusive row biclustering): funded in part by the Israeli Science Foundation under Grant No. 1227/09 and by a grant to Amichai Painsky from the Israeli Center for Absorption in Science.
Funding (DFCluster for Single Pilot Operations safety analysis): supported by the National Program on Key Basic Research Project (2014CB744903); the National Natural Science Foundation of China (61673270); the Natural Science Foundation of Shanghai (20ZR1427800); the New Young Teachers Launch Program of Shanghai Jiaotong University (20X100040036); the Shanghai Pujiang Program (16PJD028); the Shanghai Industrial Strengthening Project (GYQJ-2017-5-08); the Shanghai Science and Technology Committee Research Project (17DZ1204304); and the Shanghai Engineering Research Center of Civil Aircraft Flight Testing.
Funding: This work was supported by the Ningbo Natural Science Foundation (No. 202003N4057) and the National Natural Science Foundation of China (Nos. 62172336 and 62032018).
Abstract: The massive growth of online commercial data has increased the demand for automatic recommender systems that benefit both users and merchants. One of the most frequently used recommendation methods is collaborative filtering, but its accuracy is limited by the sparsity of the rating dataset. Most existing collaborative filtering methods consider all features when calculating user/item similarity and ignore much local information. In collaborative filtering, selecting neighbors and determining users' similarities are the most important parts. To select better neighbors, this study proposes a novel biclustering method based on modified fuzzy adaptive resonance theory. To reflect the similarity between users, a new measure that considers the effect of the number of users' common items is proposed. Specifically, the proposed biclustering method is first adopted to obtain local similarity and local prediction. Second, item-based collaborative filtering is used to generate global predictions. Finally, the two resulting predictions are fused to obtain a final one. Experimental results demonstrate that the proposed method outperforms state-of-the-art models in several respects on three benchmark datasets.
Funding: Supported by the National Research Foundation of Korea (NRF), funded by the Ministry of Education, Science and Technology (MEST) of Korea, under Grant No. 2011-0016648.
Abstract: A biclustering algorithm extends conventional clustering techniques to extract all of the meaningful subgroups of genes and conditions in the expression matrix of a microarray dataset. However, such algorithms are very sensitive to input parameters and show poor scalability. This paper proposes a scalable unsupervised biclustering framework, SUBic, to find high-quality constant-row biclusters in an expression matrix effectively. A one-dimensional clustering algorithm is proposed to partition the attributes, that is, the columns of an expression matrix, into disjoint groups based on the similarity of expression values. These groups form a set of short transactions and are used to discover a set of frequent itemsets, each of which corresponds to a bicluster. However, a bicluster may include attributes whose expression values are not similar enough to the others, so a bicluster refinement step enhances the quality of a bicluster by removing those attributes based on its distribution of expression values. The performance of the proposed method is comparatively analyzed through a series of experiments on synthetic and real datasets.
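The pipeline described above (one-dimensional grouping of columns, short transactions, frequent itemsets as biclusters) can be sketched roughly as follows. This is a simplified illustration and not the SUBic implementation: the gap-based grouping rule, the brute-force itemset enumeration, and all names and parameters (`gap`, `min_rows`, `min_cols`) are assumptions made for the example, and the refinement step is omitted.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence, Tuple


def group_columns(row: Sequence[float], gap: float) -> List[FrozenSet[int]]:
    """Split the column indices of one row into groups of similar values.

    Assumed rule: sort columns by value and start a new group whenever two
    neighbouring values differ by more than `gap`.
    """
    order = sorted(range(len(row)), key=lambda j: row[j])
    groups, current = [], [order[0]]
    for prev, nxt in zip(order, order[1:]):
        if row[nxt] - row[prev] > gap:
            groups.append(frozenset(current))
            current = []
        current.append(nxt)
    groups.append(frozenset(current))
    return groups


def constant_row_biclusters(matrix: Sequence[Sequence[float]], gap: float,
                            min_rows: int, min_cols: int
                            ) -> Dict[Tuple[int, ...], List[int]]:
    """Return {column set: supporting rows} for frequent column sets.

    Each column group of a row is one transaction. A column set is frequent
    if it lies inside a single group for at least `min_rows` rows.
    Brute-force enumeration: suitable for tiny examples only.
    """
    transactions = [(i, g) for i, row in enumerate(matrix)
                    for g in group_columns(row, gap) if len(g) >= min_cols]
    n_cols = len(matrix[0])
    found: Dict[Tuple[int, ...], List[int]] = {}
    for size in range(min_cols, n_cols + 1):
        for cols in combinations(range(n_cols), size):
            col_set = frozenset(cols)
            rows = sorted({i for i, g in transactions if col_set <= g})
            if len(rows) >= min_rows:
                found[cols] = rows
    return found


# Toy expression matrix (hypothetical): rows 0 and 1 are nearly constant
# over columns 0, 1 and 3, at different levels.
data = [
    [1.0, 1.1, 5.0, 1.05],
    [3.0, 3.1, 9.0, 3.0],
    [7.0, 2.0, 7.1, 4.0],
]
print(constant_row_biclusters(data, gap=0.5, min_rows=2, min_cols=3))
```

A real implementation would replace the exhaustive enumeration with a proper frequent-itemset miner, since the number of candidate column sets grows exponentially with the number of columns.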
Funding: This work was supported by the National Natural Science Foundation of China under Grant No. 62072210 and the Project of the Development and Reform Commission of Jilin Province of China under Grant No. 2019C053-6.
Abstract: Unlike traditional clustering analysis, a biclustering algorithm works simultaneously on the two dimensions of samples (rows) and variables (columns). In recent years, biclustering methods have developed rapidly and have been widely applied in biological data analysis, text clustering, recommendation systems, and other fields. Traditional clustering algorithms are not well suited to processing high-dimensional and/or large-scale data. At present, most biclustering algorithms are designed for large-scale differential expression data. However, there has been little discussion of clustering binary data such as miRNA-targeted gene data. Here, we propose a novel biclustering method for miRNA-targeted gene data based on a graph autoencoder, named GAEBic. GAEBic applies a graph autoencoder to capture the similarity of sample sets or variable sets and adopts a new irregular clustering strategy to mine biclusters with excellent generalization. Based on the miRNA-targeted gene data of soybean, we benchmark several different types of biclustering algorithms and find that GAEBic performs better than Bimax, Bibit, and the Spectral Biclustering algorithm in terms of target gene enrichment. This biclustering method achieves comparable performance on the high-throughput miRNA data of soybean, and it can also be used for other species.
Abstract: Objective: To find an appropriate feature representation for the biclustering of symptom-herb relationships in Chinese medicine (CM). Methods: Four different representation schemes were tested in identifying the complex relationship between symptoms and herbs using a biclustering algorithm on an insomnia data set. The representation schemes were effective count, binary value, relative success ratio, and modified relative success ratio. The schemes were compared on the number and size of biclusters with respect to different threshold values. Results and Conclusions: The modified relative success ratio scheme was the most appropriate feature representation among the four tested. Some of the biclusters selected from this representation scheme were known to follow the therapeutic principles of CM, while others may offer clues for further clinical investigations.
Funding: Project (No. 2010-0016800) supported by the Basic Science Research Program through the National Research Foundation (NRF), funded by the Ministry of Education, Science and Technology, Korea.
Abstract: Biclustering is a method of grouping objects and attributes simultaneously in order to find multiple hidden patterns. When dealing with a long time series, there is a low probability of finding meaningful clusters over the whole time sequence. However, more significant clusters covering partial time sequences may be found by applying a biclustering method. This paper proposes a new biclustering algorithm for time series data that follow an autoregressive moving average (ARMA) model. We assume the plaid model but modify the algorithm to incorporate the sequential nature of time series data. The maximum likelihood estimation (MLE) method is used to estimate the ARMA coefficients in each bicluster. We applied the proposed method to several synthetic data sets generated from different ARMA orders. The experimental results show that the proposed method compares favorably with other biclustering methods for time series data.
Funding: This work was supported by the National Natural Science Foundation of China (No. 60433020), the Doctoral Funds of the Ministry of Education of China (No. 20030183060), the Science-Technology Development Project of Jilin Province of China (No. 20050705-2), and the "985" Project of Jilin University.
Abstract: The Cheng and Church algorithm is an important approach to biclustering. In this paper, the space-extension process in the second stage of the Cheng and Church algorithm is improved, and the selection of two important parameters is discussed. Results from applying the improved algorithm to gene expression profile analysis show that, compared with the Cheng and Church algorithm, the quality of the clustering results is clearly enhanced, the mined expression patterns are better, and the data show strong consistency in their fluctuation across conditions, while the computational time does not increase significantly.
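For context, the Cheng and Church algorithm scores a candidate bicluster by its mean squared residue, which is zero when every entry equals its row mean plus its column mean minus the overall mean. The sketch below computes only that score, not the improved second stage described in the abstract; the function name and the toy matrices are hypothetical.

```python
from typing import Sequence


def mean_squared_residue(matrix: Sequence[Sequence[float]],
                         rows: Sequence[int], cols: Sequence[int]) -> float:
    """Mean squared residue of the submatrix given by `rows` and `cols`.

    Residue of an entry = value - row mean - column mean + overall mean,
    with all means taken over the submatrix.
    """
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n_rows, n_cols = len(sub), len(sub[0])
    row_means = [sum(r) / n_cols for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]
    overall = sum(row_means) / n_rows
    total = sum((sub[i][j] - row_means[i] - col_means[j] + overall) ** 2
                for i in range(n_rows) for j in range(n_cols))
    return total / (n_rows * n_cols)


# Rows that differ only by a constant shift form a perfectly coherent bicluster.
additive = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [4.0, 5.0, 6.0]]
print(mean_squared_residue(additive, rows=[0, 1, 2], cols=[0, 1, 2]))

# A checkerboard pattern has no additive structure, so its score is higher.
checker = [[1.0, 0.0], [0.0, 1.0]]
print(mean_squared_residue(checker, rows=[0, 1], cols=[0, 1]))
```

A low score indicates that the selected rows fluctuate consistently across the selected conditions.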
Abstract: Genome-wide association studies have proven to be a powerful approach for identifying risk loci. However, the molecular regulatory mechanisms of complex diseases are still not clearly understood. It is therefore important to consider the interplay between genetic factors and biological networks in elucidating the mechanisms of complex disease pathogenesis. In this paper, we first conducted a genome-wide association analysis using the SNP genotype data and phenotype data provided by Genetic Analysis Workshop 17 in order to filter significant SNPs associated with the diseases. Second, we conducted a bioinformatics analysis of the gene-phenotype association matrix to identify gene modules (biclusters). Third, we performed a KEGG enrichment test on the genes in the biclusters to find evidence supporting their functional consensus. This method can be used to better understand complex diseases.
Funding: This work was supported in part by the National High Tech Development Project of China, the 863 Program (No. 2007AA02Z329), the National Natural Science Foundation of China (Nos. 30370798, 30571034, and 30570424), and the Key Project of Heilongjiang Province (No. GB07C32402).
Abstract: Background: Although clinical treatment for heart failure and sudden death has improved over the last few decades, the morbidity and mortality of dilated cardiomyopathy (DCM) have increased, so a better understanding of the underlying molecular events leading to DCM is urgently needed. Persistent viral infection of the myocardium (especially by coxsackievirus group B3) in viral myocarditis and DCM has long received attention from experts. Recent data indicate that up-regulation of the coxsackievirus and adenovirus receptor (CAR) in viral cardiomyopathy contributes to viral infection as a key factor in the pathogenesis of this disease. This study aimed to investigate the role and regulatory mechanism of CAR in DCM using bioinformatic methods. Methods: We identified clusters of genes co-expressed with CAR by applying a clustering algorithm to a publicly available microarray dataset of DCM (Kittleson, et al. 2005). After confirming that the gene clusters partition the samples correctly, we mapped these genes onto protein-protein interaction networks to investigate their interactions at the protein level. Results: The clustering algorithm identified the gene cluster GENESET 11, which contains 33 genes, including CAR, with similar expression patterns; for 19 of these genes, interaction information for the encoded proteins was found in the current human protein interaction database. In particular, 12 genes that act as critical nodes (hub nodes) at the protein level are involved in energy metabolism, signal transduction, viral infection, immune response, cell apoptosis, cell proliferation, tissue repair, etc. Conclusions: The genes in GENESET 11, together with CAR, may play a pathogenic role in the development of DCM, mainly through mechanisms of energy metabolism, signal transduction, viral infection, immune response, cell apoptosis, and tissue repair.
Funding: This work was supported by the National Key Research & Development Program of China (2017YFB0202203), the National Natural Science Foundation of China (61472354 and 61672452), and the NSFC-Guangdong Joint Fund (U1611263).
Abstract: In complex multivariate data sets, different features usually involve diverse associations with different variables, and different variables are associated within different regions. Therefore, exploring the associations between variables and voxels locally becomes necessary to better understand the underlying phenomena. In this paper, we propose a co-analysis framework based on biclusters, each of which pairs a subset of variables with a subset of voxels that have close scalar-value relationships, to guide the process of visually exploring multivariate data. We first automatically extract all meaningful biclusters, each of which contains only voxels with a similar scalar-value pattern over a subset of variables. These biclusters are organized according to their variable sets, and the biclusters in each variable set are further grouped by a similarity metric to reduce redundancy and support diversity during visual exploration. Biclusters are visually represented in coordinated views to facilitate interactive exploration of multivariate data, based on the similarity between biclusters and the correlation of scalar values with different variables. Experiments on several representative multivariate scientific data sets demonstrate the effectiveness of our framework in exploring local relationships among variables, biclusters, and scalar values in the data.