Funding: Supported by the Higher Education Commission, Pakistan (Grant No. 20-1493/R&D/09).
Abstract: Structurally linking similar proteins is a challenging task that may help identify novel members of a protein family. In this respect, identifying conserved sequences can facilitate understanding and classifying the exact role of proteins. However, the exact role of these conserved elements cannot be elucidated without structural and physicochemical information. In this work, we present MotViz, a novel desktop application designed for searching and analyzing conserved sequence segments within protein structures. With MotViz, the user can extract a complete list of sequence motifs from loaded 3D structures, annotate the motifs structurally, and analyze their physicochemical properties. The conservation value calculated for an individual motif can be visualized graphically. To check its efficiency, predicted motifs from the data sets of 9 protein families were analyzed, and the MotViz algorithm was more efficient than other online motif prediction tools. Furthermore, a database was integrated for storing, retrieving, and performing detailed functional annotation studies. In summary, MotViz effectively predicts motifs with high sensitivity and simultaneously visualizes them on 3D structures. Moreover, MotViz is user-friendly, with optimized graphical parameters and better processing speed due to the inclusion of a database at the back end. MotViz is available at http://www.fi-pk.com/motviz.html.
Funding: Supported by the Fundamental Research Funds for the Central Universities (No. 2020JS005).
Abstract: Motif-based graph local clustering (MGLC) is a popular method for graph mining tasks due to its various applications. However, the traditional two-phase approach of precomputing motif weights before performing local clustering loses locality and is impractical for large graphs. While some attempts have been made to address the efficiency bottleneck, there is still no applicable algorithm for large-scale graphs with billions of edges. In this paper, we propose a purely local and index-free method called Index-free Triangle-based Graph Local Clustering (TGLC*) to solve the MGLC problem w.r.t. a triangle. TGLC* directly estimates the Personalized PageRank (PPR) vector using random walks with the desired triangle-weighted distribution and obtains the clustering result using a standard sweep procedure. We demonstrate TGLC*'s scalability through theoretical analysis and its practical benefits through a novel visualization layout. TGLC* is the first algorithm to solve the MGLC problem without precomputing the motif weight. Extensive experiments on seven real-world large-scale datasets show that TGLC* is applicable and scalable for large graphs.
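The two ingredients named in the abstract above, a PPR estimate from triangle-weighted random walks followed by a sweep, can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the function names (`triangle_weight`, `estimate_ppr`, `sweep_cut`), the restart probability `alpha`, the walk count, and the dead-end handling are all assumptions made for illustration. The sketch computes each edge's triangle count on the fly from neighbor sets instead of precomputing it, but for simplicity the sweep uses the global triangle-weighted volume, which a purely local method would avoid.

```python
import random
from collections import Counter


def triangle_weight(adj, u, v):
    """Number of triangles containing edge (u, v), computed on demand."""
    return len(adj[u] & adj[v])


def estimate_ppr(adj, seed, alpha=0.15, num_walks=10000, rng=None):
    """Monte Carlo PPR estimate: fraction of walks that end at each node.

    Each step stops with probability alpha; otherwise the walk moves to a
    neighbor chosen proportionally to the triangle weight of the edge.
    Stopping at nodes with no triangle edges is an assumption of this sketch.
    """
    if rng is None:
        rng = random.Random(0)
    counts = Counter()
    for _ in range(num_walks):
        node = seed
        while rng.random() > alpha:
            nbrs = [v for v in adj[node] if triangle_weight(adj, node, v) > 0]
            if not nbrs:
                break  # dead end in the triangle-weighted graph
            weights = [triangle_weight(adj, node, v) for v in nbrs]
            node = rng.choices(nbrs, weights=weights, k=1)[0]
        counts[node] += 1
    return {v: c / num_walks for v, c in counts.items()}


def sweep_cut(adj, ppr):
    """Standard sweep: order nodes by ppr / weighted degree and return the
    prefix with the lowest conductance in the triangle-weighted graph."""
    wdeg = {u: sum(triangle_weight(adj, u, v) for v in adj[u]) for u in adj}
    total_vol = sum(wdeg.values())  # global quantity; a simplification
    order = sorted((v for v in ppr if wdeg[v] > 0),
                   key=lambda v: ppr[v] / wdeg[v], reverse=True)
    best_set, best_phi = set(), float("inf")
    in_set, vol, cut = set(), 0, 0
    for v in order:
        # Edges from v into the current set leave the cut; the rest enter it.
        for w in adj[v]:
            t = triangle_weight(adj, v, w)
            cut += -t if w in in_set else t
        in_set.add(v)
        vol += wdeg[v]
        denom = min(vol, total_vol - vol)
        if denom > 0 and cut / denom < best_phi:
            best_phi, best_set = cut / denom, set(in_set)
    return best_set, best_phi


# Toy graph: two 4-cliques joined by one bridge edge (3, 4).
adj = {
    0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2, 4},
    4: {3, 5, 6, 7}, 5: {4, 6, 7}, 6: {4, 5, 7}, 7: {4, 5, 6},
}
ppr = estimate_ppr(adj, seed=0, num_walks=2000, rng=random.Random(1))
cluster, phi = sweep_cut(adj, ppr)
```

In the toy graph the bridge edge lies in no triangle, so its weight is zero and walks started in one clique never cross to the other. This is the intuition behind weighting transitions by motif participation.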