Recent years have seen an explosion in graph data from a variety of scientific, social and technological fields. Among these fields, emotion recognition is an interesting research area because it finds many real-life applications, such as effective social robotics that increase the interactivity of the robot with humans, driver safety monitoring, and pain monitoring during surgery. This paper proposes a novel facial emotion recognition method based on graph mining that makes a paradigm shift in how the face region is represented: the face region is modeled as a graph of nodes and edges, and the gSpan frequent sub-graph mining algorithm is used to find the frequent sub-structures in the graph database of each emotion. An overlap ratio metric is utilized to reduce the number of generated sub-graphs. After encoding the final selected sub-graphs, binary classification is applied to classify the emotion of the queried input facial image using six levels of classification. Binary cat swarm intelligence is applied within each level of classification to select the sub-graphs that give the highest accuracy at that level. Experiments were conducted on the Surrey Audio-Visual Expressed Emotion (SAVEE) database, and the final system accuracy was 90.00%. The results show a significant accuracy improvement (about 2%) by the proposed system in comparison to previously published work on the SAVEE database.
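The overlap-ratio pruning step can be sketched as follows. This is an illustrative reading of the abstract, not the paper's exact metric: each mined sub-graph is summarized by the set of database graphs it occurs in, and a candidate is kept only if its occurrence set does not overlap an already-kept sub-graph's set beyond a threshold.

```python
# Hypothetical sketch of overlap-ratio filtering of frequent sub-graphs.
# `candidates` pairs a sub-graph ID with the set of database graphs that
# contain it, assumed sorted by preference (e.g. descending support).

def overlap_ratio(a, b):
    """|A ∩ B| / |smaller set| — 1.0 means one set contains the other."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))

def select_subgraphs(candidates, max_overlap=0.8):
    """Greedily keep sub-graphs whose occurrences are not already covered."""
    kept = []
    for sg_id, occ in candidates:
        if all(overlap_ratio(occ, kocc) <= max_overlap for _, kocc in kept):
            kept.append((sg_id, occ))
    return [sg_id for sg_id, _ in kept]

candidates = [
    ("g1", {1, 2, 3, 4, 5}),
    ("g2", {1, 2, 3, 4}),   # fully contained in g1's occurrences -> pruned
    ("g3", {6, 7, 8}),      # disjoint from g1 -> kept
]
print(select_subgraphs(candidates))  # ['g1', 'g3']
```

The threshold `max_overlap` here is an assumed parameter; the paper's value and exact definition of overlap may differ.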
Human verification and activity analysis (HVAA) is primarily employed to observe, track, and monitor human motion patterns using red-green-blue (RGB) images and videos. Interpreting human interaction from RGB images is one of the most complex machine learning tasks in recent times. Numerous models rely on various parameters, such as the detection rate, position, and direction of human body components in RGB images. This paper presents robust human activity analysis for event recognition via the extraction of contextual intelligence-based features. To use human interaction image sequences as input data, a few denoising steps are performed first. Then, human-to-human analyses are employed to deliver more precise results. This phase is followed by feature engineering techniques, including diverse feature selection. Next, a graph mining method is used for feature optimization, and AdaBoost for classification. The proposed HVAA model was tested on two benchmark datasets, exhibiting a mean accuracy of 92.15% on the Sport Videos in the Wild (SVW) dataset and 92.83% on the UT-Interaction dataset. These results demonstrate a better recognition rate and outperform other novel techniques in body-part tracking and event detection. The proposed HVAA system can be utilized in numerous real-world applications, including healthcare, surveillance, task monitoring, atomic actions, and gesture and posture analysis.
The idea of a positional inverted index is exploited for indexing a graph database. The main idea is the use of hashing tables to prune a considerable portion of the graph database that cannot contain the answer set. These tables are implemented using column-based techniques and are used to store the database graphs, frequent sub-graphs, and the neighborhoods of nodes. For exact checking of the remaining graphs, a vertex invariant is used for the isomorphism test, which can be implemented in parallel. The evaluation results indicate that the proposed method outperforms existing methods.
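The filter-then-verify idea behind such an index can be sketched in a few lines. This is a generic feature-based inverted index, with names of our own choosing, not the paper's column-based implementation: each feature (e.g. a frequent sub-graph) maps to the IDs of database graphs containing it, and a query prunes the database by intersecting the posting lists of its own features; only survivors need the expensive isomorphism test.

```python
# Illustrative sketch of inverted-index pruning over a graph database.
from collections import defaultdict

def build_index(graph_features):
    """graph_features: {graph_id: set of feature keys}."""
    index = defaultdict(set)
    for gid, feats in graph_features.items():
        for f in feats:
            index[f].add(gid)
    return index

def candidate_graphs(index, query_features, universe):
    """Any graph missing one of the query's features cannot contain it."""
    cands = set(universe)
    for f in query_features:
        cands &= index.get(f, set())
    return cands

db = {"G1": {"a", "b"}, "G2": {"a"}, "G3": {"a", "b", "c"}}
idx = build_index(db)
print(sorted(candidate_graphs(idx, {"a", "b"}, db)))  # ['G1', 'G3']
```

G2 is pruned without any isomorphism test because it lacks feature "b"; only G1 and G3 would proceed to exact verification.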
With increasingly complex website structures and continuously advancing web technologies, accurately recognizing user clicks in massive HTTP data, which is critical for web usage mining, becomes more difficult. In this paper, we propose a dependency graph model to describe the relationships between web requests. Based on this model, we design and implement a heuristic parallel algorithm to distinguish user clicks with the assistance of cloud computing technology. We evaluate the proposed algorithm on real massive data: a 228.7 GB dataset collected from a mobile core network, covering more than three million users. The experimental results demonstrate that the proposed algorithm achieves higher accuracy than previous methods.
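To make the click-vs-embedded-request distinction concrete, here is a minimal stand-in for the kind of referrer/time heuristic such dependency models build on. The field names and the one-second threshold are our assumptions for illustration, not the paper's algorithm: a request is treated as a user click if it arrives well after the page it depends on, while requests fired immediately are treated as automatically fetched objects.

```python
# Hypothetical referrer/time-gap heuristic for separating user clicks
# from embedded-object requests in an HTTP log.

def classify_clicks(requests, min_gap=1.0):
    """requests: list of dicts with 'ts' (seconds), 'url', 'referrer'.
    A request is a candidate user click if it arrives at least `min_gap`
    seconds after the request it depends on (its referrer)."""
    last_seen = {}  # url -> timestamp it was last requested
    clicks = []
    for r in sorted(requests, key=lambda x: x["ts"]):
        ref_ts = last_seen.get(r["referrer"])
        if ref_ts is None or r["ts"] - ref_ts >= min_gap:
            clicks.append(r["url"])
        last_seen[r["url"]] = r["ts"]
    return clicks

log = [
    {"ts": 0.0, "url": "/home",     "referrer": None},
    {"ts": 0.1, "url": "/logo.png", "referrer": "/home"},  # auto-fetched
    {"ts": 5.0, "url": "/news",     "referrer": "/home"},  # user click
]
print(classify_clicks(log))  # ['/home', '/news']
```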
Discovering regularities between entities in temporal graphs is vital for many real-world applications (e.g., social recommendation, emergency event detection, and cyberattack event detection). This paper proposes temporal graph association rules (TGARs), which extend traditional graph-pattern association rules in a static graph by incorporating unique temporal information and constraints. We introduce quality measures (e.g., support, confidence, and diversification) to characterize meaningful TGARs that are useful and diversified. In addition, the proposed support metric is an upper bound for alternative metrics, allowing us to guarantee a superset of patterns. We extend conventional confidence measures in terms of maximal occurrences of TGARs. The diversification score strikes a balance between interestingness and diversity. Although the problem is NP-hard, we develop an effective discovery algorithm for TGARs that integrates TGAR generation and TGAR selection, showing that mining TGARs over a temporal graph is feasible. We propose pruning strategies to filter, as early as possible, TGARs that have low support or cannot make the top-k. Moreover, we design an auxiliary data structure to prune TGARs that do not meet the constraints during the generation process, avoiding repeated subgraph matching for each extension in the search space. We experimentally verify the effectiveness, efficiency, and scalability of our algorithms in discovering diversified top-k TGARs from temporal graphs in real-life applications.
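The support and confidence measures named above can be illustrated with a toy calculation. Real TGAR support is defined over temporal sub-graph matches; here we abstract those away as pre-computed match counts, which is an assumption made purely for illustration.

```python
# Toy sketch of association-rule quality measures, abstracted from
# temporal sub-graph matching (match counts are assumed pre-computed).

def support(match_count, total):
    """Fraction of the graph's match universe covered by the rule."""
    return match_count / total

def confidence(rule_matches, antecedent_matches):
    """Fraction of antecedent matches that extend to the full rule."""
    return rule_matches / antecedent_matches if antecedent_matches else 0.0

def top_k(rules, k):
    """rules: list of (name, rule_matches, antecedent_matches, total).
    Rank by confidence, breaking ties by support."""
    scored = [(confidence(rm, am), support(rm, t), name)
              for name, rm, am, t in rules]
    return [name for _, _, name in sorted(scored, reverse=True)[:k]]

rules = [("r1", 8, 10, 100), ("r2", 5, 50, 100), ("r3", 9, 30, 100)]
print(top_k(rules, 2))  # ['r1', 'r3']
```

The pruning strategies in the paper exploit exactly this structure: a rule whose support already falls below the k-th best candidate can be discarded before any further matching.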
The importance of prerequisites for education has recently become a promising research direction. This work proposes a statistical model for measuring dependencies between knowledge units in learning resources. Instructors are expected to present knowledge units in a semantically well-organized manner to facilitate students' understanding of the material. The proposed model reveals how the inner concepts of a knowledge unit depend on each other and on concepts outside the knowledge unit. To help capture the complexity of the inner concepts themselves, WordNet is included as an external knowledge base in this model. The goal is to develop a model that enables instructors to evaluate whether or not a learning regime has hidden relationships which might hinder students' ability to understand the material. The evaluation, employing three textbooks, shows that the proposed model succeeds in discovering hidden relationships among knowledge units in learning resources and in exposing the knowledge gaps in some knowledge units.
Large-scale graphs usually exhibit global sparsity with local cohesiveness, and mining representative cohesive subgraphs is a fundamental problem in graph analysis. The k-truss is one of the most commonly studied cohesive subgraphs, in which each edge is contained in at least k − 2 triangles. A critical issue in mining a k-truss lies in computing the trussness of each edge, which is the maximum value of k for which the edge can be in a k-truss. Existing works mostly focus on truss computation in static graphs using sequential models. However, real-world graphs change dynamically. In this paper, we study distributed truss computation in dynamic graphs. In particular, we compute the trussness of edges based on the local nature of the k-truss in a synchronized node-centric distributed model. The proposed distributed decomposition algorithm iteratively decomposes the trussness of edges relying only on local topological information. Moreover, the distributed maintenance algorithm only needs to update a small amount of dynamic information to complete the computation. Extensive experiments demonstrate the scalability and efficiency of the proposed algorithms.
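The trussness definition above admits a compact sequential peeling algorithm, which we sketch here; the paper's distributed version applies the same local rule node-centrically. An edge's trussness is the largest k such that it survives in the k-truss, i.e., keeps lying in at least k − 2 triangles as weaker edges are removed.

```python
# Sequential truss decomposition by iterative peeling.
from collections import defaultdict

def truss_decomposition(edges):
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v); adj[v].add(u)
    # support = number of triangles each edge participates in
    sup = {frozenset(e): len(adj[e[0]] & adj[e[1]]) for e in edges}
    trussness, k = {}, 2
    while sup:
        while True:  # peel all edges that cannot sustain a (k+1)-truss
            peel = [e for e, s in sup.items() if s <= k - 2]
            if not peel:
                break
            for e in peel:
                u, v = tuple(e)
                for w in adj[u] & adj[v]:  # triangles broken by removal
                    for f in (frozenset((u, w)), frozenset((v, w))):
                        if f in sup:
                            sup[f] -= 1
                trussness[e] = k
                del sup[e]
                adj[u].discard(v); adj[v].discard(u)
        k += 1
    return trussness

# A triangle plus a pendant edge: the triangle's edges are in the
# 3-truss, the pendant edge only in the trivial 2-truss.
tri = truss_decomposition([(1, 2), (2, 3), (1, 3), (3, 4)])
print(tri[frozenset((1, 2))], tri[frozenset((3, 4))])  # 3 2
```

Note how peeling one edge only touches the supports of edges sharing a triangle with it; this locality is what makes the node-centric distributed formulation possible.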
Complex structural variants (CSVs) are genomic alterations that have more than two breakpoints and are considered the simultaneous occurrence of simple structural variants. However, detecting the compounded mutational signals of CSVs is challenging with the commonly used model-match strategy. As a result, progress on CSV discovery has been limited compared with simple structural variants. Here, we systematically analyzed the multi-breakpoint connection feature of CSVs and propose Mako, which uses a bottom-up guided, model-free strategy to detect CSVs from paired-end short-read sequencing. Specifically, we implemented a graph-based pattern growth approach, where the graph depicts potential breakpoint connections and pattern growth enables CSV detection without pre-defined models. Comprehensive evaluations on both simulated and real datasets revealed that Mako outperforms other algorithms. Notably, validation rates of CSVs on real data, based on experimental and computational validation as well as manual inspection, are around 70%, with median experimental and computational breakpoint shifts of 13 bp and 26 bp, respectively. Moreover, the Mako CSV subgraph effectively characterizes the breakpoint connections of a CSV event and uncovered a total of 15 CSV types, including two novel types: adjacent segment swap and tandem dispersed duplication. Further analysis of these CSVs also revealed the impact of sequence homology on the formation of CSVs. Mako is publicly available at https://github.com/xjtu-omics/Mako.
Short message service (SMS) is becoming an indispensable medium of social communication, and the problem of mobile spam is getting increasingly serious. We propose a novel approach for spam message detection. Instead of conventional methods that focus on keyword or flow-rate filtering, our system mines a more robust structure: the social network constructed from SMS. Several features, including static features, dynamic features and graph features, are proposed to describe the activities of nodes in the network in various ways. Experimental results on a real dataset prove the validity of our approach.
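As a concrete illustration of such graph features, consider the sketch below. The specific features chosen (degree and local clustering coefficient) are our assumptions about the kind of node descriptors meant, not the paper's exact feature set: a spammer tends to message many strangers who do not know each other, while a normal user sits inside a tight community.

```python
# Hypothetical graph features over an SMS contact network.
from collections import defaultdict

def build_graph(messages):
    """messages: list of (sender, receiver) pairs -> undirected adjacency."""
    nbrs = defaultdict(set)
    for s, r in messages:
        nbrs[s].add(r); nbrs[r].add(s)
    return nbrs

def clustering(nbrs, node):
    """Fraction of the node's neighbor pairs that are themselves linked."""
    ns = list(nbrs[node])
    k = len(ns)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k)
                if ns[j] in nbrs[ns[i]])
    return 2.0 * links / (k * (k - 1))

msgs = [("a", "b"), ("b", "c"), ("a", "c"),            # tight community
        ("spam", "x"), ("spam", "y"), ("spam", "z")]   # bulk sender
g = build_graph(msgs)
print(len(g["spam"]), clustering(g, "a"), clustering(g, "spam"))  # 3 1.0 0.0
```

A classifier would consume such per-node features alongside the static and dynamic ones mentioned in the abstract.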
Keyword search is an alternative to structured languages for querying graph-structured data. A result for a keyword query is a connected structure covering all or part of the queried keywords. Textual coverage and structural compactness are known as the two main properties of a relevant result to a keyword query. Many previous works examined these properties after retrieving all of the candidate results, using a ranking function in a comparative manner. However, this requires a time-consuming search process, which is not appropriate for an interactive system in which the user expects results in the least possible time. Recent works have addressed this problem by confining the shape of results so that their coverage and compactness can be examined during the search. However, these methods still suffer from redundant nodes in the retrieved results. In this paper, we introduce the semantics of the minimal covered r-clique (MCCr) for the results of a keyword query, as an extension of existing definitions. We propose efficient algorithms to detect the MCCrs of a given query; these algorithms can retrieve a comprehensive set of non-duplicate MCCrs in response to a keyword query. In addition, they can be executed in a distributed manner, which makes them outstanding in the field of keyword search. We also propose approximate versions of these algorithms to retrieve the top-k approximate MCCrs with polynomial delay, and we prove that the approximate algorithms retrieve results with a two-approximation guarantee. Extensive experiments on two real-world datasets confirm the efficiency and effectiveness of the proposed algorithms.
N-hop neighborhood information is very useful in analytic tasks on large-scale graphs, such as finding cliques in a social network, recommending friends or advertising links according to one's interests, and predicting links among websites. To get the N-hop neighborhood information on a large graph, such as a web graph or a Twitter social graph, the most straightforward method is to conduct a breadth-first search (BFS) on a parallel distributed graph processing framework, such as Pregel or GraphLab. However, due to the massive volume of message transfer, the BFS method incurs high communication cost and low efficiency. In this work, we propose a key/value-based method, namely KVB, which fits neatly into the prevailing parallel graph processing frameworks and computes N-hop neighborhoods on a large-scale graph efficiently. Unlike the BFS method, our method need not transfer large amounts of neighborhood information, and thus significantly reduces the overhead of both communication and intermediate results in the distributed framework. We formalize N-hop neighborhood query processing as an optimization problem based on a set of quantitative cost metrics of parallel graph processing. Moreover, we propose a solution to efficiently load only the relevant neighborhoods for computation. Specifically, we prove that the optimal partial neighborhood load problem is NP-hard and carefully design a heuristic strategy. We have implemented our algorithm on a distributed graph framework, Spark GraphX, and validated our solution with extensive experiments over a number of real-world and synthetic large graphs on a modest indoor cluster. Experiments show that our solution generally gains an order-of-magnitude speedup compared to the state-of-the-art BFS implementation.
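For reference, the BFS baseline the paper improves on can be sketched sequentially in a few lines: expand level by level from a source and collect every node reached within n hops.

```python
# Sequential sketch of the N-hop neighborhood baseline (BFS expansion).
from collections import deque

def n_hop_neighborhood(adj, src, n):
    """adj: {node: iterable of neighbors}. Returns all nodes within
    n hops of src, excluding src itself."""
    seen = {src}
    frontier = deque([(src, 0)])
    while frontier:
        node, d = frontier.popleft()
        if d == n:
            continue  # do not expand past the hop limit
        for nb in adj.get(node, ()):
            if nb not in seen:
                seen.add(nb)
                frontier.append((nb, d + 1))
    return seen - {src}

g = {1: [2, 3], 2: [4], 3: [4], 4: [5]}
print(sorted(n_hop_neighborhood(g, 1, 2)))  # [2, 3, 4]
```

In a distributed framework, each frontier expansion becomes a superstep whose messages carry neighborhood sets; it is exactly this message volume that the key/value method avoids.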
Identifying composite crosscutting concerns (CCs) is a research task and challenge of aspect mining. In this paper, we propose a scatter-based graph clustering approach to identify composite CCs. Inspired by state-of-the-art link analysis techniques, we propose a two-state model to approximate how CCs tangle with core modules. From this model, we obtain scatter and centralization scores for each program element; the scatter scores, in particular, are adopted to select CC seeds. Furthermore, to identify composite CCs, we adopt a novel similarity measurement and develop an undirected graph clustering to group these seeds. Finally, we compare our approach with previous work and illustrate its effectiveness in identifying composite CCs.
The distance dynamics model is an excellent tool for uncovering the community structure of a complex network. However, one issue that must be addressed by this model is its very long computation time on large-scale networks. To identify the community structure of a large-scale network with high speed and high quality, in this paper we propose a fast community detection algorithm, the F-Attractor, based on the distance dynamics model. The main contributions of the F-Attractor are as follows. First, we propose two prejudgment rules from two different perspectives: node and edge. Based on these rules, we develop a strategy of internal edge prejudgment for predicting the internal edges of the network. Internal edge prejudgment reduces the number of edges, and their neighbors, that participate in the distance dynamics model. Second, we introduce a triangle distance to further enhance the speed of the interaction process in the distance dynamics model. The triangle distance uses two known distances to measure a third distance without any extra computation. We combine these techniques to improve the distance dynamics model and then describe the community detection process of the F-Attractor. The results of an extensive series of experiments demonstrate that the F-Attractor offers high-speed community detection with high partition quality.
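The distance dynamics family of models typically initializes each edge with a Jaccard-based distance between the endpoints' neighborhoods; the sketch below shows that initialization. The "triangle" estimate we add, which caps an unknown distance by the sum of two known ones, is our illustrative stand-in for reusing known distances without extra neighborhood computation, not the paper's exact triangle distance.

```python
# Jaccard edge distance plus a hypothetical triangle-style estimate.

def jaccard_distance(adj, u, v):
    """1 - Jaccard similarity of the closed neighborhoods of u and v."""
    nu, nv = adj[u] | {u}, adj[v] | {v}
    return 1.0 - len(nu & nv) / len(nu | nv)

def triangle_estimate(d_uv, d_vw):
    """Reuse two known distances to bound a third, with no adjacency work."""
    return min(1.0, d_uv + d_vw)

adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
print(round(jaccard_distance(adj, 1, 3), 2))  # 0.25
```

Avoiding the set intersections in `jaccard_distance` for as many edges as possible is precisely where the speedup in the interaction process comes from.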
Funding (user-click recognition work): supported in part by the Fundamental Research Funds for the Central Universities under Grant No. 2013RC0114 and the 111 Project of China under Grant No. B08004.
Funding (TGAR work): partially supported by the National Key Research and Development Program (No. 2018YFB1800203), the National Natural Science Foundation of China (No. U19B2024), and the Postgraduate Scientific Research Innovation Project of Hunan Province (No. CX20210038).
Funding (distributed truss computation work): supported in part by the National Key Research and Development Program of China (No. 2020YFB1005900), the National Natural Science Foundation of China (No. 62122042), and the Shandong University Multidisciplinary Research and Innovation Team of Young Scholars (No. 2020QNQT017).
Funding (Mako work): supported by the National Key R&D Program of China (Grant Nos. 2018YFC0910400 and 2017YFC0907500), the National Science Foundation of China (Grant Nos. 31671372, 61702406, and 31701739), the Fundamental Research Funds for the Central Universities, the World-Class Universities (Disciplines) and Characteristic Development Guidance Funds for the Central Universities, and the Shanghai Municipal Science and Technology Major Project (Grant No. 2017SHZDZX01).
Funding (SMS spam detection work): supported by the National Natural Science Foundation of China under Grant No. 60873158, the National Basic Research 973 Program of China under Grant No. 2010CB327902, the Fundamental Research Funds for the Central Universities of China, and the Opening Funding of the State Key Laboratory of Virtual Reality Technology and Systems of China.
Funding (KVB work): supported by the Natural Science Basic Research Plan in Shaanxi Province of China (2017JM6104), the National Natural Science Foundation of China (Grant Nos. 61303037, 61472321, 61732014), the National Key Research and Development Program of China (2018YFB1003403), the National Basic Research Program (973 Program) of China (2012CB316203), and the National High Technology Research and Development Program (863 Program) of China (2012AA011004).
Abstract: N-hop neighborhood information is very useful in analytic tasks on large-scale graphs, such as finding cliques in a social network, recommending friends or advertising links according to one's interests, and predicting links among websites. To obtain N-hop neighborhood information on a large graph, such as a web graph or a Twitter social graph, the most straightforward method is to conduct a breadth-first search (BFS) on a parallel distributed graph processing framework such as Pregel or GraphLab. However, due to the massive volume of message transfer, the BFS method incurs high communication cost and low efficiency. In this work, we propose a key/value-based method, named KVB, which fits naturally into prevailing parallel graph processing frameworks and computes N-hop neighborhoods on a large-scale graph efficiently. Unlike the BFS method, our method does not need to transfer large amounts of neighborhood information, and thus significantly reduces both the communication overhead and the intermediate results in the distributed framework. We formalize N-hop neighborhood query processing as an optimization problem based on a set of quantitative cost metrics of parallel graph processing. Moreover, we propose a solution that efficiently loads only the relevant neighborhoods for computation. In particular, we prove that the optimal partial-neighborhood load problem is NP-hard and carefully design a heuristic strategy. We have implemented our algorithm on the distributed graph framework Spark GraphX and validated our solution with extensive experiments over a number of real-world and synthetic large graphs on a modest indoor cluster. Experiments show that our solution generally gains an order-of-magnitude speedup compared to the state-of-the-art BFS implementation.
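For context on what the baseline computes, here is a minimal single-machine BFS sketch of an N-hop neighborhood query (the method KVB improves on, not KVB itself; `adj` is an assumed adjacency dict). The distributed version pays for exactly this frontier expansion in message traffic between partitions.

```python
from collections import deque

def n_hop_neighborhood(adj, source, n):
    """Return all nodes within n hops of `source` (excluding `source`),
    computed by plain breadth-first search over an adjacency dict."""
    dist = {source: 0}
    q = deque([source])
    while q:
        u = q.popleft()
        if dist[u] == n:          # frontier at hop n: stop expanding
            continue
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return {v for v, d in dist.items() if 0 < d <= n}
```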
Funding: Supported by the National Pre-research Project (513150601).
Abstract: Identifying composite crosscutting concerns (CCs) is a research task and challenge in aspect mining. In this paper, we propose a scatter-based graph clustering approach to identify composite CCs. Inspired by state-of-the-art link analysis techniques, we propose a two-state model to approximate how CCs tangle with core modules. From this model, we obtain scatter and centralization scores for each program element. In particular, the scatter scores are used to select CC seeds. Furthermore, to identify composite CCs, we adopt a novel similarity measure and develop an undirected graph clustering to group these seeds. Finally, we compare our approach with previous work and illustrate its effectiveness in identifying composite CCs.
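The intuition behind a scatter score is that crosscutting concerns (such as logging) are referenced from many unrelated modules. The abstract derives its scores from a two-state link-analysis model; the sketch below uses only a simple fan-in ratio as an illustrative proxy for that idea, with all names (`calls`, `module_of`) invented for the example.

```python
def scatter_scores(calls, module_of):
    """Illustrative scatter proxy: for each called program element,
    the fraction of distinct modules that reference it.
    calls: iterable of (caller, callee) edges; module_of: element -> module."""
    modules = set(module_of.values())
    referers = {}
    for caller, callee in calls:
        referers.setdefault(callee, set()).add(module_of[caller])
    return {e: len(mods) / len(modules) for e, mods in referers.items()}
```

Elements with a high score (referenced from most modules) would be the CC seed candidates that the clustering step then groups into composite concerns.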
Funding: Supported by the National Natural Science Foundation of China (Nos. 61573299, 61174140, 61472127, and 61272395), the Social Science Foundation of Hunan Province (No. 16ZDA07), the China Postdoctoral Science Foundation (Nos. 2013M540628 and 2014T70767), the Natural Science Foundation of Hunan Province (Nos. 14JJ3107 and 2017JJ5064), and the Excellent Youth Scholars Project of Hunan Province (No. 15B087).
Abstract: The distance dynamics model is an excellent tool for uncovering the community structure of a complex network. However, one issue that must be addressed by this model is its very long computation time on large-scale networks. To identify the community structure of a large-scale network with high speed and high quality, in this paper we propose a fast community detection algorithm, the F-Attractor, which is based on the distance dynamics model. The main contributions of the F-Attractor are as follows. First, we propose two prejudgment rules from two different perspectives: node and edge. Based on these two rules, we develop a strategy of internal-edge prejudgment for predicting the internal edges of the network. Internal-edge prejudgment reduces the number of edges, and of their neighbors, that participate in the distance dynamics model. Second, we introduce a triangle distance to further speed up the interaction process in the distance dynamics model. This triangle distance uses two known distances to measure a third distance without any extra computation. We combine these techniques to improve the distance dynamics model and then describe the community detection process of the F-Attractor. The results of an extensive series of experiments demonstrate that the F-Attractor offers high-speed community detection with high partition quality.
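The abstract does not give the triangle-distance formula, so the sketch below shows only the standard ingredients: the Jaccard distance over closed neighborhoods typically used as the initial edge distance in Attractor-style distance dynamics, plus a purely illustrative triangle-inequality bound standing in for the paper's "two known distances give the third" shortcut (the actual F-Attractor formula may differ).

```python
def jaccard_distance(adj, u, v):
    """Initial edge distance in Attractor-style models:
    1 - |N(u) ∩ N(v)| / |N(u) ∪ N(v)| over closed neighborhoods."""
    nu = set(adj[u]) | {u}
    nv = set(adj[v]) | {v}
    return 1.0 - len(nu & nv) / len(nu | nv)

def triangle_upper_bound(d_uv, d_vw):
    """Jaccard distance is a metric, so in a triangle u-v-w the third
    side satisfies d(u,w) <= d(u,v) + d(v,w). Reusing the two known
    distances this way avoids one neighborhood-intersection computation;
    it is an illustrative stand-in, not the paper's exact triangle distance."""
    return min(1.0, d_uv + d_vw)
```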