Sparse large-scale multi-objective optimization problems (SLMOPs) are common in science and engineering. However, the large scale implies a high-dimensional decision space, requiring algorithms to traverse a vast search space with limited computational resources. Furthermore, because the problems are sparse, most variables in Pareto-optimal solutions are zero, making it difficult for algorithms to identify the non-zero variables efficiently. This paper addresses the challenges posed by SLMOPs. First, we introduce objective functions customized to mine maximum and minimum candidate sets, which improves the efficacy of frequent pattern mining. Candidate sets are then selected not by the number of non-zero variables they contain but by a higher proportion of non-zero variables within specific dimensions. Additionally, we present a novel approach to association rule mining that explores the relationships between non-zero variables. This approach helps identify sparse distributions that can potentially expedite reductions in the objective function value. We extensively tested our algorithm on eight benchmark problems and four real-world SLMOPs. The results demonstrate that our approach achieves competitive solutions across these problems.
Realizing fault knowledge association through efficient data mining algorithms is of great significance for improving the efficiency of railway production and operation. However, high-utility quantitative frequent pattern mining algorithms still suffer from poor time and memory performance and are not easy to scale up. To address these needs, we propose a related-degree-based frequent pattern mining algorithm, named Related High Utility Quantitative Item set Mining (RHUQI-Miner), to enable the effective mining of railway fault data. The algorithm constructs the item-related degree structure of the fault data and gives a pruning optimization strategy to find frequent patterns with higher related degrees, reducing redundant and invalid frequent patterns. Subsequently, it uses a fixed pattern length strategy to modify the utility information of items during mining, so that the algorithm can control the length of the output frequent patterns according to the actual data and further improve its performance and practicability. Experimental results on a real fault dataset show that RHUQI-Miner can effectively reduce time and memory consumption during mining, thus providing data support for differentiated and precise maintenance strategies.
A recommender system is an approach used in e-commerce to smooth the user experience. Sequential pattern mining is a data mining technique used to identify co-occurrence relationships while taking into account the order of transactions. This work presents an implementation of sequential pattern mining for recommender systems in the e-commerce domain. It uses the systolic tree algorithm to mine frequent patterns and yield feasible rules for the recommender system. The objective of feature selection is to pick a feature subset with the least feature similarity and the highest relevance to the target class, which reduces the dimensionality of the feature vector by eliminating redundant, irrelevant, or noisy data. This work presents a new hybrid recommender system based on optimized feature selection and the systolic tree. Features were extracted using Term Frequency-Inverse Document Frequency (TF-IDF), and feature selection used River Formation Dynamics (RFD) and the Particle Swarm Optimization (PSO) algorithm. The systolic tree is used for pattern mining, and the recommendations are based on the mined patterns. The proposed methods were evaluated on the MovieLens dataset, and the experimental outcomes confirmed the efficiency of the techniques. RFD feature selection combined with systolic tree frequent pattern mining and collaborative filtering achieved a precision of 0.89.
The rapid development of network technology and its evolution toward heterogeneous networks has increased the demand for automatic monitoring and management of heterogeneous wireless communication networks. This paper presents a multilevel pattern mining architecture to support automatic network management by discovering interesting patterns from telecom network monitoring data. This architecture leverages and combines existing frequent itemset discovery over data streams, association rule deduction, frequent sequential pattern mining, and frequent temporal pattern mining techniques, while also making use of distributed processing platforms to achieve high-volume throughput.
The identification of design pattern instances is important for program understanding and software maintenance. To mine design patterns in existing systems, this paper proposes a subgraph isomorphism approach that discovers several design patterns in a legacy system at a time. An attributed relational graph is used to describe design patterns and legacy systems. The subgraph isomorphism approach consists of a decomposition process and a composition process. During decomposition, the graphs corresponding to the design patterns are decomposed into subgraphs, some of which correspond to elemental design patterns. The composition process tries to obtain a subgraph isomorphism of the matched graph once a subgraph isomorphism of each subgraph has been obtained. Because design patterns share common structures, the proposed approach can reduce the number of times entities and relations are matched. Compared with existing methods, the proposed algorithm is not linearly dependent on the number of design pattern graphs. Key words: design pattern mining; attributed relational graph; subgraph isomorphism. CLC number: TP 311.5. Foundation item: supported by the National Natural Science Foundation of China (60273075) and the Science Foundation of Naval University of Engineering (HGDJJ03019). Biography: LI Qing-hua (1940-), male, Professor, research direction: parallel computing.
Faster internet, IoT, and social media have transformed the conventional web into a collaborative web, resulting in enormous amounts of user-generated content. Several studies address such content; however, they mainly focus on textual data and overlook the importance of metadata. To fill this gap, we provide a temporal pattern mining framework to model and utilize the metadata of user-generated content. First, we scrape 2.1 million tweets from Twitter between November 2020 and September 2021 for 100 hashtag keywords and represent these tweets as 100 User-Tweet-Hashtag (UTH) dynamic graphs. Second, we extract and identify four time series in three timespans (Day, Hour, and Minute) from the UTH dynamic graphs. Lastly, we model these four time series with three machine learning algorithms to mine temporal patterns, with accuracies of 95.89%, 93.17%, 90.97%, and 93.73%, respectively. We demonstrate that the metadata of user-generated content contains valuable information, which helps to understand users' collective behavior and can be beneficial for business and research. The dataset and code are publicly available; the link is given in the dataset section.
In today's highly competitive retail industry, offline stores face increasing pressure on profitability. They hope to improve their shelf management with the help of big data technology. On-shelf availability is an essential indicator of shelf data management and closely relates to customer purchase behavior. RFM (recency, frequency, and monetary) pattern mining is a powerful tool for evaluating the value of customer behavior. However, existing RFM pattern mining algorithms do not consider the quarterly nature of goods, resulting in unreasonable shelf availability and difficulty in making a profit. To solve this problem, we propose a quarterly RFM mining algorithm for on-shelf products, named OS-RFM. Our algorithm mines high-recency, high-frequency, and high-monetary patterns and considers the on-shelf period of goods in quarterly units. We conducted experiments on two real datasets, with numerical and graphical analysis, to demonstrate the algorithm's effectiveness. Compared with the state-of-the-art RFM mining algorithm, our algorithm identifies more patterns and performs well in terms of precision, recall, and F1-score, with the recall rate nearing 100%. The algorithm also runs in significantly less time and with more stable memory usage than existing mining algorithms. Additionally, we analyze the sales trends of products in different quarters and their seasonal variations. The analysis helps businesses maintain reasonable on-shelf availability and achieve greater profitability.
Mining erasable patterns (EPs) is a data mining problem that can help factory managers devise the best product plans for the future. This problem has been widely studied in recent years, and many approaches for mining EPs have been proposed. Erasable closed patterns (ECPs) are a condensed representation of EPs without information loss. Current methods of mining ECPs identify huge numbers of such patterns, whereas intelligent systems need only a small number. A ranking process therefore has to be applied before use, which reduces efficiency. To overcome this limitation, this study presents a robust method for mining top-rank-k ECPs in which the mining and ranking phases are combined into a single step. First, we propose a virtual-threshold-based pruning strategy to improve the mining speed. Based on this strategy and the dPidset structure, we then develop a fast algorithm for mining top-rank-k ECPs, which we call TRK-ECP. Finally, we carry out experiments to compare the runtime of TRK-ECP with two algorithms modified from dVM and TEPUS (Top-rank-k Erasable Pattern mining Using the Subsume concept), which are state-of-the-art algorithms for mining top-rank-k EPs. The running-time results confirm that TRK-ECP outperforms the other approaches in mining top-rank-k ECPs.
Sequential pattern mining is an important data mining problem with broad applications. However, it is also a challenging problem, since the mining may have to generate or examine a combinatorially explosive number of intermediate subsequences. Recent studies have developed two major classes of sequential pattern mining methods: (1) a candidate generation-and-test approach, represented by (i) GSP, a horizontal format-based sequential pattern mining method, and (ii) SPADE, a vertical format-based method; and (2) a pattern-growth method, represented by PrefixSpan and its further extensions, such as gSpan for mining structured patterns. In this study, we give a systematic introduction to the pattern-growth methodology and study its principles and extensions. We first introduce two pattern-growth algorithms, FreeSpan and PrefixSpan, for efficient sequential pattern mining. Then we introduce gSpan for mining structured patterns using the same methodology. Their relative performance on large databases is presented and analyzed. Several extensions of these methods are also discussed, including mining multi-level, multi-dimensional patterns and mining constraint-based patterns.
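To make the pattern-growth idea in the abstract above concrete, the following is a minimal sketch in the spirit of PrefixSpan, not the authors' implementation. It assumes each sequence is a plain list of single items (no itemset elements), counts support as the number of sequences containing a pattern, and uses a hypothetical toy database `db`; the function name `prefixspan` and its parameters are illustrative.

```python
from collections import defaultdict


def prefixspan(sequences, min_support):
    """Mine frequent sequential patterns from sequences of single items.

    Returns a dict mapping each frequent pattern (a tuple of items) to the
    number of sequences that contain it as a subsequence.
    """
    results = {}

    def grow(prefix, projected):
        # `projected` holds one (sequence index, start offset) pair per
        # sequence that contains `prefix`; the suffix from the offset onward
        # is that sequence's entry in the projected database.
        supporters = defaultdict(set)
        for seq_id, start in projected:
            for item in set(sequences[seq_id][start:]):
                supporters[item].add(seq_id)

        for item in sorted(supporters):
            support = len(supporters[item])
            if support < min_support:
                continue  # infrequent extension: prune this branch
            pattern = prefix + (item,)
            results[pattern] = support
            # Project on the first occurrence of `item` in each suffix.
            next_projected = [
                (seq_id, sequences[seq_id].index(item, start) + 1)
                for seq_id, start in projected
                if seq_id in supporters[item]
            ]
            grow(pattern, next_projected)

    grow((), [(i, 0) for i in range(len(sequences))])
    return results


# Hypothetical toy database of four event sequences.
db = [["a", "b", "c"], ["a", "c"], ["b", "a", "c"], ["a", "b"]]
patterns = prefixspan(db, min_support=3)
```

Because each recursive call only scans the suffixes that follow the current prefix, no candidate sequences are generated and tested against the full database, which is the contrast with GSP-style methods that the abstract draws.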
Design patterns are often used in the development of object-oriented software. They offer reusable abstract information that helps solve recurring design problems. Detecting design patterns benefits the comprehension and maintenance of object-oriented software systems. Pattern detection techniques based on static analysis often encounter problems when patterns have identical structures. In this study, we attempt to detect software design patterns by using software metrics and classification-based techniques. Our study is conducted in two phases: creation of a metrics-oriented dataset and detection of software design patterns. The datasets are prepared by using software metrics for training the classifiers. Then, pattern detection is performed by using classification-based techniques. To evaluate the proposed method, experiments are conducted on three open-source software programs, JHotDraw, QuickUML, and JUnit, and the results are analyzed.
Data mining is a powerful emerging technology that helps extract hidden information from huge volumes of historical data. This paper is concerned with finding the frequent trajectories of moving objects in spatio-temporal data using a novel method that combines clustering and sequential pattern mining. The algorithms logically split the trajectory span area into clusters and then apply the k-means algorithm over these clusters until the squared error is minimized. The new method applies a threshold to obtain active clusters and arranges them in descending order of the number of trajectories passing through them. From these active clusters, inter-cluster patterns are found by a sequential pattern mining technique. The process is repeated until all the active clusters are linked. The clusters thus linked in sequence are the frequent trajectories. A set of experiments on real datasets shows that the proposed method performs about five times better than existing ones. The results are compared with those of other algorithms, and their variation is analyzed by statistical methods. Further, tests of significance are conducted with ANOVA to find an efficient threshold value for the optimum plot of frequent trajectories. The results are found to be superior to those of existing methods. This approach may be relevant to finding alternate paths in busy networks (congestion control), finding the frequent paths of migratory birds, or even, with minor alterations, predicting the next level of pattern characteristics in time series data and finding the frequent path of balls in certain games.
A holistic understanding of wind behaviour over space, time and height is essential for wind energy harvesting applications. This study presents a novel approach for mapping frequent wind profile patterns using multidimensional sequential pattern mining (MDSPM). The approach is illustrated with a 24-year time series of European Centre for Medium-Range Weather Forecasts European Reanalysis-Interim gridded (0.125° × 0.125°) wind data for the Netherlands, sampled every 6 h at six height levels. The wind data were first transformed into two spatio-temporal sequence databases (for speed and direction, respectively). Then, the Linear time Closed Itemset Miner Sequence algorithm was used to extract the multidimensional sequential patterns, which were visualized using a 3D wind rose, a circular histogram and a geographical map. These patterns were further analysed to determine their wind shear coefficients and turbulence intensities as well as their spatial overlap with areas that currently have wind turbines. Our analysis identified four frequent wind profile patterns. One of them is highly suitable for harvesting wind energy at a height of 128 m, and 68.97% of the geographical area covered by this pattern already contains wind turbines. This study shows that the proposed approach is capable of efficiently extracting meaningful patterns from complex spatio-temporal datasets.
Discovering gradual moving object cluster patterns from trajectory streams makes it possible to characterize movement behavior in real-time environments, which enables new applications and services. Since trajectory streams evolve rapidly, are created continuously, and cannot be stored indefinitely in memory, existing approaches designed for static trajectory datasets are not suitable for discovering gradual moving object cluster patterns from trajectory streams. This paper proposes a novel algorithm for discovering gradual moving object cluster patterns from trajectory streams using sliding window models. By processing the trajectory data in the current window, the mining algorithm can capture the trend and evolution of moving object cluster patterns. First, the density peaks clustering algorithm is used to identify the clusters of different snapshots. The stable relationship between relatively few moving objects is used to improve the clustering efficiency. Then, by intersecting clusters from different snapshots, the gradual moving object cluster pattern is updated. The relationship of clusters between adjacent snapshots and the gradual property are used to accelerate the updating process. Finally, experimental results on two real datasets demonstrate that our algorithm is effective and efficient.
A frequent trajectory pattern mining algorithm is proposed to learn object activities and classify trajectories in intelligent visual surveillance systems. The distribution patterns of the trajectories were generated by an Apriori-based frequent pattern mining algorithm, and the trajectories were classified by the frequent trajectory patterns generated. In addition, a fuzzy c-means (FCM) based learning algorithm and a mean-shift-based clustering procedure were used to construct the representation of trajectories. The algorithm can further be used to describe activities and identify anomalies. Experiments on two real scenes show that the algorithm is effective.
Previous weighted frequent pattern (WFP) mining algorithms are not suitable for data streams because they need multiple database scans. In this paper, we present an efficient algorithm, SWFP-Miner, to mine weighted frequent patterns over data streams. SWFP-Miner is based on a sliding window and can discover important frequent patterns from recent data. A new refined weight definition is proposed to keep the downward closure property, and two pruning strategies are presented to prune weighted infrequent patterns. Experimental studies are performed to evaluate the effectiveness and efficiency of SWFP-Miner.
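The abstract above does not state SWFP-Miner's refined weight definition or its pruning strategies, so the following is only an illustrative sketch of the general idea of weighted support over a sliding window. It assumes, purely for illustration, that a pattern's weight is the minimum weight of its items, which is one simple definition that preserves downward closure. The class name, the `max_len` cap on pattern length, and the item weights are all hypothetical.

```python
from collections import Counter, deque
from itertools import combinations


class SlidingWindowWeightedMiner:
    """Weighted support over the most recent `window_size` transactions."""

    def __init__(self, weights, window_size, max_len=2):
        self.weights = weights          # item -> weight in (0, 1]
        self.window_size = window_size
        self.max_len = max_len          # longest pattern that is counted
        self.window = deque()
        self.counts = Counter()         # sorted item tuple -> count in window

    def _itemsets(self, transaction):
        items = sorted(set(transaction))
        for k in range(1, self.max_len + 1):
            yield from combinations(items, k)

    def add(self, transaction):
        # Evict the oldest transaction once the window is full, so only a
        # single pass over the stream is needed.
        if len(self.window) == self.window_size:
            for itemset in self._itemsets(self.window.popleft()):
                self.counts[itemset] -= 1
                if self.counts[itemset] == 0:
                    del self.counts[itemset]
        self.window.append(transaction)
        for itemset in self._itemsets(transaction):
            self.counts[itemset] += 1

    def weighted_support(self, itemset):
        if not self.window:
            return 0.0
        key = tuple(sorted(itemset))
        # Assumed weight definition: minimum item weight in the pattern.
        weight = min(self.weights[item] for item in key)
        return weight * self.counts[key] / len(self.window)

    def frequent(self, min_wsup):
        scored = {s: self.weighted_support(s) for s in self.counts}
        return {s: v for s, v in scored.items() if v >= min_wsup}


# Hypothetical item weights and stream.
miner = SlidingWindowWeightedMiner({"a": 0.9, "b": 0.5, "c": 0.2}, window_size=3)
for tx in (["a", "b"], ["a", "c"], ["a", "b"], ["b", "c"]):
    miner.add(tx)
```

With the minimum-weight assumption, a superset can have neither a larger weight nor a larger count than any of its subsets, so its weighted support cannot exceed theirs; that is what allows infrequent patterns to be pruned safely. Enumerating every itemset up to `max_len` per transaction keeps the sketch short but would not scale the way a dedicated stream miner is meant to.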
The volume of trajectory data has grown tremendously in recent years. Maintaining and processing such trajectory data effectively and efficiently has become a challenging task. In this paper, we propose a spatial and temporal trajectory compression framework, namely CLEAN. The key to spatial compression is to mine meaningful frequent trajectory patterns on the road network. By treating the mined patterns as dictionary items, long trajectories can be encoded by shorter paths, leading to a smaller space cost. An error-bounded temporal compression is carefully designed on top of the identified spatial patterns for a much lower space cost. Meanwhile, the patterns are also used to improve the performance of two trajectory applications, range query and clustering, without decompression overhead. Extensive experiments on real trajectory datasets validate that CLEAN significantly outperforms existing state-of-the-art approaches in terms of spatial-temporal compression and trajectory applications.
Despite advances in technological complexity and efforts, software repository maintenance requires reusing data to reduce effort and complexity. However, increasing ambiguity, irrelevance, and bugs arise when extracting similar data during software development, which generates a large amount of data that resides in repositories. Thus, there is a need for a repository mining technique for relevant and bug-free data prediction. This paper proposes a fault prediction approach that uses a data mining technique to find good predictors for high-quality software. To predict errors in mining data, the Apriori algorithm was used to discover association rules, with confidence fixed above 40% and support of at least 30%. A pruning strategy based on evaluation measures was adopted. Next, rules were extracted from three projects in different domains; the extracted rules were then combined to obtain the most popular rules based on the evaluation measure values. To evaluate the proposed approach, we conducted an experimental study comparing the proposed rules with existing ones on four different industrial projects. The evaluation showed that the results of our proposal are promising. Practitioners and developers can use these rules for defect prediction during early software development.
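The thresholds quoted in the abstract above (support of at least 30%, confidence above 40%) can be illustrated with a small level-wise Apriori sketch. This is not the authors' pipeline: the transactions are hypothetical module attributes invented for the example, the helper names `apriori` and `rules` are illustrative, and the pruning by evaluation measures mentioned in the abstract is not modelled.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset: support fraction} for all frequent itemsets."""
    tx = [frozenset(t) for t in transactions]
    n = len(tx)
    current = [frozenset([i]) for i in sorted({i for t in tx for i in t})]
    frequent = {}
    k = 1
    while current:
        # Count candidates of size k and keep those meeting min_support.
        level = {}
        for cand in current:
            sup = sum(cand <= t for t in tx) / n
            if sup >= min_support:
                level[cand] = sup
        frequent.update(level)
        # Join step: build (k+1)-itemsets whose k-subsets are all frequent.
        k += 1
        candidates = set()
        for a, b in combinations(list(level), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in level for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        current = list(candidates)
    return frequent


def rules(frequent, min_confidence):
    """List (antecedent, consequent, support, confidence) above the threshold."""
    found = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for ante in map(frozenset, combinations(sorted(itemset), r)):
                conf = sup / frequent[ante]  # subsets of frequent sets are frequent
                if conf > min_confidence:
                    found.append((ante, itemset - ante, sup, conf))
    return found


# Hypothetical per-module attribute sets; "defect" marks a faulty module.
modules = [
    {"high_complexity", "many_changes", "defect"},
    {"high_complexity", "defect"},
    {"many_changes"},
    {"high_complexity", "many_changes", "defect"},
    {"low_coverage", "many_changes"},
]
frequent_sets = apriori(modules, min_support=0.30)
defect_rules = [r for r in rules(frequent_sets, 0.40) if r[1] == {"defect"}]
```

Filtering the rules to those whose consequent is exactly `{"defect"}` mirrors how association rules could serve as defect predictors; which attributes and evaluation measures a real study would use is outside what the abstract specifies.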
In this paper, we propose an enhanced associative classification method that integrates the dynamic property into the process of associative classification. In the proposed method, we employ a support vector machine (SVM) based method to refine the discovered emerging frequent patterns, which are used to extend the classification rules for class label prediction. The empirical study shows that our method can classify growing resources efficiently and effectively.
Detecting cyber-attacks has undoubtedly become a big data problem. This paper presents a tutorial on data mining based cyber-attack detection. First, a data-driven defence framework is presented in terms of cyber security situational awareness. Then, the process of data mining based cyber-attack detection is discussed. Next, a multi-loop learning architecture is presented for data mining based cyber-attack detection. Finally, common data mining techniques for cyber-attack detection are discussed.
Funding (SLMOP study): supported by the Open Project of Xiangjiang Laboratory (22XJ02003), the University Fundamental Research Fund (23-ZZCX-JDZ-28, ZK21-07), the National Science Fund for Outstanding Young Scholars (62122093), the National Natural Science Foundation of China (72071205), the Hunan Graduate Research Innovation Project (CX20230074), the Hunan Natural Science Foundation Regional Joint Project (2023JJ50490), the Science and Technology Project for Young and Middle-aged Talents of Hunan (2023TJZ03), and the Science and Technology Innovation Program of Hunan Province (2023RC1002).
Funding (RHUQI-Miner study): supported by the Research on Key Technologies and Typical Applications of Big Data in Railway Production and Operation (P2023S006) and the Fundamental Research Funds for the Central Universities (2022JBZY023).
Funding (multilevel pattern mining architecture study): funded by the Enterprise Ireland Innovation Partnership Programme with Ericsson under grant agreement IP/2011/0135 [6]; supported by the National Natural Science Foundation of China (No. 61373131, 61303039, 61232016, 61501247) and the PAPD and CICAEET funds.
Funding: Supported by the National Natural Science Foundation of China (grant no. 61573328).
Abstract: Faster internet, IoT, and social media have reformed the conventional web into a collaborative web, resulting in enormous amounts of user-generated content. Several studies focus on such content; however, they mainly consider textual data, thus undermining the importance of metadata. Considering this gap, we provide a temporal pattern mining framework to model and utilize the metadata of user-generated content. First, we scrape 2.1 million tweets from Twitter between Nov 2020 and Sep 2021 about 100 hashtag keywords and represent these tweets as 100 User-Tweet-Hashtag (UTH) dynamic graphs. Second, we extract and identify four time series in three timespans (Day, Hour, and Minute) from the UTH dynamic graphs. Lastly, we model these four time series with three machine learning algorithms to mine temporal patterns with accuracies of 95.89%, 93.17%, 90.97%, and 93.73%, respectively. We demonstrate that the metadata of user-generated content contains valuable information, which helps to understand the users' collective behavior and can be beneficial for business and research. The dataset and code are publicly available; the link is given in the dataset section.
Funding: Partially supported by the Foundation of State Key Laboratory of Public Big Data (No. PBD2022-01).
Abstract: In today's highly competitive retail industry, offline stores face increasing pressure on profitability. They hope to improve their shelf management with the help of big data technology. For this purpose, on-shelf availability is an essential indicator of shelf data management and is closely related to customer purchase behavior. RFM (recency, frequency, and monetary) pattern mining is a powerful tool to evaluate the value of customer behavior. However, the existing RFM pattern mining algorithms do not consider the quarterly nature of goods, resulting in unreasonable shelf availability and difficulty in making a profit. To solve this problem, we propose a quarterly RFM mining algorithm for on-shelf products named OS-RFM. Our algorithm mines the high-recency, high-frequency, and high-monetary patterns and considers the period of the on-shelf goods in quarterly units. We conducted experiments using two real datasets for numerical and graphical analysis to prove the algorithm's effectiveness. Compared with the state-of-the-art RFM mining algorithm, our algorithm can identify more patterns and performs well in terms of precision, recall, and F1-score, with the recall rate nearing 100%. Also, the novel algorithm operates with significantly shorter running times and more stable memory usage than existing mining algorithms. Additionally, we analyze the sales trends of products in different quarters and their seasonal variations. The analysis assists businesses in maintaining reasonable on-shelf availability and achieving greater profitability.
Abstract: The task of mining erasable patterns (EPs) is a data mining problem that can help factory managers come up with the best product plans for the future. This problem has been studied by many scientists in recent times, and many approaches for mining EPs have been proposed. Erasable closed patterns (ECPs) are an abbreviated representation of EPs and can be considered condensed representations of EPs without information loss. Current methods of mining ECPs identify huge numbers of such patterns, whereas intelligent systems only need a small number. A ranking process therefore needs to be applied prior to use, which causes a reduction in efficiency. To overcome this limitation, this study presents a robust method for mining top-rank-k ECPs in which the mining and ranking phases are combined into a single step. First, we propose a virtual-threshold-based pruning strategy to improve the mining speed. Based on this strategy and the dPidset structure, we then develop a fast algorithm for mining top-rank-k ECPs, which we call TRK-ECP. Finally, we carry out experiments to compare the runtime of our TRK-ECP algorithm with two algorithms modified from dVM and TEPUS (Top-rank-k Erasable Pattern mining Using the Subsume concept), which are state-of-the-art algorithms for mining top-rank-k EPs. The running-time results confirm that TRK-ECP outperforms the other experimental approaches in terms of mining the top-rank-k ECPs.
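As a rough illustration of what ranking erasable patterns by gain means, the following Python sketch assumes the commonly used definition: each product has a set of component items and a profit, and the gain of an itemset is the total profit of the products that use at least one of its items. It brute-forces the itemsets whose gain falls among the k smallest distinct gains. It is not TRK-ECP: it does not check closedness, does not use the dPidset structure, and applies no pruning. The product database, the function names, and the `max_len` cap are hypothetical.

```python
from itertools import combinations

# Hypothetical toy product database: (component items, profit).
PRODUCTS = [
    ({"a", "b"}, 100),
    ({"b", "c"}, 50),
    ({"c", "d"}, 200),
    ({"d"}, 150),
]

def gain(itemset, products):
    """Total profit of the products that use at least one item of `itemset`."""
    return sum(profit for items, profit in products if items & itemset)

def top_rank_k(products, k, max_len=2):
    """Brute force: itemsets whose gain is among the k smallest distinct gains."""
    universe = sorted(set().union(*(items for items, _ in products)))
    gains = {}
    for size in range(1, max_len + 1):
        for combo in combinations(universe, size):
            gains[frozenset(combo)] = gain(set(combo), products)
    kept = sorted(set(gains.values()))[:k]  # the k smallest distinct gain values
    return sorted((g, sorted(s)) for s, g in gains.items() if g in kept)
```

Calling `top_rank_k(PRODUCTS, 2)` lists the itemsets that would cost the least profit if their items were discontinued. A real miner would avoid enumerating every candidate.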
Abstract: Sequential pattern mining is an important data mining problem with broad applications. However, it is also a challenging problem since the mining may have to generate or examine a combinatorially explosive number of intermediate subsequences. Recent studies have developed two major classes of sequential pattern mining methods: (1) a candidate generation-and-test approach, represented by (i) GSP, a horizontal format-based sequential pattern mining method, and (ii) SPADE, a vertical format-based method; and (2) a pattern-growth method, represented by PrefixSpan and its further extensions, such as gSpan for mining structured patterns. In this study, we perform a systematic introduction and presentation of the pattern-growth methodology and study its principles and extensions. We first introduce two interesting pattern-growth algorithms, FreeSpan and PrefixSpan, for efficient sequential pattern mining. Then we introduce gSpan for mining structured patterns using the same methodology. Their relative performance in large databases is presented and analyzed. Several extensions of these methods are also discussed in the paper, including mining multi-level, multi-dimensional patterns and mining constraint-based patterns.
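The pattern-growth idea can be sketched in a few lines of Python. This is a simplified PrefixSpan-style miner that assumes every sequence element is a single item (no itemset elements) and that keeps projected suffixes as plain lists rather than the pseudo-projection used in efficient implementations. The function name and the representation are assumptions made for illustration.

```python
from collections import defaultdict

def prefixspan(sequences, min_support):
    """Return {pattern tuple: support} for sequences of single items."""
    results = {}

    def grow(prefix, projected):
        # `projected` holds one suffix per sequence that contains `prefix`.
        counts = defaultdict(int)
        for suffix in projected:
            for item in set(suffix):  # count each item once per sequence
                counts[item] += 1
        for item, support in sorted(counts.items()):
            if support < min_support:
                continue
            pattern = prefix + (item,)
            results[pattern] = support
            # Project: keep what follows the first occurrence of `item`.
            new_projected = [
                suffix[suffix.index(item) + 1:]
                for suffix in projected
                if item in suffix
            ]
            grow(pattern, new_projected)

    grow((), [list(s) for s in sequences])
    return results
```

Each recursive call only scans the projected suffixes of the current prefix, so no candidate is generated unless it already occurs in the data. That is the contrast with the candidate generation-and-test methods mentioned above.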
Abstract: Design patterns are often used in the development of object-oriented software. They offer reusable abstract information that is helpful in solving recurring design problems. Detecting design patterns is beneficial to the comprehension and maintenance of object-oriented software systems. Several pattern detection techniques based on static analysis often encounter problems when detecting design patterns that have identical structures. In this study, we attempt to detect software design patterns by using software metrics and classification-based techniques. Our study is conducted in two phases: creation of a metrics-oriented dataset and detection of software design patterns. The datasets are prepared by using software metrics for the learning of classifiers. Then, pattern detection is performed by using classification-based techniques. To evaluate the proposed method, experiments are conducted using three open source software programs, JHotDraw, QuickUML, and JUnit, and the results are analyzed.
Funding: The research was supported by the TATA Consultancy Service's scholarship.
Abstract: Data mining is a powerful emerging technology that helps to extract hidden information from a huge volume of historical data. This paper is concerned with finding the frequent trajectories of moving objects in spatio-temporal data by a novel method adopting the concepts of clustering and sequential pattern mining. The algorithms used logically split the trajectory span area into clusters and then apply the k-means algorithm over these clusters until the squared error is minimized. The new method applies a threshold to obtain active clusters and arranges them in descending order of the number of trajectories passing through them. From these active clusters, inter-cluster patterns are found by a sequential pattern mining technique. The process is repeated until all the active clusters are linked. The clusters thus linked in sequence are the frequent trajectories. A set of experiments conducted using real datasets shows that the proposed method is about five times better than the existing ones. A comparison is made with the results of other algorithms, and their variation is analyzed by statistical methods. Further, tests of significance are conducted with ANOVA to find the efficient threshold value for the optimum plot of frequent trajectories. The results are analyzed and found to be superior to the existing ones. With minor alterations, this approach may be of relevance in finding alternate paths in busy networks (congestion control), finding the frequent paths of migratory birds, finding the frequent path of balls in certain games, or even predicting the next level of pattern characteristics in time series data.
Funding: This work was supported by the Malaysian Ministry of Education (SLAI) and Universiti Teknologi Malaysia (UTM).
Abstract: A holistic understanding of wind behaviour over space, time and height is essential for wind energy harvesting applications. This study presents a novel approach for mapping frequent wind profile patterns using multidimensional sequential pattern mining (MDSPM). This study is illustrated with a time series of 24 years of European Centre for Medium-Range Weather Forecasts European Reanalysis-Interim gridded (0.125° × 0.125°) wind data for the Netherlands every 6 h and at six height levels. The wind data were first transformed into two spatio-temporal sequence databases (for speed and direction, respectively). Then, the Linear time Closed Itemset Miner Sequence algorithm was used to extract the multidimensional sequential patterns, which were then visualized using a 3D wind rose, a circular histogram and a geographical map. These patterns were further analysed to determine their wind shear coefficients and turbulence intensities as well as their spatial overlap with current areas with wind turbines. Our analysis identified four frequent wind profile patterns. One of them is highly suitable for harvesting wind energy at a height of 128 m, and 68.97% of the geographical area covered by this pattern already contains wind turbines. This study shows that the proposed approach is capable of efficiently extracting meaningful patterns from complex spatio-temporal datasets.
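The abstract does not state how the wind shear coefficient and the turbulence intensity were computed. The Python sketch below assumes two widely used definitions: the power-law shear exponent between two heights, and the ratio of the standard deviation to the mean of the wind speed. The function names and sample values are hypothetical.

```python
import math
import statistics

def wind_shear_coefficient(v1, z1, v2, z2):
    """Power-law exponent alpha such that v2 / v1 == (z2 / z1) ** alpha."""
    return math.log(v2 / v1) / math.log(z2 / z1)

def turbulence_intensity(speeds):
    """Standard deviation of the wind speed divided by its mean."""
    return statistics.pstdev(speeds) / statistics.fmean(speeds)

# Hypothetical readings: 5 m/s at 10 m and 10 m/s at 40 m.
alpha = wind_shear_coefficient(5.0, 10.0, 10.0, 40.0)
ti = turbulence_intensity([8.0, 10.0, 12.0])
```

Under these assumptions, a lower shear exponent and a lower turbulence intensity both indicate a steadier wind profile across the rotor height.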
Funding: This work is supported by the National Natural Science Foundation of China under Grant No. 41471371.
Abstract: The discovery of gradual moving object cluster patterns from trajectory streams allows movement behavior to be characterized in a real-time environment, which enables new applications and services. Since trajectory streams evolve rapidly, are continuously created, and cannot be stored indefinitely in memory, the existing approaches designed for static trajectory datasets are not suitable for discovering gradual moving object cluster patterns from trajectory streams. This paper proposes a novel algorithm for discovering gradual moving object cluster patterns from trajectory streams using sliding window models. By processing the trajectory data in the current window, the mining algorithm can capture the trend and evolution of moving object cluster patterns. Firstly, the density peaks clustering algorithm is exploited to identify the clusters of different snapshots. The stable relationship between relatively few moving objects is used to improve the clustering efficiency. Then, by intersecting clusters from different snapshots, the gradual moving object cluster pattern is updated. The relationship of clusters between adjacent snapshots and the gradual property are utilized to accelerate the updating process. Finally, experimental results on two real datasets demonstrate that our algorithm is effective and efficient.
Funding: National High-Tech Research and Development Plan of China (No. 2003AA1Z2130); Science and Technology Project of Zhejiang Province of China (No. 2005C1100102).
Abstract: A frequent trajectory pattern mining algorithm is proposed to learn object activities and classify trajectories in an intelligent visual surveillance system. The distribution patterns of the trajectories were generated by an Apriori-based frequent pattern mining algorithm, and the trajectories were classified by the frequent trajectory patterns generated. In addition, a fuzzy c-means (FCM) based learning algorithm and a mean shift based clustering procedure were used to construct the representation of trajectories. The algorithm can be further used to describe activities and identify anomalies. The experiments on two real scenes show that the algorithm is effective.
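For readers unfamiliar with Apriori, the level-wise search it performs can be sketched as follows. This Python sketch is generic and is not the surveillance algorithm described above. It assumes each trajectory has already been reduced to a set of region labels, which is a hypothetical representation, and it uses an absolute support count as the threshold.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset: support count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items}  # level-1 candidates
    frequent = {}
    k = 1
    while current:
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: unite frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        current = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent
```

The prune step relies on the downward closure property: an itemset can only be frequent if all of its subsets are frequent, so most candidates are discarded before the data is scanned again.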
Abstract: Previous weighted frequent pattern (WFP) mining algorithms are not suitable for data streams because they need multiple database scans. In this paper, we present an efficient algorithm, SWFP-Miner, to mine weighted frequent patterns over data streams. SWFP-Miner is based on a sliding window and can discover important frequent patterns from the recent data. A new refined weight definition is proposed to keep the downward closure property, and two pruning strategies are presented to prune the weighted infrequent patterns. Experimental studies are performed to evaluate the effectiveness and efficiency of SWFP-Miner.
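The sliding-window setting can be illustrated with a small Python sketch. It keeps only the most recent transactions and computes a weighted support as the average item weight multiplied by the relative support within the window. That weight definition is an assumption chosen for simplicity. It is not the refined definition proposed for SWFP-Miner, and unlike that definition it does not in general preserve downward closure. The class and method names are hypothetical.

```python
from collections import deque

class SlidingWindowCounter:
    """Keeps the most recent `window_size` transactions of a stream."""

    def __init__(self, window_size, weights):
        # A bounded deque drops the oldest transaction automatically.
        self.window = deque(maxlen=window_size)
        self.weights = weights  # item -> weight (assumed to lie in (0, 1])

    def add(self, transaction):
        self.window.append(frozenset(transaction))

    def weighted_support(self, itemset):
        """Average item weight times relative support in the window."""
        itemset = frozenset(itemset)
        count = sum(1 for t in self.window if itemset <= t)
        avg_weight = sum(self.weights[i] for i in itemset) / len(itemset)
        return avg_weight * count / len(self.window)
```

Because the window is bounded, each arriving transaction is processed once and old data expires without a second pass over the stream.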
Funding: National Natural Science Foundation of China (Grant Nos. 61772371, 61972286).
Abstract: The volume of trajectory data has become tremendously large in recent years. How to effectively and efficiently maintain and process such trajectory data has become a challenging task. In this paper, we propose a trajectory spatial and temporal compression framework, namely CLEAN. The key to spatial compression is to mine meaningful frequent trajectory patterns on the road network. By treating the mined patterns as dictionary items, long trajectories can be encoded by shorter paths, thus leading to a smaller space cost. An error-bounded temporal compression is carefully designed on top of the identified spatial patterns for a much lower space cost. Meanwhile, the patterns are also utilized to improve the performance of two trajectory applications, range query and clustering, without decompression overhead. Extensive experiments on real trajectory datasets validate that CLEAN significantly outperforms existing state-of-the-art approaches in terms of spatial-temporal compression and trajectory applications.
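The idea of treating mined patterns as dictionary items can be illustrated with a short Python sketch. It assumes a trajectory is a list of road-edge identifiers and that a dictionary mapping frequent sub-paths to short codes is already available. It then applies greedy longest-match substitution. This is not the CLEAN framework itself: the function names, the dictionary, and the greedy strategy are assumptions, and the codes are assumed never to collide with edge identifiers.

```python
def encode(path, dictionary):
    """Greedy longest match: replace known sub-paths by their codes."""
    max_len = max(map(len, dictionary), default=0)
    out, i = [], 0
    while i < len(path):
        # Try the longest possible pattern first, down to length 2.
        for length in range(min(max_len, len(path) - i), 1, -1):
            chunk = tuple(path[i:i + length])
            if chunk in dictionary:
                out.append(dictionary[chunk])
                i += length
                break
        else:
            out.append(path[i])  # no pattern starts here; keep the raw edge
            i += 1
    return out

def decode(encoded, dictionary):
    """Expand every code back into the sub-path it stands for."""
    reverse = {code: pattern for pattern, code in dictionary.items()}
    out = []
    for symbol in encoded:
        out.extend(reverse.get(symbol, (symbol,)))
    return out

# Hypothetical dictionary of frequent sub-paths on a road network.
PATTERNS = {("e1", "e2", "e3"): "P1", ("e5", "e6"): "P2"}
```

With this dictionary, a six-edge path shrinks to three symbols, and `decode` restores it exactly, so the spatial part of the encoding is lossless.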
Funding: This research was financially supported in part by the Ministry of Trade, Industry and Energy (MOTIE) and the Korea Institute for Advancement of Technology (KIAT) through the International Cooperative R&D program (Project No. P0016038), and in part by the MSIT (Ministry of Science and ICT), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2021-2016-0-00312) supervised by the IITP (Institute for Information & communications Technology Planning & Evaluation).
Abstract: Despite advances in technological complexity and efforts, software repository maintenance requires reusing data to reduce effort and complexity. However, growing ambiguity, irrelevance, and bugs encountered while extracting similar data during software development add to the large amount of data that resides in repositories. Thus, there is a need for a repository mining technique for relevant and bug-free data prediction. This paper proposes a fault prediction approach that uses a data-mining technique to find good predictors for high-quality software. To predict errors in mining data, the Apriori algorithm was used to discover association rules, with confidence fixed at more than 40% and support at no less than 30%. A pruning strategy was adopted based on evaluation measures. Next, the rules were extracted from three projects of different domains; the extracted rules were then combined to obtain the most popular rules based on the evaluation measure values. To evaluate the proposed approach, we conducted an experimental study to compare the proposed rules with existing ones using four different industrial projects. The evaluation showed that the results of our proposal are promising. Practitioners and developers can utilize these rules for defect prediction during early software development.
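The two thresholds quoted above can be made concrete with a brute-force Python sketch. It enumerates single-antecedent rules A -> B and keeps those with support of at least 30% and confidence above 40%. The module attributes and records are hypothetical, and the enumeration stands in for the Apriori candidate generation used in the paper.

```python
from itertools import permutations

def mine_rules(transactions, min_support=0.30, min_confidence=0.40):
    """Return (antecedent, consequent, support, confidence) tuples."""
    n = len(transactions)
    items = sorted({i for t in transactions for i in t})

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    found = []
    for a, b in permutations(items, 2):
        s = support({a, b})   # fraction of records containing both items
        sa = support({a})     # fraction of records containing the antecedent
        if s >= min_support and sa > 0 and s / sa > min_confidence:
            found.append((a, b, s, s / sa))
    return found

# Hypothetical module records, each a set of observed attributes.
RECORDS = [
    {"high_complexity", "defect"},
    {"high_complexity", "defect"},
    {"high_complexity"},
    {"many_changes", "defect"},
    {"many_changes"},
]
```

Support filters out rules that rarely apply, while confidence filters out rules whose consequent does not reliably follow from the antecedent.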
Funding: Supported by the National High Technology Research and Development Program of China (No. 2007AA01Z132), the National Natural Science Foundation of China (Nos. 60775035, 60933004, 60970088, 60903141), the National Basic Research Priorities Programme (No. 2007CB311004), and the National Science and Technology Support Plan (No. 2006BAC08B06).
Abstract: In this paper, we propose an enhanced associative classification method that integrates the dynamic property into the process of associative classification. In the proposed method, we employ a support vector machine (SVM) based method to refine the discovered emerging frequent patterns for classification rule extension for class label prediction. The empirical study shows that our method can be used to classify increasing resources efficiently and effectively.
Abstract: Detecting cyber-attacks has undoubtedly become a big data problem. This paper presents a tutorial on data mining based cyber-attack detection. First, a data-driven defence framework is presented in terms of cyber security situational awareness. Then, the process of data mining based cyber-attack detection is discussed. Next, a multi-loop learning architecture is presented for data mining based cyber-attack detection. Finally, common data mining techniques for cyber-attack detection are discussed.