Anomaly detection has been an active research topic in the field of network intrusion detection for many years. A novel method is presented for anomaly detection based on system calls into the kernels of Unix or Linux...Anomaly detection has been an active research topic in the field of network intrusion detection for many years. A novel method is presented for anomaly detection based on system calls into the kernels of Unix or Linux systems. The method uses the data mining technique to model the normal behavior of a privileged program and uses a variable-length pattern matching algorithm to perform the comparison of the current behavior and historic normal behavior, which is more suitable for this problem than the fixed-length pattern matching algorithm proposed by Forrest et al. At the detection stage, the particularity of the audit data is taken into account, and two alternative schemes could be used to distinguish between normalities and intrusions. The method gives attention to both computational efficiency and detection accuracy and is especially applicable for on-line detection. The performance of the method is evaluated using the typical testing data set, and the results show that it is significantly better than the anomaly detection method based on hidden Markov models proposed by Yan et al. and the method based on fixed-length patterns proposed by Forrest and Hofmeyr. The novel method has been applied to practical hosted-based intrusion detection systems and achieved high detection performance.展开更多
In computational physics proton transfer phenomena could be viewed as pattern classification problems based on a set of input features allowing classification of the proton motion into two categories: transfer 'occu...In computational physics proton transfer phenomena could be viewed as pattern classification problems based on a set of input features allowing classification of the proton motion into two categories: transfer 'occurred' and transfer 'not occurred'. The goal of this paper is to evaluate the use of artificial neural networks in the classification of proton transfer events, based on the feed-forward back propagation neural network, used as a classifier to distinguish between the two transfer cases. In this paper, we use a new developed data mining and pattern recognition tool for automating, controlling, and drawing charts of the output data of an Empirical Valence Bond existing code. The study analyzes the need for pattern recognition in aqueous proton transfer processes and how the learning approach in error back propagation (multilayer perceptron algorithms) could be satisfactorily employed in the present case. We present a tool for pattern recognition and validate the code including a real physical case study. The results of applying the artificial neural networks methodology to crowd patterns based upon selected physical properties (e.g., temperature, density) show the abilities of the network to learn proton transfer patterns corresponding to properties of the aqueous environments, which is in turn proved to be fully compatible with previous proton transfer studies.展开更多
Data mining in the educational field can be used to optimize the teaching and learning performance among the students.The recently developed machine learning(ML)and deep learning(DL)approaches can be utilized to mine ...Data mining in the educational field can be used to optimize the teaching and learning performance among the students.The recently developed machine learning(ML)and deep learning(DL)approaches can be utilized to mine the data effectively.This study proposes an Improved Sailfish Optimizer-based Feature SelectionwithOptimal Stacked Sparse Autoencoder(ISOFS-OSSAE)for data mining and pattern recognition in the educational sector.The proposed ISOFS-OSSAE model aims to mine the educational data and derive decisions based on the feature selection and classification process.Moreover,the ISOFS-OSSAEmodel involves the design of the ISOFS technique to choose an optimal subset of features.Moreover,the swallow swarm optimization(SSO)with the SSAE model is derived to perform the classification process.To showcase the enhanced outcomes of the ISOFSOSSAE model,a wide range of experiments were taken place on a benchmark dataset from the University of California Irvine(UCI)Machine Learning Repository.The simulation results pointed out the improved classification performance of the ISOFS-OSSAE model over the recent state of art approaches interms of different performance measures.展开更多
Despite advances in technological complexity and efforts,software repository maintenance requires reusing the data to reduce the effort and complexity.However,increasing ambiguity,irrelevance,and bugs while extracting...Despite advances in technological complexity and efforts,software repository maintenance requires reusing the data to reduce the effort and complexity.However,increasing ambiguity,irrelevance,and bugs while extracting similar data during software development generate a large amount of data from those data that reside in repositories.Thus,there is a need for a repository mining technique for relevant and bug-free data prediction.This paper proposes a fault prediction approach using a data-mining technique to find good predictors for high-quality software.To predict errors in mining data,the Apriori algorithm was used to discover association rules by fixing confidence at more than 40%and support at least 30%.The pruning strategy was adopted based on evaluation measures.Next,the rules were extracted from three projects of different domains;the extracted rules were then combined to obtain the most popular rules based on the evaluation measure values.To evaluate the proposed approach,we conducted an experimental study to compare the proposed rules with existing ones using four different industrial projects.The evaluation showed that the results of our proposal are promising.Practitioners and developers can utilize these rules for defect prediction during early software development.展开更多
A new method was worked out to improve the precision of springback prediction in sheet metal forming by combining the finite element method (FEM) with the data mining (DM) technique. First the genetic algorithm (GA) w...A new method was worked out to improve the precision of springback prediction in sheet metal forming by combining the finite element method (FEM) with the data mining (DM) technique. First the genetic algorithm (GA) was adopted for recognizing the material parameters. Then according to the even design idea, the suitable calculation scheme was confirmed, and FEM was used for calculating the springback. The computation results were compared with experiment data, the difference between them was taken as source data, and a new pattern recognition method of DM called hierarchical optimal map recognition method (HOMR) is applied for summarizing the calculation regulation in FEM. At the end, the mathematics model of the springback simulation was established. Based on the model, the calculation errors of springback can be controlled within 10% compared with the experimental results.展开更多
The outbreak of coronavirus disease 2019(COVID-2019)has drawn public attention all over the world.As a newly emerging area,single cell sequencing also exerts its power in the battle over the epidemic.In this review,th...The outbreak of coronavirus disease 2019(COVID-2019)has drawn public attention all over the world.As a newly emerging area,single cell sequencing also exerts its power in the battle over the epidemic.In this review,the up-to-date knowledge of COVID-19 and its receptor is summarized,followed by a collection of the mining of single cell transcriptome profiling data for the information in aspects of the vulnerable cell types in humans and the potential mechanisms of the disease.展开更多
Background:Professor Yan-Xin Wang has been committed to the use of traditional Chinese medicine formulas to treat insomnia from the liver for many years,and has achieved excellent clinical results.In order to better i...Background:Professor Yan-Xin Wang has been committed to the use of traditional Chinese medicine formulas to treat insomnia from the liver for many years,and has achieved excellent clinical results.In order to better inherit Yan-Xin Wang’s academic thoughts.The purpose of this study is to use clinical data to explore the clinical experience of Prof.Yan-Xin Wang in the application of Chinese medicine to treat insomnia patients from the liver,explore the compatibility and medication rules of traditional Chinese medicine,and give more clinical treatment ideas for insomnia.Methods:The general data and prescription information of insomnia patients treated with Chinese herbal medicine by Prof.Yan-Xin Wang from January 1,2021,to December 31,2021,were summarized according to the inclusion and exclusion criteria,and the data were subjected to frequency statistics and drug association rules,complex network diagram analysis and cluster analysis.Results:A total of 159 patients and prescriptions were included in the study,of which 81.1%were women and 18.9%were men,containing 128 herbs;the highest frequency of use was 91.8%for Bupleuri Radix.Six Chinese herbs were used more than 70%of the time,namely Bupleuri Radix,Scutellariae Radix,oyster shell,Glycyrrhizae Radix,Os Draconis,and Ziziphi Spinosae Semen.The top 20 herbs in terms of frequency of use were analyzed in terms of the four Qi,five flavours,and their attributions.The four Qi were mainly calm and warm,the five flavours were mainly bitter and acrid,followed by sweet,and the attributions were mainly to the liver,spleen,and heart meridians.The Chinese medicine association rules set the confidence level>80%and the support level>10%,resulting in 10 two-herb and three-herb associations with the highest confidence level,such as Os Draconis is associated with oyster shell,Platycodonis Radix is associated with Achyranthis Bidentatae Radix,Scutellariae Radix is associated with Bupleuri Radix,Os Draconis,Bupleuri Radix is associated with oyster shell,Os Draconis,Scutellariae Radix is associated with oyster shell,etc.Cluster analysis yielded 3 classes of drug formulas.The complex network diagram shows that the core prescription drugs are composed of Bupleuri Radix,Chuanxiong Rhizoma,Pseudostellariae Radix,Jujubae Fructus,Os Draconis,Coptidis Rhizoma,Scutellariae Radix,Ziziphi Spinosae Semen,Smilacis Glabrae Rhizoma,Cinnamomi Cortex,White Moutan Cortex,Atractylodis Rhizoma,Glycyrrhizae Radix,oyster shell,Rehmanniae Radix Praeparata,Tritici Levis Fructus,Pinelliae Rhizoma Praeparatum,and Cinnamomi Ramulus.Conclusion:Prof.Yan-Xin Wang believes that the main treatment for patients with insomnia is based on the liver,by tonifying the deficiency and supporting the righteousness,mutually regulating the liver and spleen,and calming the mind and nourishing the heart,while adding and subtracting appropriate herbs according to the patient’s co-morbidities,which can significantly improve the patient’s insomnia symptoms.展开更多
Detecting cyber-attacks undoubtedly has become a big data problem. This paper presents a tutorial on data mining based cyber-attack detection. First,a data driven defence framework is presented in terms of cyber secur...Detecting cyber-attacks undoubtedly has become a big data problem. This paper presents a tutorial on data mining based cyber-attack detection. First,a data driven defence framework is presented in terms of cyber security situational awareness. Then, the process of data mining based cyber-attack detection is discussed. Next,a multi-loop learning architecture is presented for data mining based cyber-attack detection. Finally,common data mining techniques for cyber-attack detection are discussed.展开更多
Background:Based on the theory of"cancer toxin"in traditional Chinese medicine(TCM),combined with data mining method,this paper discusses the prescription and medication law of macroscopic"cancer toxin&...Background:Based on the theory of"cancer toxin"in traditional Chinese medicine(TCM),combined with data mining method,this paper discusses the prescription and medication law of macroscopic"cancer toxin".Methods Sort out the relevant prescriptions of macroscopic"cancer toxin"and carry out correlation analysis.Results The results showed that the main therapeutic drugs for macroscopic"cancer toxin"were invigorating the spleen and Qi,regulating the Qi mechanism,eliminating toxin and pathogenic factors,eliminating dampness and swelling,promoting blood circulation and removing blood stasis,attacking toxin,softening and firmness,and dredging collaterals to relieve pain.Conclusion High frequency medicine embodies the core of tangible macroscopic"cancer toxin"treatment,and the new prescription composition embodies the fundamental treatment.At the same time,the mechanism of macroscopic"cancer toxin"was analyzed to pave the way for further clinical practice.展开更多
Previous weighted frequent pattern (WFP) mining algorithms are not suitable for data streams for they need multiple database scans. In this paper, we present an efficient algorithm SWFP-Miner to mine weighted freque...Previous weighted frequent pattern (WFP) mining algorithms are not suitable for data streams for they need multiple database scans. In this paper, we present an efficient algorithm SWFP-Miner to mine weighted frequent pattern over data streams. SWFP-Miner is based on sliding window and can discover important frequent pattern from the recent data. A new refined weight definition is proposed to keep the downward closure property, and two pruning strategies are presented to prune the weighted infrequent pattern. Experimental studies are performed to evaluate the effectiveness and efficiency of SWFP-Miner.展开更多
The distance-based outlier detection method detects the implied outliers by calculating the distance of the points in the dataset, but the computational complexity is particularly high when processing multidimensional...The distance-based outlier detection method detects the implied outliers by calculating the distance of the points in the dataset, but the computational complexity is particularly high when processing multidimensional datasets. In addition, the traditional outlier detection method does not consider the frequency of subsets occurrence, thus, the detected outliers do not fit the definition of outliers (i.e., rarely appearing). The pattern mining-based outlier detection approaches have solved this problem, but the importance of each pattern is not taken into account in outlier detection process, so the detected outliers cannot truly reflect some actual situation. Aimed at these problems, a two-phase minimal weighted rare pattern mining-based outlier detection approach, called MWRPM-Outlier, is proposed to effectively detect outliers on the weight data stream. In particular, a method called MWRPM is proposed in the pattern mining phase to fast mine the minimal weighted rare patterns, and then two deviation factors are defined in outlier detection phase to measure the abnormal degree of each transaction on the weight data stream. Experimental results show that the proposed MWRPM-Outlier approach has excellent performance in outlier detection and MWRPM approach outperforms in weighted rare pattern mining.展开更多
In today’s highly competitive retail industry,offline stores face increasing pressure on profitability.They hope to improve their ability in shelf management with the help of big data technology.For this,on-shelf ava...In today’s highly competitive retail industry,offline stores face increasing pressure on profitability.They hope to improve their ability in shelf management with the help of big data technology.For this,on-shelf availability is an essential indicator of shelf data management and closely relates to customer purchase behavior.RFM(recency,frequency,andmonetary)patternmining is a powerful tool to evaluate the value of customer behavior.However,the existing RFM patternmining algorithms do not consider the quarterly nature of goods,resulting in unreasonable shelf availability and difficulty in profit-making.To solve this problem,we propose a quarterly RFM mining algorithmfor On-shelf products named OS-RFM.Our algorithmmines the high recency,high frequency,and high monetary patterns and considers the period of the on-shelf goods in quarterly units.We conducted experiments using two real datasets for numerical and graphical analysis to prove the algorithm’s effectiveness.Compared with the state-of-the-art RFM mining algorithm,our algorithm can identify more patterns and performs well in terms of precision,recall,and F1-score,with the recall rate nearing 100%.Also,the novel algorithm operates with significantly shorter running times and more stable memory usage than existing mining algorithms.Additionally,we analyze the sales trends of products in different quarters and seasonal variations.The analysis assists businesses in maintaining reasonable on-shelf availability and achieving greater profitability.展开更多
In order to reduce the computational and spatial complexity in rerunning algorithm of sequential patterns query, this paper proposes sequential patterns based and projection database based algorithm for fast interacti...In order to reduce the computational and spatial complexity in rerunning algorithm of sequential patterns query, this paper proposes sequential patterns based and projection database based algorithm for fast interactive sequential patterns mining algorithm (FISP), in which the number of frequent items of the projection databases constructed by the correct mining which based on the previously mined sequences has been reduced. Furthermore, the algorithm's iterative running times are reduced greatly by using global-threshold. The results of experiments testify that FISP outperforms PrefixSpan in interactive mining展开更多
Mining frequent pattern in transaction database, time series databases, and many other kinds of databases have been studied popularly in data mining research. Most of the previous studies adopt Apriori like candidat...Mining frequent pattern in transaction database, time series databases, and many other kinds of databases have been studied popularly in data mining research. Most of the previous studies adopt Apriori like candidate set generation and test approach. However, candidate set generation is very costly. Han J. proposed a novel algorithm FP growth that could generate frequent pattern without candidate set. Based on the analysis of the algorithm FP growth, this paper proposes a concept of equivalent FP tree and proposes an improved algorithm, denoted as FP growth * , which is much faster in speed, and easy to realize. FP growth * adopts a modified structure of FP tree and header table, and only generates a header table in each recursive operation and projects the tree to the original FP tree. The two algorithms get the same frequent pattern set in the same transaction database, but the performance study on computer shows that the speed of the improved algorithm, FP growth * , is at least two times as fast as that of FP growth.展开更多
By analyzing the existing prefix-tree data structure, an improved pattern tree was introduced for processing new transactions. It firstly stored transactions in a lexicographic order tree and then restructured the tre...By analyzing the existing prefix-tree data structure, an improved pattern tree was introduced for processing new transactions. It firstly stored transactions in a lexicographic order tree and then restructured the tree by sorting each path in a frequency-descending order. While updating the improved pattern tree, there was no need to rescan the entire new database or reconstruct a new tree for incremental updating. A test was performed on synthetic dataset T1014D100K with 100 000 transactions and 870 items. Experimental results show that the smaller the minimum sup- port threshold, the faster the improved pattern tree achieves over CanTree for all datasets. As the minimum support threshold increased from 2% to 3.5%, the runtime decreased from 452.71 s to 186.26 s. Meanwhile, the runtime re- quired by CanTree decreased from 1 367.03 s to 432.19 s. When the database was updated, the execution time of im- proved pattern tree consisted of construction of original improved pattern trees and reconstruction of initial tree. The experiment results showed that the runtime was saved by about 15% compared with that of CanTree. As the number of transactions increased, the runtime of improved pattern tree was about 25% shorter than that of FP-tree. The improved pattern tree also required less memory than CanTree.展开更多
Because mining complete set of frequent patterns from dense database could be impractical, an interesting alternative has been proposed recently. Instead of mining the complete set of frequent patterns, the new model ...Because mining complete set of frequent patterns from dense database could be impractical, an interesting alternative has been proposed recently. Instead of mining the complete set of frequent patterns, the new model only finds out the maximal frequent patterns, which can generate all frequent patterns. FP-growth algorithm is one of the most efficient frequent-pattern mining methods published so far. However, because FP-tree and conditional FP-trees must be two-way traversable, a great deal memory is needed in process of mining. This paper proposes an efficient algorithm Unid_FP-Max for mining maximal frequent patterns based on unidirectional FP-tree. Because of generation method of unidirectional FP-tree and conditional unidirectional FP-trees, the algorithm reduces the space consumption to the fullest extent. With the development of two techniques: single path pruning and header table pruning which can cut down many conditional unidirectional FP-trees generated recursively in mining process, Unid_FP-Max further lowers the expense of time and space.展开更多
Maximum frequent pattern generation from a large database of transactions and items for association rule mining is an important research topic in data mining. Association rule mining aims to discover interesting corre...Maximum frequent pattern generation from a large database of transactions and items for association rule mining is an important research topic in data mining. Association rule mining aims to discover interesting correlations, frequent patterns, associations, or causal structures between items hidden in a large database. By exploiting quantum computing, we propose an efficient quantum search algorithm design to discover the maximum frequent patterns. We modified Grover’s search algorithm so that a subspace of arbitrary symmetric states is used instead of the whole search space. We presented a novel quantum oracle design that employs a quantum counter to count the maximum frequent items and a quantum comparator to check with a minimum support threshold. The proposed derived algorithm increases the rate of the correct solutions since the search is only in a subspace. Furthermore, our algorithm significantly scales and optimizes the required number of qubits in design, which directly reflected positively on the performance. Our proposed design can accommodate more transactions and items and still have a good performance with a small number of qubits.展开更多
Faster internet, IoT, and social media have reformed the conventional web into a collaborative web resulting in enormous user-generated content. Several studies are focused on such content;however, they mainly focus o...Faster internet, IoT, and social media have reformed the conventional web into a collaborative web resulting in enormous user-generated content. Several studies are focused on such content;however, they mainly focus on textual data, thus undermining the importance of metadata. Considering this gap, we provide a temporal pattern mining framework to model and utilize user-generated content's metadata. First, we scrap 2.1 million tweets from Twitter between Nov-2020 to Sep-2021 about 100 hashtag keywords and present these tweets into 100 User-Tweet-Hashtag (UTH) dynamic graphs. Second, we extract and identify four time-series in three timespans (Day, Hour, and Minute) from UTH dynamic graphs. Lastly, we model these four time-series with three machine learning algorithms to mine temporal patterns with the accuracy of 95.89%, 93.17%, 90.97%, and 93.73%, respectively. We demonstrate that user-generated content's metadata contains valuable information, which helps to understand the users' collective behavior and can be beneficial for business and research. Dataset and codes are publicly available;the link is given in the dataset section.展开更多
With massive amounts of data stored in databases, mining information and knowledge in databases has become an important issue in recent research. Researchers in many different fields have shown great interest in data ...With massive amounts of data stored in databases, mining information and knowledge in databases has become an important issue in recent research. Researchers in many different fields have shown great interest in data mining and knowledge discovery in databases. Several emerging applications in information providing services, such as data warehousing and on-line services over the Internet, also call for various data mining and knowledge discovery techniques to understand user behavior better, to improve the service provided, and to increase the business opportunities. In response to such a demand, this article is to provide a comprehensive survey on the data mining and knowledge discovery techniques developed recently, and introduce some real application systems as well. In conclusion, this article also lists some problems and challenges for further research.展开更多
基金supported by the National Grand Fundamental Research "973" Program of China (2004CB318109)the National High-Technology Research and Development Plan of China (2006AA01Z452)the National Information Security "242"Program of China (2005C39).
文摘Anomaly detection has been an active research topic in the field of network intrusion detection for many years. A novel method is presented for anomaly detection based on system calls into the kernels of Unix or Linux systems. The method uses the data mining technique to model the normal behavior of a privileged program and uses a variable-length pattern matching algorithm to perform the comparison of the current behavior and historic normal behavior, which is more suitable for this problem than the fixed-length pattern matching algorithm proposed by Forrest et al. At the detection stage, the particularity of the audit data is taken into account, and two alternative schemes could be used to distinguish between normalities and intrusions. The method gives attention to both computational efficiency and detection accuracy and is especially applicable for on-line detection. The performance of the method is evaluated using the typical testing data set, and the results show that it is significantly better than the anomaly detection method based on hidden Markov models proposed by Yan et al. and the method based on fixed-length patterns proposed by Forrest and Hofmeyr. The novel method has been applied to practical hosted-based intrusion detection systems and achieved high detection performance.
基金Dr. Steve Jones, Scientific Advisor of the Canon Foundation for Scientific Research (7200 The Quorum, Oxford Business Park, Oxford OX4 2JZ, England). Canon Foundation for Scientific Research funded the UPC 2013 tuition fees of the corresponding author during her writing this article
文摘In computational physics proton transfer phenomena could be viewed as pattern classification problems based on a set of input features allowing classification of the proton motion into two categories: transfer 'occurred' and transfer 'not occurred'. The goal of this paper is to evaluate the use of artificial neural networks in the classification of proton transfer events, based on the feed-forward back propagation neural network, used as a classifier to distinguish between the two transfer cases. In this paper, we use a new developed data mining and pattern recognition tool for automating, controlling, and drawing charts of the output data of an Empirical Valence Bond existing code. The study analyzes the need for pattern recognition in aqueous proton transfer processes and how the learning approach in error back propagation (multilayer perceptron algorithms) could be satisfactorily employed in the present case. We present a tool for pattern recognition and validate the code including a real physical case study. The results of applying the artificial neural networks methodology to crowd patterns based upon selected physical properties (e.g., temperature, density) show the abilities of the network to learn proton transfer patterns corresponding to properties of the aqueous environments, which is in turn proved to be fully compatible with previous proton transfer studies.
文摘Data mining in the educational field can be used to optimize the teaching and learning performance among the students.The recently developed machine learning(ML)and deep learning(DL)approaches can be utilized to mine the data effectively.This study proposes an Improved Sailfish Optimizer-based Feature SelectionwithOptimal Stacked Sparse Autoencoder(ISOFS-OSSAE)for data mining and pattern recognition in the educational sector.The proposed ISOFS-OSSAE model aims to mine the educational data and derive decisions based on the feature selection and classification process.Moreover,the ISOFS-OSSAEmodel involves the design of the ISOFS technique to choose an optimal subset of features.Moreover,the swallow swarm optimization(SSO)with the SSAE model is derived to perform the classification process.To showcase the enhanced outcomes of the ISOFSOSSAE model,a wide range of experiments were taken place on a benchmark dataset from the University of California Irvine(UCI)Machine Learning Repository.The simulation results pointed out the improved classification performance of the ISOFS-OSSAE model over the recent state of art approaches interms of different performance measures.
基金This research was financially supported in part by the Ministry of Trade,Industry and Energy(MOTIE)and Korea Institute for Advancement of Technology(KIAT)through the International Cooperative R&D program.(Project No.P0016038)in part by the MSIT(Ministry of Science and ICT),Korea,under the ITRC(Information Technology Research Center)support program(IITP-2021-2016-0-00312)supervised by the IITP(Institute for Information&communications Technology Planning&Evaluation).
文摘Despite advances in technological complexity and efforts,software repository maintenance requires reusing the data to reduce the effort and complexity.However,increasing ambiguity,irrelevance,and bugs while extracting similar data during software development generate a large amount of data from those data that reside in repositories.Thus,there is a need for a repository mining technique for relevant and bug-free data prediction.This paper proposes a fault prediction approach using a data-mining technique to find good predictors for high-quality software.To predict errors in mining data,the Apriori algorithm was used to discover association rules by fixing confidence at more than 40%and support at least 30%.The pruning strategy was adopted based on evaluation measures.Next,the rules were extracted from three projects of different domains;the extracted rules were then combined to obtain the most popular rules based on the evaluation measure values.To evaluate the proposed approach,we conducted an experimental study to compare the proposed rules with existing ones using four different industrial projects.The evaluation showed that the results of our proposal are promising.Practitioners and developers can utilize these rules for defect prediction during early software development.
文摘A new method was worked out to improve the precision of springback prediction in sheet metal forming by combining the finite element method (FEM) with the data mining (DM) technique. First the genetic algorithm (GA) was adopted for recognizing the material parameters. Then according to the even design idea, the suitable calculation scheme was confirmed, and FEM was used for calculating the springback. The computation results were compared with experiment data, the difference between them was taken as source data, and a new pattern recognition method of DM called hierarchical optimal map recognition method (HOMR) is applied for summarizing the calculation regulation in FEM. At the end, the mathematics model of the springback simulation was established. Based on the model, the calculation errors of springback can be controlled within 10% compared with the experimental results.
基金the National Key R&D Program of China under Grant No.2018YFC0910405the National Natural Science Foundation of China under Grants No.61922020,No.61771331,and No.91935302.
文摘The outbreak of coronavirus disease 2019(COVID-2019)has drawn public attention all over the world.As a newly emerging area,single cell sequencing also exerts its power in the battle over the epidemic.In this review,the up-to-date knowledge of COVID-19 and its receptor is summarized,followed by a collection of the mining of single cell transcriptome profiling data for the information in aspects of the vulnerable cell types in humans and the potential mechanisms of the disease.
基金The Fourth National Training Program for Excellent Talents in Chinese Medicine(Clinical and Basic)(2017-24)Natural Science Research Project of Anhui University(KJ2020A0438)。
文摘Background:Professor Yan-Xin Wang has been committed to the use of traditional Chinese medicine formulas to treat insomnia from the liver for many years,and has achieved excellent clinical results.In order to better inherit Yan-Xin Wang’s academic thoughts.The purpose of this study is to use clinical data to explore the clinical experience of Prof.Yan-Xin Wang in the application of Chinese medicine to treat insomnia patients from the liver,explore the compatibility and medication rules of traditional Chinese medicine,and give more clinical treatment ideas for insomnia.Methods:The general data and prescription information of insomnia patients treated with Chinese herbal medicine by Prof.Yan-Xin Wang from January 1,2021,to December 31,2021,were summarized according to the inclusion and exclusion criteria,and the data were subjected to frequency statistics and drug association rules,complex network diagram analysis and cluster analysis.Results:A total of 159 patients and prescriptions were included in the study,of which 81.1%were women and 18.9%were men,containing 128 herbs;the highest frequency of use was 91.8%for Bupleuri Radix.Six Chinese herbs were used more than 70%of the time,namely Bupleuri Radix,Scutellariae Radix,oyster shell,Glycyrrhizae Radix,Os Draconis,and Ziziphi Spinosae Semen.The top 20 herbs in terms of frequency of use were analyzed in terms of the four Qi,five flavours,and their attributions.The four Qi were mainly calm and warm,the five flavours were mainly bitter and acrid,followed by sweet,and the attributions were mainly to the liver,spleen,and heart meridians.The Chinese medicine association rules set the confidence level>80%and the support level>10%,resulting in 10 two-herb and three-herb associations with the highest confidence level,such as Os Draconis is associated with oyster shell,Platycodonis Radix is associated with Achyranthis Bidentatae Radix,Scutellariae Radix is associated with Bupleuri Radix,Os Draconis,Bupleuri Radix is associated with oyster shell,Os Draconis,Scutellariae Radix is associated with oyster shell,etc.Cluster analysis yielded 3 classes of drug formulas.The complex network diagram shows that the core prescription drugs are composed of Bupleuri Radix,Chuanxiong Rhizoma,Pseudostellariae Radix,Jujubae Fructus,Os Draconis,Coptidis Rhizoma,Scutellariae Radix,Ziziphi Spinosae Semen,Smilacis Glabrae Rhizoma,Cinnamomi Cortex,White Moutan Cortex,Atractylodis Rhizoma,Glycyrrhizae Radix,oyster shell,Rehmanniae Radix Praeparata,Tritici Levis Fructus,Pinelliae Rhizoma Praeparatum,and Cinnamomi Ramulus.Conclusion:Prof.Yan-Xin Wang believes that the main treatment for patients with insomnia is based on the liver,by tonifying the deficiency and supporting the righteousness,mutually regulating the liver and spleen,and calming the mind and nourishing the heart,while adding and subtracting appropriate herbs according to the patient’s co-morbidities,which can significantly improve the patient’s insomnia symptoms.
文摘Detecting cyber-attacks undoubtedly has become a big data problem. This paper presents a tutorial on data mining based cyber-attack detection. First,a data driven defence framework is presented in terms of cyber security situational awareness. Then, the process of data mining based cyber-attack detection is discussed. Next,a multi-loop learning architecture is presented for data mining based cyber-attack detection. Finally,common data mining techniques for cyber-attack detection are discussed.
文摘Background:Based on the theory of"cancer toxin"in traditional Chinese medicine(TCM),combined with data mining method,this paper discusses the prescription and medication law of macroscopic"cancer toxin".Methods Sort out the relevant prescriptions of macroscopic"cancer toxin"and carry out correlation analysis.Results The results showed that the main therapeutic drugs for macroscopic"cancer toxin"were invigorating the spleen and Qi,regulating the Qi mechanism,eliminating toxin and pathogenic factors,eliminating dampness and swelling,promoting blood circulation and removing blood stasis,attacking toxin,softening and firmness,and dredging collaterals to relieve pain.Conclusion High frequency medicine embodies the core of tangible macroscopic"cancer toxin"treatment,and the new prescription composition embodies the fundamental treatment.At the same time,the mechanism of macroscopic"cancer toxin"was analyzed to pave the way for further clinical practice.
文摘Previous weighted frequent pattern (WFP) mining algorithms are not suitable for data streams for they need multiple database scans. In this paper, we present an efficient algorithm SWFP-Miner to mine weighted frequent pattern over data streams. SWFP-Miner is based on sliding window and can discover important frequent pattern from the recent data. A new refined weight definition is proposed to keep the downward closure property, and two pruning strategies are presented to prune the weighted infrequent pattern. Experimental studies are performed to evaluate the effectiveness and efficiency of SWFP-Miner.
基金supported by Fundamental Research Funds for the Central Universities (No. 2018XD004)
文摘The distance-based outlier detection method detects the implied outliers by calculating the distance of the points in the dataset, but the computational complexity is particularly high when processing multidimensional datasets. In addition, the traditional outlier detection method does not consider the frequency of subsets occurrence, thus, the detected outliers do not fit the definition of outliers (i.e., rarely appearing). The pattern mining-based outlier detection approaches have solved this problem, but the importance of each pattern is not taken into account in outlier detection process, so the detected outliers cannot truly reflect some actual situation. Aimed at these problems, a two-phase minimal weighted rare pattern mining-based outlier detection approach, called MWRPM-Outlier, is proposed to effectively detect outliers on the weight data stream. In particular, a method called MWRPM is proposed in the pattern mining phase to fast mine the minimal weighted rare patterns, and then two deviation factors are defined in outlier detection phase to measure the abnormal degree of each transaction on the weight data stream. Experimental results show that the proposed MWRPM-Outlier approach has excellent performance in outlier detection and MWRPM approach outperforms in weighted rare pattern mining.
基金partially supported by the Foundation of State Key Laboratory of Public Big Data(No.PBD2022-01).
文摘In today’s highly competitive retail industry,offline stores face increasing pressure on profitability.They hope to improve their ability in shelf management with the help of big data technology.For this,on-shelf availability is an essential indicator of shelf data management and closely relates to customer purchase behavior.RFM(recency,frequency,andmonetary)patternmining is a powerful tool to evaluate the value of customer behavior.However,the existing RFM patternmining algorithms do not consider the quarterly nature of goods,resulting in unreasonable shelf availability and difficulty in profit-making.To solve this problem,we propose a quarterly RFM mining algorithmfor On-shelf products named OS-RFM.Our algorithmmines the high recency,high frequency,and high monetary patterns and considers the period of the on-shelf goods in quarterly units.We conducted experiments using two real datasets for numerical and graphical analysis to prove the algorithm’s effectiveness.Compared with the state-of-the-art RFM mining algorithm,our algorithm can identify more patterns and performs well in terms of precision,recall,and F1-score,with the recall rate nearing 100%.Also,the novel algorithm operates with significantly shorter running times and more stable memory usage than existing mining algorithms.Additionally,we analyze the sales trends of products in different quarters and seasonal variations.The analysis assists businesses in maintaining reasonable on-shelf availability and achieving greater profitability.
基金Supported by the National Natural Science Funda-tion of China (70371015) andthe Natural Science Foundation of Jian-gsu Province (BK2004058)
文摘In order to reduce the computational and spatial complexity in rerunning algorithm of sequential patterns query, this paper proposes sequential patterns based and projection database based algorithm for fast interactive sequential patterns mining algorithm (FISP), in which the number of frequent items of the projection databases constructed by the correct mining which based on the previously mined sequences has been reduced. Furthermore, the algorithm's iterative running times are reduced greatly by using global-threshold. The results of experiments testify that FISP outperforms PrefixSpan in interactive mining
基金theFundoftheNationalManagementBureauofTraditionalChineseMedicine(No .2 0 0 0 J P 5 4 )
文摘Mining frequent pattern in transaction database, time series databases, and many other kinds of databases have been studied popularly in data mining research. Most of the previous studies adopt Apriori like candidate set generation and test approach. However, candidate set generation is very costly. Han J. proposed a novel algorithm FP growth that could generate frequent pattern without candidate set. Based on the analysis of the algorithm FP growth, this paper proposes a concept of equivalent FP tree and proposes an improved algorithm, denoted as FP growth * , which is much faster in speed, and easy to realize. FP growth * adopts a modified structure of FP tree and header table, and only generates a header table in each recursive operation and projects the tree to the original FP tree. The two algorithms get the same frequent pattern set in the same transaction database, but the performance study on computer shows that the speed of the improved algorithm, FP growth * , is at least two times as fast as that of FP growth.
基金Supported by National Natural Science Foundation of China (No.50975193)Specialized Research Fund for Doctoral Program of Higher Education of China (No.20060056016)
文摘By analyzing the existing prefix-tree data structure, an improved pattern tree was introduced for processing new transactions. It firstly stored transactions in a lexicographic order tree and then restructured the tree by sorting each path in a frequency-descending order. While updating the improved pattern tree, there was no need to rescan the entire new database or reconstruct a new tree for incremental updating. A test was performed on synthetic dataset T1014D100K with 100 000 transactions and 870 items. Experimental results show that the smaller the minimum sup- port threshold, the faster the improved pattern tree achieves over CanTree for all datasets. As the minimum support threshold increased from 2% to 3.5%, the runtime decreased from 452.71 s to 186.26 s. Meanwhile, the runtime re- quired by CanTree decreased from 1 367.03 s to 432.19 s. When the database was updated, the execution time of im- proved pattern tree consisted of construction of original improved pattern trees and reconstruction of initial tree. The experiment results showed that the runtime was saved by about 15% compared with that of CanTree. As the number of transactions increased, the runtime of improved pattern tree was about 25% shorter than that of FP-tree. The improved pattern tree also required less memory than CanTree.
基金Supported by the National Natural Science Foundation of China ( No.60474022)Henan Innovation Project for University Prominent Research Talents (No.2007KYCX018)
文摘Because mining complete set of frequent patterns from dense database could be impractical, an interesting alternative has been proposed recently. Instead of mining the complete set of frequent patterns, the new model only finds out the maximal frequent patterns, which can generate all frequent patterns. FP-growth algorithm is one of the most efficient frequent-pattern mining methods published so far. However, because FP-tree and conditional FP-trees must be two-way traversable, a great deal memory is needed in process of mining. This paper proposes an efficient algorithm Unid_FP-Max for mining maximal frequent patterns based on unidirectional FP-tree. Because of generation method of unidirectional FP-tree and conditional unidirectional FP-trees, the algorithm reduces the space consumption to the fullest extent. With the development of two techniques: single path pruning and header table pruning which can cut down many conditional unidirectional FP-trees generated recursively in mining process, Unid_FP-Max further lowers the expense of time and space.
文摘Maximum frequent pattern generation from a large database of transactions and items for association rule mining is an important research topic in data mining. Association rule mining aims to discover interesting correlations, frequent patterns, associations, or causal structures between items hidden in a large database. By exploiting quantum computing, we propose an efficient quantum search algorithm design to discover the maximum frequent patterns. We modified Grover’s search algorithm so that a subspace of arbitrary symmetric states is used instead of the whole search space. We presented a novel quantum oracle design that employs a quantum counter to count the maximum frequent items and a quantum comparator to check with a minimum support threshold. The proposed derived algorithm increases the rate of the correct solutions since the search is only in a subspace. Furthermore, our algorithm significantly scales and optimizes the required number of qubits in design, which directly reflected positively on the performance. Our proposed design can accommodate more transactions and items and still have a good performance with a small number of qubits.
基金supported by the National Natural Science Foundation of China(grant no.61573328).
文摘Faster internet, IoT, and social media have reformed the conventional web into a collaborative web resulting in enormous user-generated content. Several studies are focused on such content;however, they mainly focus on textual data, thus undermining the importance of metadata. Considering this gap, we provide a temporal pattern mining framework to model and utilize user-generated content's metadata. First, we scrap 2.1 million tweets from Twitter between Nov-2020 to Sep-2021 about 100 hashtag keywords and present these tweets into 100 User-Tweet-Hashtag (UTH) dynamic graphs. Second, we extract and identify four time-series in three timespans (Day, Hour, and Minute) from UTH dynamic graphs. Lastly, we model these four time-series with three machine learning algorithms to mine temporal patterns with the accuracy of 95.89%, 93.17%, 90.97%, and 93.73%, respectively. We demonstrate that user-generated content's metadata contains valuable information, which helps to understand the users' collective behavior and can be beneficial for business and research. Dataset and codes are publicly available;the link is given in the dataset section.
文摘With massive amounts of data stored in databases, mining information and knowledge in databases has become an important issue in recent research. Researchers in many different fields have shown great interest in data mining and knowledge discovery in databases. Several emerging applications in information providing services, such as data warehousing and on-line services over the Internet, also call for various data mining and knowledge discovery techniques to understand user behavior better, to improve the service provided, and to increase the business opportunities. In response to such a demand, this article is to provide a comprehensive survey on the data mining and knowledge discovery techniques developed recently, and introduce some real application systems as well. In conclusion, this article also lists some problems and challenges for further research.