Funding: Supported by the National Natural Science Foundation of China (Grant No. 50675219) and the Hu’nan Provincial Science Committee Excellent Youth Foundation of China (Grant No. 08JJ1008).
Abstract: Turbopump condition monitoring is an important means of ensuring the safety of liquid rocket engines (LREs). Because fault samples are scarce, a monitoring system cannot be trained on all possible condition patterns, so novelty detection methods are needed to distinguish abnormal or unknown patterns from the normal pattern. The one-class support vector machine (OCSVM), which is commonly used for novelty detection, does not scale well to large sample sets. To model the normal pattern of the turbopump with an OCSVM and thereby monitor its condition, a monitoring method that integrates OCSVM with incremental clustering is presented. In this method, incremental clustering reduces the sample size by extracting representative vectors from a large training set. These representative vectors are intended to be distributed uniformly over the target region and to fill it. Training an OCSVM on the representative vectors then yields a novelty detector. When the method was applied to historical test data from the turbopump, the incremental clustering algorithm extracted 91 representative points from more than 36 000 training vectors. The OCSVM detector trained on these 91 points recognized spikes in vibration signals caused by different abnormal events such as vane shedding, rub-impact, and sensor faults. Unlike classical recognition methods, this monitoring method requires no fault samples during training. It resolves the problem of learning from large sample sets and offers an alternative approach to condition monitoring of LRE turbopumps.
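The sample-reduction step can be illustrated with a simple leader-style incremental clustering pass. This is a minimal sketch under stated assumptions. The abstract does not specify the clustering rule, so the Euclidean distance, the single `radius` threshold, the synthetic data, and the function names below are hypothetical choices, not the authors' algorithm. The kept vectors end up pairwise more than `radius` apart, and every training vector lies within `radius` of some representative. That is one way to obtain representatives that spread over and cover the normal region.

```python
import math
import random
from typing import List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def extract_representatives(samples: List[Vector], radius: float) -> List[Vector]:
    """Leader-style incremental clustering used for sample reduction (hypothetical).

    Each sample is visited once. It becomes a new representative only if it lies
    farther than `radius` from every representative kept so far; otherwise it is
    absorbed by an existing representative and discarded.
    """
    reps: List[Vector] = []
    for x in samples:
        if all(euclidean(x, r) > radius for r in reps):
            reps.append(x)
    return reps


# Synthetic stand-in for "normal" feature vectors (not turbopump data).
random.seed(0)
normal_data = [(random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)) for _ in range(5000)]
reps = extract_representatives(normal_data, radius=0.5)
print(len(normal_data), "training vectors ->", len(reps), "representatives")
```

In a full pipeline, the reduced set would then be used to train a one-class SVM, for example with scikit-learn's `sklearn.svm.OneClassSVM`. The kernel, `nu`, and `radius` settings would need tuning on real data; the abstract does not give them.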
Funding: Supported by the National High-Tech Research and Development Program of China (2006AA12A106).
Abstract: A new incremental clustering framework is presented, based on the view of induction as inverted deduction. Induction is inherently risky because it is not truth-preserving. If clustering is regarded as an induction process, the key to building a valid clustering is to minimize the risk of clustering. From the viewpoint of modal logic, clustering can be described by Kripke frames and Kripke models that are reflexive and symmetric, and, by the theory of modal logic, their properties are characterized syntactically by system B. The risk of clustering can therefore be calculated from the deduction relation of system B and the proximity induction theorem described in the paper. Because the proposed framework imposes no additional restrictive conditions on the clustering algorithm, it is a universal framework: an incremental clustering algorithm can easily be constructed with it from any given non-incremental clustering algorithm. Experiments show that the lower the a priori risk, the more effective the framework, which demonstrates that the framework is generally valid.
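The frames of modal system B (KTB) are exactly those whose accessibility relation is reflexive and symmetric. Transitivity is not required. A proximity relation of the form "i and j lie within eps of each other" is a natural example of such a relation. The sketch below builds one on hypothetical one-dimensional points and checks the three properties. It is only an illustration of the frame conditions, not the paper's risk calculation. It shows that chaining "close to" judgments is not truth-preserving: 0 is close to 1 and 1 is close to 2, yet 0 is not close to 2.

```python
from itertools import product
from typing import List, Set, Tuple

Relation = Set[Tuple[int, int]]


def proximity_relation(points: List[float], eps: float) -> Relation:
    """Accessibility relation R: i R j iff points i and j are within eps."""
    idx = range(len(points))
    return {(i, j) for i, j in product(idx, repeat=2) if abs(points[i] - points[j]) <= eps}


def is_reflexive(rel: Relation, n: int) -> bool:
    return all((i, i) in rel for i in range(n))


def is_symmetric(rel: Relation) -> bool:
    return all((j, i) in rel for (i, j) in rel)


def is_transitive(rel: Relation) -> bool:
    # i R j and j R k must imply i R k
    return all((i, k) in rel for (i, j) in rel for (j2, k) in rel if j == j2)


points = [0.0, 1.0, 2.0]
R = proximity_relation(points, eps=1.0)
print(is_reflexive(R, len(points)), is_symmetric(R), is_transitive(R))  # prints: True True False
```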
Funding: The work reported in this paper has been partially supported by the National Key Research and Development Project (2020YFB1005503), the NSFC Projects (61502209 and U1836116), the Leading-edge Technology Program of Jiangsu Natural Science Foundation (BK20202001), the NSFC of Jiangsu Province Project (BK20201415), the UK-Jiangsu 20-20 World Class University Initiative programme, and the Natural Science Foundation of the Jiangsu Higher Education Institutions (Grant number: 22KJB520016).
Abstract: Bitcoin is a cryptocurrency based on blockchain. All historical Bitcoin transactions are stored in the Bitcoin blockchain, but Bitcoin owners are generally unknown. This pseudo-anonymity is why Bitcoin is often used for illegal transactions. Bitcoin addresses are related to the identities of Bitcoin users, and some addresses can be analyzed through the behavior patterns of Bitcoin transactions. However, existing Bitcoin analysis methods do not incorporate data from newly added blocks, which makes Bitcoin address analysis inefficient. To address this problem, this paper proposes an incremental Bitcoin address clustering method that avoids re-clustering when new block data is added. In addition, a heuristic Bitcoin address clustering algorithm is developed to improve clustering accuracy on the Bitcoin blockchain. Experimental results show that the proposed method improves both the efficiency and the accuracy of Bitcoin address clustering.
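Incremental address clustering avoids re-clustering when a new block arrives if clusters are kept in a union-find (disjoint-set) structure that is only extended as new transactions come in. The sketch below assumes the widely used common-input-ownership (multi-input) heuristic: all input addresses of one transaction are treated as controlled by the same entity. The abstract does not say which heuristic the paper uses, so this choice, the class name, and the addresses "A" to "D" are hypothetical.

```python
from typing import Dict, Iterable, List


class AddressClusters:
    """Address clusters maintained incrementally with union-find."""

    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}
        self.size: Dict[str, int] = {}

    def find(self, a: str) -> str:
        if a not in self.parent:  # first time this address is seen
            self.parent[a] = a
            self.size[a] = 1
        root = a
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[a] != root:  # path compression
            nxt = self.parent[a]
            self.parent[a] = root
            a = nxt
        return root

    def union(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:  # union by size
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]

    def add_block(self, tx_inputs: Iterable[List[str]]) -> None:
        """Merge the input addresses of every transaction in a new block."""
        for inputs in tx_inputs:
            for addr in inputs:
                self.find(addr)
            for addr in inputs[1:]:
                self.union(inputs[0], addr)

    def same_entity(self, a: str, b: str) -> bool:
        return self.find(a) == self.find(b)

    def num_clusters(self) -> int:
        return len({self.find(a) for a in list(self.parent)})


clusters = AddressClusters()
clusters.add_block([["A", "B"], ["C"]])  # block n: two transactions
clusters.add_block([["B", "D"]])         # block n+1 arrives later; nothing is recomputed
print(clusters.same_entity("A", "D"), clusters.num_clusters())  # prints: True 2
```

Each new block costs work roughly proportional to its number of inputs, which is the property an incremental method needs.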
Abstract: A new incremental clustering method is presented that partitions dynamic data sets by mapping data points from a high-dimensional space into a low-dimensional space based on (fuzzy) cross-entropy (CE). The algorithm consists of two parts: an initial clustering process and an incremental clustering process. The former calculates the fuzzy cross-entropy or cross-entropy of each point relative to the others and clusters the static data set with a hierarchical method based on cross-entropy. Moreover, this part has low time complexity. The latter assigns each new point to a suitable cluster by calculating its membership in the existing centers with the cross-entropy measure. Experimental comparisons show that the proposed method has lower time complexity than common methods for large-scale data or in dynamic working environments.
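The incremental step, assigning a new point to existing centers by a cross-entropy-based membership, can be sketched as follows. This sketch makes several assumptions the abstract does not confirm. Vectors are non-negative and are normalized into probability distributions. "Cross-entropy" is taken in Kullback's sense, that is, the Kullback-Leibler divergence D(p || q). Memberships use a fuzzy c-means-style inverse weighting with fuzzifier `m`. All function names are hypothetical. Centers must be strictly positive so the logarithm is defined.

```python
import math
from typing import List, Sequence


def normalize(v: Sequence[float]) -> List[float]:
    """Turn a non-negative vector into a probability distribution."""
    total = sum(v)
    return [x / total for x in v]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Kullback-Leibler cross-entropy D(p || q); q must be strictly positive."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def memberships(x: Sequence[float], centers: List[Sequence[float]], m: float = 2.0) -> List[float]:
    """FCM-style memberships of x in each center, from KL divergences (assumed form)."""
    p = normalize(x)
    d = [kl_divergence(p, normalize(c)) for c in centers]
    for i, di in enumerate(d):
        if di <= 1e-12:  # x coincides with a center: crisp membership
            return [1.0 if j == i else 0.0 for j in range(len(d))]
    w = [di ** (-1.0 / (m - 1.0)) for di in d]
    total = sum(w)
    return [wi / total for wi in w]


def assign(x: Sequence[float], centers: List[Sequence[float]]) -> int:
    """Index of the center with the highest membership."""
    u = memberships(x, centers)
    return max(range(len(u)), key=u.__getitem__)


centers = [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]]
new_point = [6.0, 3.0, 1.0]
print(memberships(new_point, centers), assign(new_point, centers))
```

Only the centers are touched when a point arrives, so the cost per new point is linear in the number of clusters.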
Funding: Supported by the Innovation Special Fund for Postgraduates of Jiangxi Province (No. YC2016-B016).
Abstract: Weibo, also known as microblog, has become the primary source and communication channel of Internet hotspots because of its extremely low threshold for publishing information and its interactive mode of communication. However, Weibo posts are short texts. Their sparse semantic features and colloquial, diverse expressions make clustering analysis more difficult. To solve these problems, we use the Biterm Topic Model (BTM) to extract features from the corpus and the vector space model (VSM) to strengthen the features, which reduces the vector dimension and highlights the main features. We then propose an improved incremental clustering algorithm that incorporates Weibo features, together with a Weibo buzz formula that quantifies the buzz of Weibo topics, so that hotspots can be discovered reasonably. Experimental results show that the proposed incremental clustering algorithm effectively improves clustering accuracy across different dimensions. Meanwhile, the Weibo buzz formula reasonably describes, from a qualitative point of view, how Weibo buzz evolves, which helps discover hotspots effectively.
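BTM sidesteps the sparsity of short texts by modeling word co-occurrence patterns (biterms) over the whole corpus instead of per-document word counts. In BTM a biterm is an unordered pair of words that co-occur in the same short text, and each short post is treated as one context. The sketch below shows only this feature-extraction step. The example posts are hypothetical and assumed to be already word-segmented, and the helper names are illustrative. The topic inference, the VSM weighting, and the buzz formula from the abstract are not reproduced.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List, Tuple

Biterm = Tuple[str, ...]


def biterms(tokens: List[str]) -> List[Biterm]:
    """Unordered word pairs co-occurring in one short post (the whole post is the context)."""
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def biterm_counts(corpus: Iterable[List[str]]) -> Counter:
    """Corpus-level biterm frequencies, the statistics BTM is fitted on."""
    counts: Counter = Counter()
    for tokens in corpus:
        counts.update(biterms(tokens))
    return counts


posts = [["typhoon", "shanghai", "flights"], ["shanghai", "flights", "delayed"]]
counts = biterm_counts(posts)
print(counts.most_common(1))  # prints: [(('flights', 'shanghai'), 2)]
```

A post with n tokens yields n(n-1)/2 biterms, so even three-word posts contribute several co-occurrence observations.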
Abstract: Technical advances in information systems have led to a massive number of documents stored in electronic databases such as e-mail, the Internet, and web pages, which makes organizing and browsing the required documents a complex task. This paper proposes an approach to incremental clustering using the Bat Grey Wolf Optimizer (BAGWO). The input documents are first pre-processed to obtain useful keywords, and features are then extracted based on WordNet. After feature extraction, feature selection is carried out using an entropy function. The documents are then clustered with the proposed BAGWO algorithm, which integrates the Bat Algorithm (BA) and the Grey Wolf Optimizer (GWO) to generate clusters of text documents. When a new document arrives, the same pre-processing and feature extraction steps are performed, and the features of the new document are mapped to the clusters obtained by BAGWO. The mapping uses a kernel-based deep point distance. Once mapping is complete, the cluster representatives are updated with a fuzzy-based representative update. BAGWO outperformed existing techniques in clustering accuracy, Jaccard coefficient, and Rand coefficient, with maximal values of 0.948, 0.968, and 0.969, respectively.
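The GWO half of BAGWO can be illustrated with the standard Grey Wolf Optimizer position update of Mirjalili et al. Each wolf moves toward the average of three positions steered by the alpha, beta, and delta leaders. The coefficient `a` decreases linearly from 2 to 0 to shift from exploration to exploitation. This is a sketch of plain GWO only. How the paper hybridizes it with the Bat Algorithm and encodes document clusters is not given in the abstract. The sphere objective, bounds, and parameter values are hypothetical test settings.

```python
import random
from typing import Callable, List, Tuple

Vector = List[float]


def gwo_minimize(f: Callable[[Vector], float], dim: int, lo: float, hi: float,
                 n_wolves: int = 20, iters: int = 100, seed: int = 0) -> Tuple[Vector, float]:
    """Standard Grey Wolf Optimizer minimizing f over the box [lo, hi]^dim."""
    rng = random.Random(seed)
    wolves = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_wolves)]
    # alpha, beta, delta: the three best positions found so far
    leaders = [w[:] for w in sorted(wolves, key=f)[:3]]
    for t in range(iters):
        a = 2.0 - 2.0 * t / iters  # decreases linearly from 2 towards 0
        new_wolves = []
        for x in wolves:
            new_x = []
            for d in range(dim):
                total = 0.0
                for leader in leaders:
                    A = 2.0 * a * rng.random() - a
                    C = 2.0 * rng.random()
                    D = abs(C * leader[d] - x[d])
                    total += leader[d] - A * D  # step guided by this leader
                new_x.append(min(hi, max(lo, total / 3.0)))  # average, clipped to bounds
            new_wolves.append(new_x)
        wolves = new_wolves
        leaders = [w[:] for w in sorted(leaders + wolves, key=f)[:3]]
    return leaders[0], f(leaders[0])


def sphere(v: Vector) -> float:
    return sum(x * x for x in v)


best, value = gwo_minimize(sphere, dim=2, lo=-5.0, hi=5.0)
print(best, value)
```

For document clustering, a wolf's position would typically encode candidate cluster centroids, and `f` would be a clustering cost such as intra-cluster distance. That encoding is an assumption, not something the abstract states.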
Funding: Supported by the National Natural Science Foundation of China (60403027), the Natural Science Foundation of Hubei Province (2005ABA258), and the Opening Foundation of the State Key Laboratory of Software Engineering (SKLSE05-07).
Abstract: Because data keep growing in large databases such as wire transfer databases, incremental clustering algorithms play an increasingly important role in data mining (DM). However, few traditional clustering algorithms can both handle categorical data and explain their output clearly. Based on the idea of dynamic clustering, this paper proposes an incremental conceptual clustering algorithm that introduces the Semantic Core Tree (SCT) to handle large volumes of categorical wire transfer data for detecting money laundering. In addition, a rule generation algorithm is presented that expresses the clustering result in the form of knowledge. Applying this approach to financial data mining improves the efficiency of identifying the characteristics of money laundering data.
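The two ideas in the abstract are incremental clustering of categorical records and output expressed as readable rules. The sketch below illustrates both with a generic stand-in, not the Semantic Core Tree, whose structure the abstract does not describe. Each cluster stores per-attribute value counts. A new record joins the cluster whose modal values it matches most often, or starts a new cluster below a similarity threshold. A cluster's rule lists the attribute values shared by at least `min_support` of its members. The attribute names, records, and thresholds are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Record = Tuple[str, ...]


class CategoricalCluster:
    """Keeps per-attribute value counts so the cluster can describe itself."""

    def __init__(self, n_attrs: int) -> None:
        self.counts = [Counter() for _ in range(n_attrs)]
        self.size = 0

    def similarity(self, record: Record) -> float:
        """Fraction of attributes whose value equals the cluster's modal value."""
        matches = sum(1 for c, v in zip(self.counts, record) if c.most_common(1)[0][0] == v)
        return matches / len(record)

    def add(self, record: Record) -> None:
        for c, v in zip(self.counts, record):
            c[v] += 1
        self.size += 1

    def rule(self, names: Sequence[str], min_support: float = 0.8) -> str:
        """Conjunction of attribute values shared by >= min_support of the members."""
        conds = []
        for name, c in zip(names, self.counts):
            value, n = c.most_common(1)[0]
            if n / self.size >= min_support:
                conds.append(f"{name} = {value}")
        return " AND ".join(conds)


def incremental_cluster(records: List[Record], threshold: float = 0.5):
    """Single pass: each record joins the most similar cluster or starts a new one."""
    clusters: List[CategoricalCluster] = []
    labels: List[int] = []
    for r in records:
        sims = [cl.similarity(r) for cl in clusters]
        if sims and max(sims) >= threshold:
            best = sims.index(max(sims))
        else:
            clusters.append(CategoricalCluster(len(r)))
            best = len(clusters) - 1
        clusters[best].add(r)
        labels.append(best)
    return clusters, labels


names = ("country", "channel", "amount_band")
records = [("X", "wire", "high"), ("X", "wire", "high"), ("Y", "cash", "low"), ("X", "wire", "mid")]
clusters, labels = incremental_cluster(records)
print(labels)  # prints: [0, 0, 1, 0]
for cl in clusters:
    print(cl.rule(names))
```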
Funding: Sponsored by the Scientific Research Start-up Foundation of Qingdao University of Science and Technology.
Abstract: Traditional clustering algorithms generally suffer from problems such as sensitivity to initial parameters, difficulty in finding the optimal clustering result, and questionable clustering validity. In this paper, an FSM and a mathematical model of a new clustering algorithm based on swarm intelligence are presented. In this algorithm, the clustering agent moves in a three-dimensional space and is capable of memory, communication, analysis, judgment, and information coordination. Experimental results confirm that the algorithm has many merits, such as insensitivity to the order of the data and the ability to handle exceptional, high-dimensional, or complex data. The algorithm can be used in fields such as Web mining, incremental clustering, economic analysis, pattern recognition, and document classification.
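The abstract gives no formulas for its agents, so the sketch below is a well-known stand-in from swarm-intelligence clustering, not the paper's model. It uses the ant-based pick-up and drop probabilities of Deneubourg et al., p_pick = (k1 / (k1 + f))^2 and p_drop = (f / (k2 + f))^2. The local similarity f is computed in the style of Lumer and Faieta. An agent tends to pick up objects that sit among dissimilar neighbours (small f) and to drop them among similar ones (large f). Under the ant-based scheme, that tendency is why the outcome is largely insensitive to data order. The constants `k1`, `k2`, `alpha`, and the neighbourhood size `s` are illustrative values.

```python
import math
from typing import Sequence


def local_density(obj: Sequence[float], neighbours: Sequence[Sequence[float]],
                  s: int, alpha: float) -> float:
    """Lumer-Faieta style local similarity of obj within an s x s neighbourhood."""
    total = sum(1.0 - math.dist(obj, o) / alpha for o in neighbours)
    return max(0.0, total / (s * s))


def p_pick(f: float, k1: float = 0.1) -> float:
    """Probability that an unladen agent picks the object up (high when f is small)."""
    return (k1 / (k1 + f)) ** 2


def p_drop(f: float, k2: float = 0.15) -> float:
    """Probability that a laden agent drops the object (high when f is large)."""
    return (f / (k2 + f)) ** 2


f_similar = local_density((0.0, 0.0), [(0.0, 0.0)] * 9, s=3, alpha=1.0)
f_isolated = local_density((0.0, 0.0), [(5.0, 5.0)] * 9, s=3, alpha=1.0)
print(f_similar, p_pick(f_similar), p_drop(f_similar))
print(f_isolated, p_pick(f_isolated), p_drop(f_isolated))
```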
Funding: Supported by the National Natural Science Foundation of China (Nos. 61701277 and 61771288) and the State Key Development Program in the 13th Five-Year Plan (Nos. 2016YFB0801301, 044007008, and 2016YFB1001005); also supported by the National Engineering Laboratory for Intelligent Video Analysis and Application of China.
Abstract: In this study, we address the problems encountered in incremental face clustering. Without the benefit of having observed the entire data distribution, incremental face clustering is more challenging than clustering a static dataset. Conventional methods rely on statistical information from previous clusters to improve the efficiency of incremental clustering, so errors may accumulate. This study therefore proposes to predict summaries of previous data directly from the data distribution via supervised learning. It also explores an efficient framework for clustering the previous summaries together with new data. Learning summaries from the original data costs more than deriving them from previous clusters. Even so, the whole framework takes only slightly more time, because clustering the current data and generating summaries for new data share most of the computation. Experiments show that the proposed approach significantly outperforms existing incremental face clustering methods, improving the average F-score from 0.644 to 0.762. Compared with state-of-the-art static face clustering methods, our method achieves comparable accuracy in much less time.
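The reported F-scores compare a predicted clustering with ground-truth identities. The abstract does not state which variant is used; pairwise and BCubed F-scores are both common in face clustering. The sketch below assumes the pairwise definition. A pair of samples counts as a true positive when both the prediction and the ground truth put the two samples in the same cluster. Precision and recall are computed over such pairs, so cluster label names do not matter. The labels in the usage lines are hypothetical. The O(n^2) loop is fine for illustration but not for large face sets.

```python
from itertools import combinations
from typing import Hashable, Sequence


def pairwise_f_score(pred: Sequence[Hashable], truth: Sequence[Hashable]) -> float:
    """Pairwise F-score between predicted cluster labels and ground-truth identities."""
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same length")
    tp = fp = fn = 0
    for i, j in combinations(range(len(pred)), 2):
        same_pred = pred[i] == pred[j]
        same_true = truth[i] == truth[j]
        if same_pred and same_true:
            tp += 1
        elif same_pred:
            fp += 1  # merged two different identities
        elif same_true:
            fn += 1  # split one identity
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


print(pairwise_f_score([0, 0, 1, 1], ["a", "a", "b", "b"]))  # label names are irrelevant
print(pairwise_f_score([0, 0, 0, 1], [0, 0, 1, 1]))
```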
Abstract: Because the number of clustering cores must be set before the K-means algorithm is run, this type of algorithm often fails in applications where data keep growing and their distribution characteristics change. This paper proposes an evolutionary algorithm, DCC, that dynamically adjusts the number of clustering cores as the data change. DCC uses a Gaussian function as the activation function of each core. Each clustering core adjusts its center vector and coverage according to its response to the input data and its memory state, so as to better fit the sample clusters in the space. The DCC model can evolve starting from zero cores. After each new sample is added, the winning dynamic core is adjusted or split through competitive learning, so that the number of clustering cores always stays well adapted to the existing data. Furthermore, because its clustering cores can split, DCC can subdivide densely distributed data clusters. Finally, detailed experimental results show that the dynamic-core-based evolutionary clustering algorithm DCC achieves excellent clustering performance and strong robustness.
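The core mechanism described above can be sketched as follows. The model starts with no cores, and each core responds to an input through a Gaussian activation. The most strongly activated core (the winner) moves toward a sample it covers well enough. Otherwise a new core is created at the sample, so the number of cores grows with the data instead of being fixed in advance as in K-means. This is a simplified sketch, not the DCC algorithm itself. The activation threshold, the running-mean learning rate, and the fixed width `sigma` (standing in for "coverage") are assumptions. Core splitting and the memory state are omitted because the abstract does not specify them.

```python
import math
from typing import List, Sequence


class GaussianCore:
    """A clustering core with a center and a fixed Gaussian width (coverage)."""

    def __init__(self, center: Sequence[float], sigma: float) -> None:
        self.center = list(center)
        self.sigma = sigma
        self.count = 1

    def activation(self, x: Sequence[float]) -> float:
        d2 = sum((a - b) ** 2 for a, b in zip(x, self.center))
        return math.exp(-d2 / (2.0 * self.sigma ** 2))


class DynamicCores:
    """Grows cores from zero; the winning core absorbs samples it responds to."""

    def __init__(self, sigma: float = 1.0, min_activation: float = 0.3) -> None:
        self.cores: List[GaussianCore] = []
        self.sigma = sigma
        self.min_activation = min_activation

    def partial_fit(self, x: Sequence[float]) -> int:
        """Process one sample and return the index of the core it was assigned to."""
        if self.cores:
            acts = [c.activation(x) for c in self.cores]
            k = max(range(len(acts)), key=acts.__getitem__)  # competitive winner
            if acts[k] >= self.min_activation:
                winner = self.cores[k]
                winner.count += 1
                lr = 1.0 / winner.count  # running-mean learning rate
                winner.center = [c + lr * (xi - c) for c, xi in zip(winner.center, x)]
                return k
        self.cores.append(GaussianCore(x, self.sigma))  # no core responds: create one
        return len(self.cores) - 1


model = DynamicCores(sigma=1.0, min_activation=0.3)
stream = [(0.0, 0.0), (0.3, -0.2), (10.0, 10.0), (-0.4, 0.1), (10.2, 9.8)]
labels = [model.partial_fit(x) for x in stream]
print(labels, len(model.cores))  # prints: [0, 0, 1, 0, 1] 2
```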