To improve the accuracy of text clustering, fuzzy c-means clustering based on topic concept sub-space (TCS2FCM) is introduced for classifying texts. Five evaluation functions are combined to extract key phrases. Concept phrases, as well as the descriptions of the final clusters, are derived from the key phrases using WordNet. The initial centers and the membership matrix are the most important factors affecting clustering performance. Orthogonal topic concept sub-spaces are built from the topic concept phrases that represent the topics of the texts, and the initialization of the centers and the membership matrix depends on the concept vectors in these sub-spaces. The results show that, unlike the random initialization of traditional fuzzy c-means clustering, initialization based on text content can improve clustering precision.
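The role of initialization can be illustrated with a minimal sketch of standard fuzzy c-means in which the starting centers are supplied by the caller rather than drawn at random. This is not the TCS2FCM procedure itself, whose details the abstract does not give: the function names, the toy two-dimensional document vectors, and the use of unit vectors along orthogonal axes as seeds are all assumptions made for illustration.

```python
import math


def memberships_from_centers(points, centers, m=2.0):
    """Standard FCM membership update: u_ik = 1 / sum_j (d_ik / d_ij)^(2/(m-1))."""
    exponent = 2.0 / (m - 1.0)
    u = []
    for x in points:
        dists = [math.dist(x, c) for c in centers]
        if any(d == 0.0 for d in dists):
            # The point coincides with a center: share membership among those centers.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            u.append([h / sum(hits) for h in hits])
        else:
            inv = [d ** -exponent for d in dists]
            u.append([v / sum(inv) for v in inv])
    return u


def update_centers(points, u, m=2.0):
    """Standard FCM center update: mean of the points weighted by u_ik^m."""
    dim = len(points[0])
    centers = []
    for k in range(len(u[0])):
        weights = [row[k] ** m for row in u]
        total = sum(weights)
        centers.append(
            [sum(w * x[d] for w, x in zip(weights, points)) / total for d in range(dim)]
        )
    return centers


def fuzzy_c_means(points, init_centers, m=2.0, tol=1e-6, max_iter=100):
    """Run FCM from caller-supplied centers (assumed to come from text content)."""
    centers = [list(c) for c in init_centers]
    u = memberships_from_centers(points, centers, m)
    for _ in range(max_iter):
        new_centers = update_centers(points, u, m)
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        u = memberships_from_centers(points, centers, m)
        if shift < tol:
            break
    return centers, u


# Hypothetical documents projected onto two orthogonal concept axes.
docs = [[0.9, 0.1], [0.8, 0.2], [0.85, 0.05], [0.1, 0.9], [0.2, 0.8], [0.05, 0.95]]
# Hypothetical content-based seeds: one unit vector per concept axis.
seeds = [[1.0, 0.0], [0.0, 1.0]]
centers, u = fuzzy_c_means(docs, seeds)
```

Because the seeds already point at the two topic axes, each cluster index keeps a fixed meaning across runs, which a random start does not guarantee.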
Technical advances in information systems have made massive numbers of documents available in electronic stores such as e-mail, the internet, and web pages. Arranging and browsing for a required document therefore becomes a complex task. This paper proposes an approach to incremental clustering using the Bat-Grey Wolf Optimizer (BAGWO). The input documents are first passed through a pre-processing module to obtain useful keywords, and feature extraction is then performed based on WordNet features. After feature extraction, feature selection is carried out using an entropy function. Clustering is then performed with the proposed BAGWO algorithm, which integrates the Bat Algorithm (BA) and the Grey Wolf Optimizer (GWO) to generate the clusters of text documents. When a new document arrives, the same pre-processing and feature extraction steps are applied. The features of the new document are then mapped to the clusters obtained by the BAGWO approach. The mapping uses the kernel-based deep point distance, and once the mapping is complete, the cluster representatives are updated using a fuzzy-based representative update. The developed BAGWO outperformed existing techniques in terms of clustering accuracy, Jaccard coefficient, and Rand coefficient, with maximal values of 0.948, 0.968, and 0.969, respectively.
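The abstract does not state how the Jaccard and Rand coefficients were computed. A minimal sketch, assuming the common pair-counting definitions over all document pairs, is shown below; the function names and the toy labels are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def pair_counts(true_labels, pred_labels):
    """Count document pairs by agreement between reference and predicted clusters."""
    ss = sd = ds = dd = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_true = true_labels[i] == true_labels[j]
        same_pred = pred_labels[i] == pred_labels[j]
        if same_true and same_pred:
            ss += 1  # together in both partitions
        elif same_true:
            sd += 1  # together in the reference only
        elif same_pred:
            ds += 1  # together in the prediction only
        else:
            dd += 1  # apart in both partitions
    return ss, sd, ds, dd


def rand_coefficient(true_labels, pred_labels):
    """Fraction of pairs on which the two partitions agree."""
    ss, sd, ds, dd = pair_counts(true_labels, pred_labels)
    total = ss + sd + ds + dd
    # Returning 1.0 when there are no pairs is a convention assumed here.
    return (ss + dd) / total if total else 1.0


def jaccard_coefficient(true_labels, pred_labels):
    """Pairs together in both partitions over pairs together in at least one."""
    ss, sd, ds, _ = pair_counts(true_labels, pred_labels)
    denom = ss + sd + ds
    # Returning 1.0 when no pair is ever grouped is a convention assumed here.
    return ss / denom if denom else 1.0


# Hypothetical labels for six documents: one document is placed in the wrong cluster.
reference = [0, 0, 0, 1, 1, 1]
predicted = [0, 0, 1, 1, 1, 1]
rand_value = rand_coefficient(reference, predicted)
jaccard_value = jaccard_coefficient(reference, predicted)
```

Both measures equal 1 only when the predicted clusters match the reference partition exactly, so values such as those reported above indicate close but not perfect agreement.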
Funding: The National Natural Science Foundation of China (No. 60672056); Open Fund of MOE-MS Key Laboratory of Multimedia Computing and Communication (No. 06120809).