Abstract: Text classification is an essential task in natural language processing. Preprocessing, which determines how text features are represented, is one of the key steps in a text classification architecture. This paper proposes a novel, efficient and effective preprocessing algorithm with three methods for text classification, combined with the Orthogonal Matching Pursuit algorithm to perform the classification. The main idea of the preprocessing strategy is to combine stopword removal and/or regular-expression filtering with tokenization and lowercase conversion, which effectively reduces the feature dimension and improves the quality of the text feature matrix. Simulation tests on the 20 newsgroups dataset show that, compared with the existing state-of-the-art method, the new method reduces the number of features by 19.85%, 34.35%, 26.25% and 38.67%, improves accuracy by 7.36%, 8.8%, 5.71% and 7.73%, and increases classification speed by 17.38%, 25.64%, 23.76% and 33.38% on the four data sets, respectively.
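The preprocessing steps named in the abstract above (lowercase conversion, tokenization, regular-expression filtering, stopword removal) can be sketched as follows. This is a minimal illustration, not the paper's algorithm: the stopword list, the filtering pattern, and the function names `preprocess` and `vocabulary_size` are all assumptions made for the example.

```python
import re

# Small illustrative stopword list (an assumption; not the list used in the paper).
STOPWORDS = {"a", "an", "and", "the", "is", "of", "to", "in", "on", "for"}

# Assumed filtering rule: keep only alphabetic tokens of length >= 2.
TOKEN_PATTERN = re.compile(r"[a-z]{2,}")


def preprocess(text, remove_stopwords=True, regex_filter=True):
    """Lowercase, tokenize, and optionally filter a document."""
    lowered = text.lower()
    if regex_filter:
        # Regex filtering doubles as the tokenizer here.
        tokens = TOKEN_PATTERN.findall(lowered)
    else:
        # Plain whitespace tokenization keeps punctuation and digits.
        tokens = lowered.split()
    if remove_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    return tokens


def vocabulary_size(docs, **kwargs):
    """Number of distinct features (tokens) across a corpus."""
    return len({tok for doc in docs for tok in preprocess(doc, **kwargs)})
```

Comparing `vocabulary_size(docs)` with `vocabulary_size(docs, remove_stopwords=False, regex_filter=False)` on a corpus gives a rough feel for how much each step shrinks the feature dimension.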
Funding: This work is supported in part by the National Science Foundation of China (Nos. 61672392, 61373038) and in part by the National Key Research and Development Program of China (No. 2016YFC1202204).
Abstract: As software grows in scale, updating and maintaining it has become increasingly important. However, frequent code updates make it more likely that new defects are introduced, so predicting defects in software changes quickly and accurately has become an important problem for developers. Current defect prediction methods often fail to capture the feature information of defects comprehensively, and their detection performance is unsatisfactory. We therefore propose a novel defect prediction model named ITNB (Improved Transfer Naive Bayes), based on an improved transfer Naive Bayes algorithm, which considers two aspects: (1) because edge data in the test set may affect the similarity calculation and the final prediction, we remove the edge data of the test set when calculating the similarity between the training set and the test set; (2) because each feature dimension affects defect prediction differently, we construct a formula for training-data weights based on feature-dimension weight and data gravity, then calculate the prior and conditional probabilities of the training data from these weights to build a weighted Bayesian classifier for software defect prediction. To evaluate the ITNB model, we use six datasets from large open-source projects: Bugzilla, Columba, Mozilla, JDT, Platform and PostgreSQL. We compare the ITNB model with the Transfer Naive Bayes (TNB) model. The experimental results show that ITNB achieves better results than TNB in terms of accuracy, precision and probability of detection (pd) for both within-project and cross-project defect prediction.
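The abstract above says the prior and conditional probabilities are computed from per-instance training weights. A generic weighted Naive Bayes estimate with Laplace smoothing can be sketched as follows. The smoothing form, the discretized features, and the function names are assumptions for illustration; the ITNB weight formula itself is not given in the abstract, so the weights are taken as an input.

```python
def weighted_prior(labels, weights, n_classes=2):
    """Weighted class prior with Laplace smoothing (assumed form)."""
    total = sum(weights)
    priors = {}
    for c in range(n_classes):
        class_weight = sum(w for y, w in zip(labels, weights) if y == c)
        priors[c] = (class_weight + 1.0) / (total + n_classes)
    return priors


def weighted_conditional(feature, labels, weights, value, cls, n_values):
    """Weighted P(feature == value | class == cls) for a discretized feature."""
    class_weight = sum(w for y, w in zip(labels, weights) if y == cls)
    match_weight = sum(
        w for x, y, w in zip(feature, labels, weights) if y == cls and x == value
    )
    return (match_weight + 1.0) / (class_weight + n_values)
```

Training instances with larger weights pull both estimates toward their own class and feature values, which is how instance weighting shifts the classifier toward data resembling the test set.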
Abstract: The precision of the kernel independent component analysis (KICA) algorithm depends on the type and parameter values of the kernel function. Studying how to choose KICA's kernel parameters is therefore of great significance for improving its feature dimension reduction results. In this paper, a fitness function is first established using the idea of the Fisher discriminant function. The global optimal solution of the fitness function is then searched for with the particle swarm optimization (PSO) algorithm, and a multi-state information dimension reduction algorithm based on PSO-KICA is established. Finally, the algorithm is shown to improve the precision of feature dimension reduction.
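The two ingredients named in the abstract above, a Fisher-style fitness and a PSO search over a kernel parameter, can be sketched as follows. This is a minimal sketch under stated assumptions: `fisher_score` is the textbook two-class criterion for a 1-D feature, `pso_maximize` is a basic single-parameter PSO with conventional coefficients, and `toy_fitness` is a hypothetical stand-in for the real fitness, which would require running KICA for each candidate parameter.

```python
import random


def fisher_score(class_a, class_b):
    """Two-class Fisher criterion for a 1-D feature: squared mean gap over summed variances."""
    mean_a = sum(class_a) / len(class_a)
    mean_b = sum(class_b) / len(class_b)
    var_a = sum((x - mean_a) ** 2 for x in class_a) / len(class_a)
    var_b = sum((x - mean_b) ** 2 for x in class_b) / len(class_b)
    return (mean_a - mean_b) ** 2 / (var_a + var_b + 1e-12)


def pso_maximize(fitness, low, high, n_particles=20, n_iters=60, seed=0):
    """Basic PSO over one bounded parameter; returns (best position, best fitness)."""
    rng = random.Random(seed)
    pos = [rng.uniform(low, high) for _ in range(n_particles)]
    vel = [0.0] * n_particles
    best_pos = pos[:]                          # personal best positions
    best_val = [fitness(p) for p in pos]       # personal best fitness values
    g = max(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g], best_val[g]    # global best
    w, c1, c2 = 0.7, 1.5, 1.5                  # inertia and acceleration (assumed values)
    for _ in range(n_iters):
        for i in range(n_particles):
            r1, r2 = rng.random(), rng.random()
            vel[i] = (w * vel[i]
                      + c1 * r1 * (best_pos[i] - pos[i])
                      + c2 * r2 * (g_pos - pos[i]))
            pos[i] = min(high, max(low, pos[i] + vel[i]))  # clamp to bounds
            val = fitness(pos[i])
            if val > best_val[i]:
                best_pos[i], best_val[i] = pos[i], val
                if val > g_val:
                    g_pos, g_val = pos[i], val
    return g_pos, g_val


def toy_fitness(sigma):
    """Hypothetical stand-in fitness that peaks at sigma = 2.0."""
    return -(sigma - 2.0) ** 2
```

In a real pipeline, `toy_fitness` would be replaced by a function that runs the dimension reduction with the candidate kernel parameter and returns a Fisher-style score of the reduced features.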
Abstract: Analysis of cellular behavior is significant for studying the cell cycle and detecting anti-cancer drugs. Isolating individual cells in confocal microscopic images of non-stained live cell cultures is a very difficult image processing task because these images lack adequate textural variation. Manual cell segmentation is labor-intensive and time-consuming. This paper describes an automated cell segmentation method for localizing the cells of a Chinese hamster ovary cell culture. Several kinds of high-dimensional feature descriptors, the K-means clustering method and a Chan-Vese model-based level set are used to extract the cellular regions. The extracted regions are used to classify phases of the cell cycle. The segmentation results were assessed experimentally, and the proposed method proved effective for cell isolation. For the evaluation experiments, we constructed, under the guidance of a biologist, a database of Chinese hamster ovary cell microscopic images covering various photographing environments.
Funding: Supported by the National Natural Foundation of China (Grant No. 61772561); the Key Research & Development Plan of Hunan Province (Grant No. 2018NK2012); the Graduate Education and Teaching Reform Project of Central South University of Forestry and Technology (Grant No. 2018JG005); and the Teaching Reform Project of Central South University of Forestry and Technology (Grant No. 20180682).
Abstract: In large-scale image retrieval, deep features extracted by a Convolutional Neural Network (CNN) can express more image information than features extracted by traditional manual methods. However, the deep features obtained by a Deep Convolutional Neural Network (DCNN) are high-dimensional and redundant, which leads to low retrieval efficiency. We propose a novel image retrieval method that combines deep feature selection using an improved DCNN with a hash transform based on high-dimensional feature reduction, yielding low-dimensional deep features and efficient image retrieval. First, the improved network builds on an existing deep model, forming a deeper and broader network by adding multiple groups of different branches; it is therefore named DFS-Net (Deep Feature Selection Network). The network's adaptively learned deep features can effectively alleviate over-fitting and improve the feature expression of image content. Second, the information gain rate method is used to filter the extracted deep features, reducing the feature dimension while keeping information loss small. The last step of the method, the hash transform, sparsifies and binarizes this representation to reduce computation and storage pressure while maintaining retrieval accuracy. Finally, the scheme is built on the well-known ResNet50, InceptionV3 and MobileNetV2 models and evaluated in depth on the CIFAR10 and Caltech256 datasets. The experimental results show that the method can train deep features with stronger recognition ability on limited training samples and effectively improves the accuracy and efficiency of image retrieval.
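The final stage described in the abstract above, binarizing a feature vector and retrieving by compact codes, can be illustrated with a simple sketch. The abstract does not specify the hash transform, so plain thresholding is swapped in here as an assumption, and all function names are hypothetical.

```python
def binarize(features, threshold=0.0):
    """Map each feature to 1 if it exceeds the threshold, else 0 (assumed rule)."""
    return [1 if v > threshold else 0 for v in features]


def pack_bits(bits):
    """Pack a list of 0/1 values into a single integer code, first bit most significant."""
    code = 0
    for b in bits:
        code = (code << 1) | b
    return code


def hamming(a, b):
    """Hamming distance between two integer codes."""
    return bin(a ^ b).count("1")


def rank_by_hamming(query_code, database_codes):
    """Return database indices sorted by Hamming distance to the query."""
    return sorted(
        range(len(database_codes)),
        key=lambda i: hamming(query_code, database_codes[i]),
    )
```

Storing one integer per image and comparing with XOR plus a bit count is what makes binary codes cheaper in both storage and computation than comparing floating-point feature vectors.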
Abstract: With the development of large-scale text processing, the dimension of the text feature space has grown larger and larger, which makes natural language processing considerably more difficult. How to reduce the dimension has become a practical problem in the field. Here we present two clustering methods, concept association and concept abstraction, to achieve this goal. The first refers to keyword clustering based on the co-occurrence of