6 articles found
A Novel Efficient and Effective Preprocessing Algorithm for Text Classification
1
Authors: Lijie Zhu, Difan Luo. Journal of Computer and Communications, 2023, Issue 3, pp. 1-14 (14 pages)
Text classification is an essential task in natural language processing. Preprocessing, which determines the representation of text features, is one of the key steps of a text classification architecture. This paper proposes a novel, efficient and effective preprocessing algorithm with three methods for text classification, combined with the Orthogonal Matching Pursuit algorithm to perform the classification. The main idea of the preprocessing strategy is to combine stopword removal and/or regular filtering with tokenization and lowercase conversion, which can effectively reduce the feature dimension and improve the quality of the text feature matrix. Simulation tests on the 20 newsgroups dataset show that, compared with the existing state-of-the-art method, the new method reduces the number of features by 19.85%, 34.35%, 26.25% and 38.67%, improves accuracy by 7.36%, 8.8%, 5.71% and 7.73%, and increases the speed of text classification by 17.38%, 25.64%, 23.76% and 33.38% on the four datasets, respectively.
Keywords: Text Classification; Preprocessing; Feature Dimension; Orthogonal Matching Pursuit
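The preprocessing steps named in this abstract (lowercase conversion, tokenization, regular filtering, stopword removal) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the stopword list, the letter-only filter pattern, and the function names `preprocess` and `vocabulary_size` are all assumptions made for the example.

```python
import re

# Hypothetical, deliberately tiny stopword list; the paper's actual list is not given.
STOPWORDS = {"a", "an", "and", "the", "is", "of", "to", "in"}

# Assumed "regular filter": keep only runs of ASCII letters, dropping digits and punctuation.
TOKEN_PATTERN = re.compile(r"[a-z]+")


def preprocess(text, remove_stopwords=True):
    """Lowercase the text, tokenize it with the regex filter, optionally drop stopwords."""
    lowered = text.lower()
    tokens = TOKEN_PATTERN.findall(lowered)
    if remove_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    return tokens


def vocabulary_size(docs, remove_stopwords=True):
    """Count distinct tokens across documents, i.e. the feature dimension of a bag-of-words model."""
    return len({tok for doc in docs for tok in preprocess(doc, remove_stopwords)})
```

Calling `vocabulary_size` on the same documents with and without stopword removal shows how each filtering step shrinks the feature dimension before any classifier is trained.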
Within-Project and Cross-Project Software Defect Prediction Based on Improved Transfer Naive Bayes Algorithm (Cited by: 3)
2
Authors: Kun Zhu, Nana Zhang, Shi Ying, Xu Wang. Computers, Materials & Continua (SCIE, EI), 2020, Issue 5, pp. 891-910 (20 pages)
With the continuous expansion of software scale, software update and maintenance have become more and more important. However, frequent code updates make the software more likely to introduce new defects, so how to predict defects quickly and accurately on software changes has become an important problem for software developers. Current defect prediction methods often cannot reflect the feature information of a defect comprehensively, and their detection performance is not ideal. Therefore, we propose a novel defect prediction model named ITNB (Improved Transfer Naive Bayes), based on an improved transfer Naive Bayesian algorithm, which mainly considers the following two aspects: (1) considering that the edge data of the test set may affect the similarity calculation and the final prediction result, we remove the edge data of the test set when calculating the data similarity between the training set and the test set; (2) considering that each feature dimension has a different effect on defect prediction, we construct the calculation formula of training data weight based on feature dimension weight and data gravity, and then calculate the prior probability and the conditional probability of the training data from the weight information, so as to construct a weighted Bayesian classifier for software defect prediction. To evaluate the performance of the ITNB model, we use six datasets from large open source projects, namely Bugzilla, Columba, Mozilla, JDT, Platform and PostgreSQL, and compare the ITNB model with the transfer Naive Bayesian (TNB) model. The experimental results show that the ITNB model achieves better results than the TNB model in terms of accuracy, precision and pd for within-project and cross-project defect prediction.
Keywords: Cross-project defect prediction; transfer Naive Bayesian algorithm; edge data; similarity calculation; feature dimension weight
Multi-state Information Dimension Reduction Based on Particle Swarm Optimization-Kernel Independent Component Analysis
3
Authors: 邓士杰, 苏续军, 唐力伟, 张英波. Journal of Donghua University (English Edition) (EI, CAS), 2017, Issue 6, pp. 791-795 (5 pages)
The precision of the kernel independent component analysis (KICA) algorithm depends on the type and parameter values of the kernel function. Therefore, it is of great significance to study how to choose KICA's kernel parameters in order to improve its feature dimension reduction result. In this paper, a fitness function is first established using the idea of the Fisher discrimination function. Then the global optimal solution of the fitness function is searched by the particle swarm optimization (PSO) algorithm, and a multi-state information dimension reduction algorithm based on PSO-KICA is established. Finally, the validity of this algorithm in enhancing the precision of feature dimension reduction is verified.
Keywords: kernel independent component analysis (KICA); particle swarm optimization (PSO); feature dimension reduction; fitness function
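The PSO search over a kernel parameter described in this abstract can be illustrated with a generic one-dimensional particle swarm optimizer. This is only a sketch under stated assumptions: the abstract does not give the Fisher-based fitness function, so any fitness passed in here is a stand-in, and the function name `pso_maximize` and the inertia and acceleration values are hypothetical choices, not the paper's settings.

```python
import random


def pso_maximize(fitness, lower, upper, n_particles=20, n_iters=100,
                 inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Search [lower, upper] for the value that maximizes fitness(x) using basic PSO."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    pos = [rng.uniform(lower, upper) for _ in range(n_particles)]
    vel = [0.0] * n_particles
    best_pos = pos[:]                       # each particle's personal best position
    best_fit = [fitness(p) for p in pos]    # and the fitness achieved there
    g = max(range(n_particles), key=lambda i: best_fit[i])
    g_pos, g_fit = best_pos[g], best_fit[g]  # global best found so far

    for _ in range(n_iters):
        for i in range(n_particles):
            r1, r2 = rng.random(), rng.random()
            # Velocity update: inertia + pull toward personal best + pull toward global best.
            vel[i] = (inertia * vel[i]
                      + c1 * r1 * (best_pos[i] - pos[i])
                      + c2 * r2 * (g_pos - pos[i]))
            # Move the particle and clamp it to the search interval.
            pos[i] = min(max(pos[i] + vel[i], lower), upper)
            f = fitness(pos[i])
            if f > best_fit[i]:
                best_pos[i], best_fit[i] = pos[i], f
                if f > g_fit:
                    g_pos, g_fit = pos[i], f
    return g_pos, g_fit
```

In a PSO-KICA setting, `fitness` would map a candidate kernel parameter to a class-separability score computed on the KICA-reduced features; a toy function such as `lambda x: -(x - 2.0) ** 2` is enough to check that the optimizer converges.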
Automated Dynamic Cellular Analysis in Time-Lapse Microscopy
4
Authors: Shuntaro Aotake, Chamidu Atupelage, Zicong Zhang, Kota Aoki, Hiroshi Nagahashi, Daisuke Kiga. Journal of Biosciences and Medicines, 2016, Issue 3, pp. 44-50 (7 pages)
Analysis of cellular behavior is significant for studying the cell cycle and detecting anti-cancer drugs. Isolating individual cells in confocal microscopic images of non-stained live cell cultures is a very difficult image processing task, because these images do not have adequate textural variations. Manual cell segmentation requires massive labor and is a time-consuming process. This paper describes an automated cell segmentation method for localizing the cells of a Chinese hamster ovary cell culture. Several kinds of high-dimensional feature descriptors, the K-means clustering method and a Chan-Vese model-based level set are used to extract the cellular regions. The extracted regions are used to classify phases in the cell cycle. The segmentation results were experimentally assessed, and the proposed method proved to be significant for cell isolation. For the evaluation experiments, we constructed, under the guidance of a biologist, a database of microscopic images of Chinese hamster ovary cells that covers various photographing environments.
Keywords: High Dimension Feature Analysis; Microscopic Cell Image; Cell Division Cycle Identification; Active Contour Model; K-Means Clustering
A Novel Image Retrieval Method with Improved DCNN and Hash
5
Authors: Yan Zhou, Lili Pan, Rongyu Chen, Weizhi Shao. Journal of Information Hiding and Privacy Protection, 2020, Issue 2, pp. 77-86 (10 pages)
In large-scale image retrieval, deep features extracted by a Convolutional Neural Network (CNN) can effectively express more image information than those extracted by traditional manual methods. However, the deep features obtained by a Deep Convolutional Neural Network (DCNN) are too high-dimensional and redundant, which leads to low retrieval efficiency. We propose a novel image retrieval method that combines deep feature selection using an improved DCNN with a hash transform based on high-dimensional feature reduction, in order to obtain low-dimensional deep features and achieve efficient image retrieval. Firstly, the improved network builds on an existing deep model to form a deeper and broader network by adding multiple groups of different branches; it is therefore named DFS-Net (Deep Feature Selection Network). The adaptively learned deep features of the network can effectively alleviate the influence of over-fitting and improve the feature expression of image content. Secondly, the information gain rate method is used to filter the extracted deep features, reducing the feature dimension while keeping the information loss small. In the last step of the method, the hash transform sparsifies and binarizes this representation to reduce the computation and storage pressure while maintaining retrieval accuracy. Finally, the scheme is built on the well-known ResNet50, InceptionV3, and MobileNetV2 models, and is studied and evaluated in depth on the CIFAR10 and Caltech256 datasets. The experimental results show that the novel method can train deep features with stronger recognition ability on limited training samples, and effectively improves the accuracy and efficiency of image retrieval.
Keywords: Deep feature; feature dimensionality reduction; feature selection
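The final step in this abstract, binarizing a feature vector and retrieving by compact codes, can be illustrated with a simple threshold hash and Hamming-distance ranking. This is a generic sketch, not the paper's hash transform: thresholding each vector at its own mean is an assumption made for the example, and the names `binarize`, `hamming` and `retrieve` are hypothetical.

```python
def binarize(features):
    """Turn a real-valued feature vector into a binary code by thresholding at its mean."""
    mean = sum(features) / len(features)
    return [1 if x > mean else 0 for x in features]


def hamming(code_a, code_b):
    """Number of positions at which two equal-length binary codes differ."""
    return sum(a != b for a, b in zip(code_a, code_b))


def retrieve(query, database, top_k=1):
    """Return indices of the top_k database vectors whose codes are closest to the query's code."""
    query_code = binarize(query)
    codes = [binarize(vec) for vec in database]
    ranked = sorted(range(len(database)), key=lambda i: hamming(query_code, codes[i]))
    return ranked[:top_k]
```

Comparing binary codes needs only bit-level operations and one bit of storage per dimension, which is the computation and storage saving the abstract refers to.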
A Comparative Study on Two Techniques of Reducing the Dimension of Text Feature Space
6
Authors: Yin Zhonghang, Wang Yongcheng, Cai Wei & Diao Qian (School of Electronic & Information Technology, Shanghai Jiaotong University, Shanghai 200030, P.R. China). Journal of Systems Engineering and Electronics (SCIE, EI, CSCD), 2002, Issue 1, pp. 87-92 (6 pages)
With the development of large-scale text processing, the dimension of the text feature space has become larger and larger, which has added many difficulties to natural language processing. How to reduce the dimension has become a practical problem in the field. Here we present two clustering methods, i.e. concept association and concept abstract, to achieve this goal. The first refers to keyword clustering based on the co-occurrence of keywords in the same text, and the second refers to that in the same category. Then we compare the difference between them. Our experimental results show that they are efficient in reducing the dimension of the text feature space.
Keywords: Text data mining