期刊文献+
共找到2,939篇文章
< 1 2 147 >
每页显示 20 50 100
Relational Turkish Text Classification Using Distant Supervised Entities and Relations
1
作者 Halil Ibrahim Okur Kadir Tohma Ahmet Sertbas 《Computers, Materials & Continua》 SCIE EI 2024年第5期2209-2228,共20页
Text classification,by automatically categorizing texts,is one of the foundational elements of natural language processing applications.This study investigates how text classification performance can be improved throu... Text classification,by automatically categorizing texts,is one of the foundational elements of natural language processing applications.This study investigates how text classification performance can be improved through the integration of entity-relation information obtained from the Wikidata(Wikipedia database)database and BERTbased pre-trained Named Entity Recognition(NER)models.Focusing on a significant challenge in the field of natural language processing(NLP),the research evaluates the potential of using entity and relational information to extract deeper meaning from texts.The adopted methodology encompasses a comprehensive approach that includes text preprocessing,entity detection,and the integration of relational information.Experiments conducted on text datasets in both Turkish and English assess the performance of various classification algorithms,such as Support Vector Machine,Logistic Regression,Deep Neural Network,and Convolutional Neural Network.The results indicate that the integration of entity-relation information can significantly enhance algorithmperformance in text classification tasks and offer new perspectives for information extraction and semantic analysis in NLP applications.Contributions of this work include the utilization of distant supervised entity-relation information in Turkish text classification,the development of a Turkish relational text classification approach,and the creation of a relational database.By demonstrating potential performance improvements through the integration of distant supervised entity-relation information into Turkish text classification,this research aims to support the effectiveness of text-based artificial intelligence(AI)tools.Additionally,it makes significant contributions to the development ofmultilingual text classification systems by adding deeper meaning to text content,thereby providing a valuable addition to current NLP studies and setting an important reference point for future research. 展开更多
关键词 text classification relation extraction NER distant supervision deep learning machine learning
下载PDF
Adapter Based on Pre-Trained Language Models for Classification of Medical Text
2
作者 Quan Li 《Journal of Electronic Research and Application》 2024年第3期129-134,共6页
We present an approach to classify medical text at a sentence level automatically.Given the inherent complexity of medical text classification,we employ adapters based on pre-trained language models to extract informa... We present an approach to classify medical text at a sentence level automatically.Given the inherent complexity of medical text classification,we employ adapters based on pre-trained language models to extract information from medical text,facilitating more accurate classification while minimizing the number of trainable parameters.Extensive experiments conducted on various datasets demonstrate the effectiveness of our approach. 展开更多
关键词 classification of medical text ADAPTER Pre-trained language model
下载PDF
BSTFNet:An Encrypted Malicious Traffic Classification Method Integrating Global Semantic and Spatiotemporal Features
3
作者 Hong Huang Xingxing Zhang +2 位作者 Ye Lu Ze Li Shaohua Zhou 《Computers, Materials & Continua》 SCIE EI 2024年第3期3929-3951,共23页
While encryption technology safeguards the security of network communications,malicious traffic also uses encryption protocols to obscure its malicious behavior.To address the issues of traditional machine learning me... While encryption technology safeguards the security of network communications,malicious traffic also uses encryption protocols to obscure its malicious behavior.To address the issues of traditional machine learning methods relying on expert experience and the insufficient representation capabilities of existing deep learning methods for encrypted malicious traffic,we propose an encrypted malicious traffic classification method that integrates global semantic features with local spatiotemporal features,called BERT-based Spatio-Temporal Features Network(BSTFNet).At the packet-level granularity,the model captures the global semantic features of packets through the attention mechanism of the Bidirectional Encoder Representations from Transformers(BERT)model.At the byte-level granularity,we initially employ the Bidirectional Gated Recurrent Unit(BiGRU)model to extract temporal features from bytes,followed by the utilization of the Text Convolutional Neural Network(TextCNN)model with multi-sized convolution kernels to extract local multi-receptive field spatial features.The fusion of features from both granularities serves as the ultimate multidimensional representation of malicious traffic.Our approach achieves accuracy and F1-score of 99.39%and 99.40%,respectively,on the publicly available USTC-TFC2016 dataset,and effectively reduces sample confusion within the Neris and Virut categories.The experimental results demonstrate that our method has outstanding representation and classification capabilities for encrypted malicious traffic. 展开更多
关键词 Encrypted malicious traffic classification bidirectional encoder representations from transformers text convolutional neural network bidirectional gated recurrent unit
下载PDF
Novel Machine Learning–Based Approach for Arabic Text Classification Using Stylistic and Semantic Features 被引量:1
4
作者 Fethi Fkih Mohammed Alsuhaibani +1 位作者 Delel Rhouma Ali Mustafa Qamar 《Computers, Materials & Continua》 SCIE EI 2023年第6期5871-5886,共16页
Text classification is an essential task for many applications related to the Natural Language Processing domain.It can be applied in many fields,such as Information Retrieval,Knowledge Extraction,and Knowledge modeli... Text classification is an essential task for many applications related to the Natural Language Processing domain.It can be applied in many fields,such as Information Retrieval,Knowledge Extraction,and Knowledge modeling.Even though the importance of this task,Arabic Text Classification tools still suffer from many problems and remain incapable of responding to the increasing volume of Arabic content that circulates on the web or resides in large databases.This paper introduces a novel machine learning-based approach that exclusively uses hybrid(stylistic and semantic)features.First,we clean the Arabic documents and translate them to English using translation tools.Consequently,the semantic features are automatically extracted from the translated documents using an existing database of English topics.Besides,the model automatically extracts from the textual content a set of stylistic features such as word and character frequencies and punctuation.Therefore,we obtain 3 types of features:semantic,stylistic and hybrid.Using each time,a different type of feature,we performed an in-depth comparison study of nine well-known Machine Learning models to evaluate our approach and used a standard Arabic corpus.The obtained results show that Neural Network outperforms other models and provides good performances using hybrid features(F1-score=0.88%). 展开更多
关键词 Arabic text classification machine learning stylistic features semantic features TOPICS
下载PDF
Convolutional Deep Belief Network Based Short Text Classification on Arabic Corpus
5
作者 Abdelwahed Motwakel Badriyya B.Al-onazi +5 位作者 Jaber S.Alzahrani Radwa Marzouk Amira Sayed A.Aziz Abu Sarwar Zamani Ishfaq Yaseen Amgad Atta Abdelmageed1 《Computer Systems Science & Engineering》 SCIE EI 2023年第6期3097-3113,共17页
With a population of 440 million,Arabic language users form the rapidly growing language group on the web in terms of the number of Internet users.11 million monthly Twitter users were active and posted nearly 27.4 mi... With a population of 440 million,Arabic language users form the rapidly growing language group on the web in terms of the number of Internet users.11 million monthly Twitter users were active and posted nearly 27.4 million tweets every day.In order to develop a classification system for the Arabic lan-guage there comes a need of understanding the syntactic framework of the words thereby manipulating and representing the words for making their classification effective.In this view,this article introduces a Dolphin Swarm Optimization with Convolutional Deep Belief Network for Short Text Classification(DSOCDBN-STC)model on Arabic Corpus.The presented DSOCDBN-STC model majorly aims to classify Arabic short text in social media.The presented DSOCDBN-STC model encompasses preprocessing and word2vec word embedding at the preliminary stage.Besides,the DSOCDBN-STC model involves CDBN based classification model for Arabic short text.At last,the DSO technique can be exploited for optimal modification of the hyperparameters related to the CDBN method.To establish the enhanced performance of the DSOCDBN-STC model,a wide range of simulations have been performed.The simulation results con-firmed the supremacy of the DSOCDBN-STC model over existing models with improved accuracy of 99.26%. 展开更多
关键词 Arabic text short text classification dolphin swarm optimization deep learning
下载PDF
Gate-Attention and Dual-End Enhancement Mechanism for Multi-Label Text Classification
6
作者 Jieren Cheng Xiaolong Chen +3 位作者 Wenghang Xu Shuai Hua Zhu Tang Victor S.Sheng 《Computers, Materials & Continua》 SCIE EI 2023年第11期1779-1793,共15页
In the realm of Multi-Label Text Classification(MLTC),the dual challenges of extracting rich semantic features from text and discerning inter-label relationships have spurred innovative approaches.Many studies in sema... In the realm of Multi-Label Text Classification(MLTC),the dual challenges of extracting rich semantic features from text and discerning inter-label relationships have spurred innovative approaches.Many studies in semantic feature extraction have turned to external knowledge to augment the model’s grasp of textual content,often overlooking intrinsic textual cues such as label statistical features.In contrast,these endogenous insights naturally align with the classification task.In our paper,to complement this focus on intrinsic knowledge,we introduce a novel Gate-Attention mechanism.This mechanism adeptly integrates statistical features from the text itself into the semantic fabric,enhancing the model’s capacity to understand and represent the data.Additionally,to address the intricate task of mining label correlations,we propose a Dual-end enhancement mechanism.This mechanism effectively mitigates the challenges of information loss and erroneous transmission inherent in traditional long short term memory propagation.We conducted an extensive battery of experiments on the AAPD and RCV1-2 datasets.These experiments serve the dual purpose of confirming the efficacy of both the Gate-Attention mechanism and the Dual-end enhancement mechanism.Our final model unequivocally outperforms the baseline model,attesting to its robustness.These findings emphatically underscore the imperativeness of taking into account not just external knowledge but also the inherent intricacies of textual data when crafting potent MLTC models. 展开更多
关键词 Multi-label text classification feature extraction label distribution information sequence generation
下载PDF
Automated Arabic Text Classification Using Hyperparameter Tuned Hybrid Deep Learning Model
7
作者 Badriyya B.Al-onazi Saud S.Alotaib +4 位作者 Saeed Masoud Alshahrani Najm Alotaibi Mrim M.Alnfiai Ahmed S.Salama Manar Ahmed Hamza 《Computers, Materials & Continua》 SCIE EI 2023年第3期5447-5465,共19页
The text classification process has been extensively investigated in various languages,especially English.Text classification models are vital in several Natural Language Processing(NLP)applications.The Arabic languag... The text classification process has been extensively investigated in various languages,especially English.Text classification models are vital in several Natural Language Processing(NLP)applications.The Arabic language has a lot of significance.For instance,it is the fourth mostly-used language on the internet and the sixth official language of theUnitedNations.However,there are few studies on the text classification process in Arabic.A few text classification studies have been published earlier in the Arabic language.In general,researchers face two challenges in the Arabic text classification process:low accuracy and high dimensionality of the features.In this study,an Automated Arabic Text Classification using Hyperparameter Tuned Hybrid Deep Learning(AATC-HTHDL)model is proposed.The major goal of the proposed AATC-HTHDL method is to identify different class labels for the Arabic text.The first step in the proposed model is to pre-process the input data to transform it into a useful format.The Term Frequency-Inverse Document Frequency(TF-IDF)model is applied to extract the feature vectors.Next,the Convolutional Neural Network with Recurrent Neural Network(CRNN)model is utilized to classify the Arabic text.In the final stage,the Crow Search Algorithm(CSA)is applied to fine-tune the CRNN model’s hyperparameters,showing the work’s novelty.The proposed AATCHTHDL model was experimentally validated under different parameters and the outcomes established the supremacy of the proposed AATC-HTHDL model over other approaches. 展开更多
关键词 Hybrid deep learning natural language processing arabic language text classification parameter tuning
下载PDF
A Novel Efficient and Effective Preprocessing Algorithm for Text Classification
8
作者 Lijie Zhu Difan Luo 《Journal of Computer and Communications》 2023年第3期1-14,共14页
Text classification is an essential task of natural language processing. Preprocessing, which determines the representation of text features, is one of the key steps of text classification architecture. It proposed a ... Text classification is an essential task of natural language processing. Preprocessing, which determines the representation of text features, is one of the key steps of text classification architecture. It proposed a novel efficient and effective preprocessing algorithm with three methods for text classification combining the Orthogonal Matching Pursuit algorithm to perform the classification. The main idea of the novel preprocessing strategy is that it combined stopword removal and/or regular filtering with tokenization and lowercase conversion, which can effectively reduce the feature dimension and improve the text feature matrix quality. Simulation tests on the 20 newsgroups dataset show that compared with the existing state-of-the-art method, the new method reduces the number of features by 19.85%, 34.35%, 26.25% and 38.67%, improves accuracy by 7.36%, 8.8%, 5.71% and 7.73%, and increases the speed of text classification by 17.38%, 25.64%, 23.76% and 33.38% on the four data, respectively. 展开更多
关键词 text classification PREPROCESSING Feature Dimension Orthogonal Matching Pursuit
下载PDF
Review of Text Classification Methods on Deep Learning 被引量:9
9
作者 Hongping Wu Yuling Liu Jingwen Wang 《Computers, Materials & Continua》 SCIE EI 2020年第6期1309-1321,共13页
Text classification has always been an increasingly crucial topic in natural language processing.Traditional text classification methods based on machine learning have many disadvantages such as dimension explosion,da... Text classification has always been an increasingly crucial topic in natural language processing.Traditional text classification methods based on machine learning have many disadvantages such as dimension explosion,data sparsity,limited generalization ability and so on.Based on deep learning text classification,this paper presents an extensive study on the text classification models including Convolutional Neural Network-Based(CNN-Based),Recurrent Neural Network-Based(RNN-based),Attention Mechanisms-Based and so on.Many studies have proved that text classification methods based on deep learning outperform the traditional methods when processing large-scale and complex datasets.The main reasons are text classification methods based on deep learning can avoid cumbersome feature extraction process and have higher prediction accuracy for a large set of unstructured data.In this paper,we also summarize the shortcomings of traditional text classification methods and introduce the text classification process based on deep learning including text preprocessing,distributed representation of text,text classification model construction based on deep learning and performance evaluation. 展开更多
关键词 text classification deep learning distributed representation CNN RNN attention mechanism
下载PDF
MII:A Novel Text Classification Model Combining Deep Active Learning with BERT 被引量:6
10
作者 Anman Zhang Bohan Li +2 位作者 Wenhuan Wang Shuo Wan Weitong Chen 《Computers, Materials & Continua》 SCIE EI 2020年第6期1499-1514,共16页
Active learning has been widely utilized to reduce the labeling cost of supervised learning.By selecting specific instances to train the model,the performance of the model was improved within limited steps.However,rar... Active learning has been widely utilized to reduce the labeling cost of supervised learning.By selecting specific instances to train the model,the performance of the model was improved within limited steps.However,rare work paid attention to the effectiveness of active learning on it.In this paper,we proposed a deep active learning model with bidirectional encoder representations from transformers(BERT)for text classification.BERT takes advantage of the self-attention mechanism to integrate contextual information,which is beneficial to accelerate the convergence of training.As for the process of active learning,we design an instance selection strategy based on posterior probabilities Margin,Intra-correlation and Inter-correlation(MII).Selected instances are characterized by small margin,low intra-cohesion and high inter-cohesion.We conduct extensive experiments and analytics with our methods.The effect of learner is compared while the effect of sampling strategy and text classification is assessed from three real datasets.The results show that our method outperforms the baselines in terms of accuracy. 展开更多
关键词 Active learning instance selection deep neural network text classification
下载PDF
Dimensionality Reduction by Mutual Information for Text Classification 被引量:2
11
作者 刘丽珍 宋瀚涛 陆玉昌 《Journal of Beijing Institute of Technology》 EI CAS 2005年第1期32-36,共5页
The frame of text classification system was presented. The high dimensionality in feature space for text classification was studied. The mutual information is a widely used information theoretic measure, in a descript... The frame of text classification system was presented. The high dimensionality in feature space for text classification was studied. The mutual information is a widely used information theoretic measure, in a descriptive way, to measure the stochastic dependency of discrete random variables. The measure method was used as a criterion to reduce high dimensionality of feature vectors in text classification on Web. Feature selections or conversions were performed by using maximum mutual information including linear and non-linear feature conversions. Entropy was used and extended to find right features commendably in pattern recognition systems. Favorable foundation would be established for text classification mining. 展开更多
关键词 text classification mutual information dimensionality reduction
下载PDF
基于DAN与FastText的藏文短文本分类研究
12
作者 李果 陈晨 +1 位作者 杨进 群诺 《计算机科学》 CSCD 北大核心 2024年第S01期103-107,共5页
随着藏文信息不断融入社会生活,越来越多的藏文短文本数据存在网络平台上。针对传统分类方法在藏文短文本上分类性能低的问题,文中提出了一种基于DAN-FastText的藏文短文本分类模型。该模型使用FastText网络在较大规模的藏文语料上进行... 随着藏文信息不断融入社会生活,越来越多的藏文短文本数据存在网络平台上。针对传统分类方法在藏文短文本上分类性能低的问题,文中提出了一种基于DAN-FastText的藏文短文本分类模型。该模型使用FastText网络在较大规模的藏文语料上进行无监督训练获得预训练的藏文音节向量集,使用预训练的音节向量集将藏文短文本信息转化为音节向量,把音节向量送入DAN(Deep Averaging Networks)网络并在输出阶段融合经过FastText网络训练的句向量特征,最后通过全连接层和softmax层完成分类。在公开的TNCC(Tibetan News Classification Corpus)新闻标题数据集上所提模型的Macro-F1是64.53%,比目前最好评测结果TiBERT模型的Macro-F1得分高出2.81%,比GCN模型的Macro-F1得分高出6.14%,融合模型具有较好的藏文短文本分类效果。 展开更多
关键词 藏文短文本分类 特征融合 深度平均网络 快速文本
下载PDF
Question-Answering Pair Matching Based on Question Classification and Ensemble Sentence Embedding
13
作者 Jae-Seok Jang Hyuk-Yoon Kwon 《Computer Systems Science & Engineering》 SCIE EI 2023年第9期3471-3489,共19页
Question-answering(QA)models find answers to a given question.The necessity of automatically finding answers is increasing because it is very important and challenging from the large-scale QA data sets.In this paper,w... Question-answering(QA)models find answers to a given question.The necessity of automatically finding answers is increasing because it is very important and challenging from the large-scale QA data sets.In this paper,we deal with the QA pair matching approach in QA models,which finds the most relevant question and its recommended answer for a given question.Existing studies for the approach performed on the entire dataset or datasets within a category that the question writer manually specifies.In contrast,we aim to automatically find the category to which the question belongs by employing the text classification model and to find the answer corresponding to the question within the category.Due to the text classification model,we can effectively reduce the search space for finding the answers to a given question.Therefore,the proposed model improves the accuracy of the QA matching model and significantly reduces the model inference time.Furthermore,to improve the performance of finding similar sentences in each category,we present an ensemble embedding model for sentences,improving the performance compared to the individual embedding models.Using real-world QA data sets,we evaluate the performance of the proposed QA matching model.As a result,the accuracy of our final ensemble embedding model based on the text classification model is 81.18%,which outperforms the existing models by 9.81%∼14.16%point.Moreover,in terms of the model inference speed,our model is faster than the existing models by 2.61∼5.07 times due to the effective reduction of search spaces by the text classification model. 展开更多
关键词 Question-answering text classification model data augmentation text embedding
下载PDF
Term-Based Pooling in Convolutional Neural Networks for Text Classification 被引量:2
14
作者 Shuifei Zeng Yan Ma +1 位作者 Xiaoyan Zhang Xiaofeng Du 《China Communications》 SCIE CSCD 2020年第4期109-124,共16页
To achieve good results in convolutional neural networks(CNN) for text classification task, term-based pooling operation in CNNs is proposed. Firstly, the convolution results of several convolution kernels are combine... To achieve good results in convolutional neural networks(CNN) for text classification task, term-based pooling operation in CNNs is proposed. Firstly, the convolution results of several convolution kernels are combined by this method, and then the results after combination are made pooling operation, three sorts of CNN models(we named TBCNN, MCT-CNN and MMCT-CNN respectively) are constructed and then corresponding algorithmic thought are detailed on this basis. Secondly, relevant experiments and analyses are respectively designed to show the effects of three key parameters(convolution kernel, combination kernel number and word embedding) on three kinds of CNN models and to further demonstrate the effect of the models proposed. The experimental results show that compared with the traditional method of text classification in CNNs, term-based pooling method is addressed that not only the availability of the way is proved, but also the performance shows good superiority. 展开更多
关键词 convolutional NEURAL Networks term-based pooling text classification
下载PDF
Optimal Deep Hybrid Boltzmann Machine Based Arabic Corpus Classification Model
15
作者 Mesfer Al Duhayyim Badriyya B.Al-onazi +5 位作者 Mohamed K.Nour Ayman Yafoz Amal S.Mehanna Ishfaq Yaseen Amgad Atta Abdelmageed Gouse Pasha Mohammed 《Computer Systems Science & Engineering》 SCIE EI 2023年第9期2755-2772,共18页
Natural Language Processing(NLP)for the Arabic language has gained much significance in recent years.The most commonly-utilized NLP task is the‘Text Classification’process.Its main intention is to apply the Machine ... Natural Language Processing(NLP)for the Arabic language has gained much significance in recent years.The most commonly-utilized NLP task is the‘Text Classification’process.Its main intention is to apply the Machine Learning(ML)approaches for automatically classifying the textual files into one or more pre-defined categories.In ML approaches,the first and foremost crucial step is identifying an appropriate large dataset to test and train the method.One of the trending ML techniques,i.e.,Deep Learning(DL)technique needs huge volumes of different types of datasets for training to yield the best outcomes.The current study designs a new Dice Optimization with a Deep Hybrid Boltzmann Machinebased Arabic Corpus Classification(DODHBM-ACC)model in this background.The presented DODHBM-ACC model primarily relies upon different stages of pre-processing and the word2vec word embedding process.For Arabic text classification,the DHBM technique is utilized.This technique is a hybrid version of the Deep Boltzmann Machine(DBM)and Deep Belief Network(DBN).It has the advantage of learning the decisive intention of the classification process.To adjust the hyperparameters of the DHBM technique,the Dice Optimization Algorithm(DOA)is exploited in this study.The experimental analysis was conducted to establish the superior performance of the proposed DODHBM-ACC model.The outcomes inferred the better performance of the proposed DODHBM-ACC model over other recent approaches. 展开更多
关键词 Arabic corpus text classification machine learning deep learning dice optimization
下载PDF
Ensemble Deep Learning Framework for Situational Aspects-Based Annotation and Classification of International Student’s Tweets during COVID-19
16
作者 Shabir Hussain Muhammad Ayoub +4 位作者 Yang Yu Junaid Abdul Wahid Akmal Khan Dietmar P.F.Moller Hou Weiyan 《Computers, Materials & Continua》 SCIE EI 2023年第6期5355-5377,共23页
As the COVID-19 pandemic swept the globe,social media plat-forms became an essential source of information and communication for many.International students,particularly,turned to Twitter to express their struggles an... As the COVID-19 pandemic swept the globe,social media plat-forms became an essential source of information and communication for many.International students,particularly,turned to Twitter to express their struggles and hardships during this difficult time.To better understand the sentiments and experiences of these international students,we developed the Situational Aspect-Based Annotation and Classification(SABAC)text mining framework.This framework uses a three-layer approach,combining baseline Deep Learning(DL)models with Machine Learning(ML)models as meta-classifiers to accurately predict the sentiments and aspects expressed in tweets from our collected Student-COVID-19 dataset.Using the pro-posed aspect2class annotation algorithm,we labeled bulk unlabeled tweets according to their contained aspect terms.However,we also recognized the challenges of reducing data’s high dimensionality and sparsity to improve performance and annotation on unlabeled datasets.To address this issue,we proposed the Volatile Stopwords Filtering(VSF)technique to reduce sparsity and enhance classifier performance.The resulting Student-COVID Twitter dataset achieved a sophisticated accuracy of 93.21%when using the random forest as a meta-classifier.Through testing on three benchmark datasets,we found that the SABAC ensemble framework performed exceptionally well.Our findings showed that international students during the pandemic faced various issues,including stress,uncertainty,health concerns,financial stress,and difficulties with online classes and returning to school.By analyzing and summarizing these annotated tweets,decision-makers can better understand and address the real-time problems international students face during the ongoing pandemic. 展开更多
关键词 COVID-19 pandemic situational awareness ensemble learning aspect-based text classification deep learning models international students topic modeling
下载PDF
Feature selection algorithm for text classification based on improved mutual information 被引量:1
17
作者 丛帅 张积宾 +1 位作者 徐志明 王宇颖 《Journal of Harbin Institute of Technology(New Series)》 EI CAS 2011年第3期144-148,共5页
In order to solve the poor performance in text classification when using traditional formula of mutual information (MI),a feature selection algorithm were proposed based on improved mutual information.The improved mut... In order to solve the poor performance in text classification when using traditional formula of mutual information (MI),a feature selection algorithm were proposed based on improved mutual information.The improved mutual information algorithm,which is on the basis of traditional improved mutual information methods that enhance the MI value of negative characteristics and feature's frequency,supports the concept of concentration degree and dispersion degree.In accordance with the concept of concentration degree and dispersion degree,formulas which embody concentration degree and dispersion degree were constructed and the improved mutual information was implemented based on these.In this paper,the feature selection algorithm was applied based on improved mutual information to a text classifier based on Biomimetic Pattern Recognition and it was compared with several other feature selection methods.The experimental results showed that the improved mutual information feature selection method greatly enhances the performance compared with traditional mutual information feature selection methods and the performance is better than that of information gain.Through the introduction of the concept of concentration degree and dispersion degree,the improved mutual information feature selection method greatly improves the performance of text classification system. 展开更多
关键词 text classification feature selection improved mutual information Biomimetic Pattern Recognition
下载PDF
Supervised Learning Algorithm on Unstructured Documents for the Classification of Job Offers: Case of Cameroun
18
作者 Fritz Sosso Makembe Roger Atsa Etoundi Hippolyte Tapamo 《Journal of Computer and Communications》 2023年第2期75-88,共14页
Nowadays, in data science, supervised learning algorithms are frequently used to perform text classification. However, African textual data, in general, have been studied very little using these methods. This article ... Nowadays, in data science, supervised learning algorithms are frequently used to perform text classification. However, African textual data, in general, have been studied very little using these methods. This article notes the particularity of the data and measures the level of precision of predictions of naive Bayes algorithms, decision tree, and SVM (Support Vector Machine) on a corpus of computer jobs taken on the internet. This is due to the data imbalance problem in machine learning. However, this problem essentially focuses on the distribution of the number of documents in each class or subclass. Here, we delve deeper into the problem to the word count distribution in a set of documents. The results are compared with those obtained on a set of French IT offers. It appears that the precision of the classification varies between 88% and 90% for French offers against 67%, at most, for Cameroonian offers. The contribution of this study is twofold. Indeed, it clearly shows that, in a similar job category, job offers on the internet in Cameroon are more unstructured compared to those available in France, for example. Moreover, it makes it possible to emit a strong hypothesis according to which sets of texts having a symmetrical distribution of the number of words obtain better results with supervised learning algorithms. 展开更多
关键词 Job Offer Underemployment text classification Imbalanced Data Symmetric Word Distribution Supervised Learning
下载PDF
Short Text Classification Based on Improved ITC 被引量:1
19
作者 Liangliang Li Shouning Qu 《Journal of Computer and Communications》 2013年第4期22-27,共6页
The long text classification has got great achievements, but short text classification still needs to be perfected. In this paper, at first, we describe why we select the ITC feature selection algorithm not the conven... The long text classification has got great achievements, but short text classification still needs to be perfected. In this paper, at first, we describe why we select the ITC feature selection algorithm not the conventional TFIDF and the superiority of the ITC compared with the TFIDF, then we conclude the flaws of the conventional ITC algorithm, and then we present an improved ITC feature selection algorithm based on the characteristics of short text classification while combining the concepts of the Documents Distribution Entropy with the Position Distribution Weight. The improved ITC algorithm conforms to the actual situation of the short text classification. The experimental results show that the performance based on the new algorithm was much better than that based on the traditional TFIDF and ITC. 展开更多
关键词 ITC text classification SHORT text
下载PDF
基于改进分层注意网络和TextCNN联合建模的暴力犯罪分级算法
20
作者 张家伟 高冠东 +1 位作者 肖珂 宋胜尊 《计算机应用》 CSCD 北大核心 2024年第2期403-410,共8页
为了科学、智能地对服刑人员的暴力倾向分级,将自然语言处理(NLP)中的文本分类方法引入犯罪心理学领域,提出一种基于改进分层注意网络(HAN)与TextCNN(Text Convolutional Neural Network)两通道联合建模的犯罪语义卷积分层注意网络(CCHA... 为了科学、智能地对服刑人员的暴力倾向分级,将自然语言处理(NLP)中的文本分类方法引入犯罪心理学领域,提出一种基于改进分层注意网络(HAN)与TextCNN(Text Convolutional Neural Network)两通道联合建模的犯罪语义卷积分层注意网络(CCHA-Net),通过分别挖掘犯罪事实与服刑人员基本情况的语义信息,完成暴力犯罪气质分级。首先,采用Focal Loss同时替代两通道中的Cross-Entropy函数,优化样本数量不均衡问题。其次,在两通道输入层中,同时引入位置编码,改进对位置信息的感知能力;改进HAN通道,采用最大池化构建显著向量。最后,输出层都采用全局平均池化替代全连接方法,以避免过拟合。实验结果表明,与AC-BiLSTM(Attention-based Bidirectional Long Short-Term Memory with Convolution layer)、支持向量机(SVM)等17种相关基线模型相比,CCHA-Net各项指标均最优,微平均F1(Micro_F1)为99.57%,宏平均和微平均下的曲线下面积(AUC)分别为99.45%和99.89%,相较于次优的AC-BiLSTM提高了4.08、5.59和0.74个百分点,验证了CCHA-Net能有效胜任暴力犯罪气质分级任务。 展开更多
关键词 深度学习 文本分类 卷积神经网络 分层注意网络 暴力犯罪分级 气质类型
下载PDF
上一页 1 2 147 下一页 到第
使用帮助 返回顶部