期刊文献+
共找到82篇文章
< 1 2 5 >
每页显示 20 50 100
Relational Turkish Text Classification Using Distant Supervised Entities and Relations
1
作者 Halil Ibrahim Okur Kadir Tohma Ahmet Sertbas 《Computers, Materials & Continua》 SCIE EI 2024年第5期2209-2228,共20页
Text classification,by automatically categorizing texts,is one of the foundational elements of natural language processing applications.This study investigates how text classification performance can be improved throu... Text classification,by automatically categorizing texts,is one of the foundational elements of natural language processing applications.This study investigates how text classification performance can be improved through the integration of entity-relation information obtained from the Wikidata(Wikipedia database)database and BERTbased pre-trained Named Entity Recognition(NER)models.Focusing on a significant challenge in the field of natural language processing(NLP),the research evaluates the potential of using entity and relational information to extract deeper meaning from texts.The adopted methodology encompasses a comprehensive approach that includes text preprocessing,entity detection,and the integration of relational information.Experiments conducted on text datasets in both Turkish and English assess the performance of various classification algorithms,such as Support Vector Machine,Logistic Regression,Deep Neural Network,and Convolutional Neural Network.The results indicate that the integration of entity-relation information can significantly enhance algorithmperformance in text classification tasks and offer new perspectives for information extraction and semantic analysis in NLP applications.Contributions of this work include the utilization of distant supervised entity-relation information in Turkish text classification,the development of a Turkish relational text classification approach,and the creation of a relational database.By demonstrating potential performance improvements through the integration of distant supervised entity-relation information into Turkish text classification,this research aims to support the effectiveness of text-based artificial intelligence(AI)tools.Additionally,it makes significant contributions to the development ofmultilingual text classification systems by adding deeper meaning to text content,thereby providing a valuable addition to current NLP studies and setting an important reference point for future research. 展开更多
关键词 text classification relation extraction NER distant supervision deep learning machine learning
下载PDF
Leveraging Uncertainty for Depth-Aware Hierarchical Text Classification
2
作者 Zixuan Wu Ye Wang +2 位作者 Lifeng Shen Feng Hu Hong Yu 《Computers, Materials & Continua》 SCIE EI 2024年第9期4111-4127,共17页
Hierarchical Text Classification(HTC)aims to match text to hierarchical labels.Existing methods overlook two critical issues:first,some texts cannot be fully matched to leaf node labels and need to be classified to th... Hierarchical Text Classification(HTC)aims to match text to hierarchical labels.Existing methods overlook two critical issues:first,some texts cannot be fully matched to leaf node labels and need to be classified to the correct parent node instead of treating leaf nodes as the final classification target.Second,error propagation occurs when a misclassification at a parent node propagates down the hierarchy,ultimately leading to inaccurate predictions at the leaf nodes.To address these limitations,we propose an uncertainty-guided HTC depth-aware model called DepthMatch.Specifically,we design an early stopping strategy with uncertainty to identify incomplete matching between text and labels,classifying them into the corresponding parent node labels.This approach allows us to dynamically determine the classification depth by leveraging evidence to quantify and accumulate uncertainty.Experimental results show that the proposed DepthMatch outperforms recent strong baselines on four commonly used public datasets:WOS(Web of Science),RCV1-V2(Reuters Corpus Volume I),AAPD(Arxiv Academic Paper Dataset),and BGC.Notably,on the BGC dataset,it improvesMicro-F1 andMacro-F1 scores by at least 1.09%and 1.74%,respectively. 展开更多
关键词 Hierarchical text classification incomplete text-label matching UNCERTAINTY depth-aware early stopping strategy
下载PDF
Ensemble Filter-Wrapper Text Feature Selection Methods for Text Classification
3
作者 Oluwaseun Peter Ige Keng Hoon Gan 《Computer Modeling in Engineering & Sciences》 SCIE EI 2024年第11期1847-1865,共19页
Feature selection is a crucial technique in text classification for improving the efficiency and effectiveness of classifiers or machine learning techniques by reducing the dataset’s dimensionality.This involves elim... Feature selection is a crucial technique in text classification for improving the efficiency and effectiveness of classifiers or machine learning techniques by reducing the dataset’s dimensionality.This involves eliminating irrelevant,redundant,and noisy features to streamline the classification process.Various methods,from single feature selection techniques to ensemble filter-wrapper methods,have been used in the literature.Metaheuristic algorithms have become popular due to their ability to handle optimization complexity and the continuous influx of text documents.Feature selection is inherently multi-objective,balancing the enhancement of feature relevance,accuracy,and the reduction of redundant features.This research presents a two-fold objective for feature selection.The first objective is to identify the top-ranked features using an ensemble of three multi-univariate filter methods:Information Gain(Infogain),Chi-Square(Chi^(2)),and Analysis of Variance(ANOVA).This aims to maximize feature relevance while minimizing redundancy.The second objective involves reducing the number of selected features and increasing accuracy through a hybrid approach combining Artificial Bee Colony(ABC)and Genetic Algorithms(GA).This hybrid method operates in a wrapper framework to identify the most informative subset of text features.Support Vector Machine(SVM)was employed as the performance evaluator for the proposed model,tested on two high-dimensional multiclass datasets.The experimental results demonstrated that the ensemble filter combined with the ABC+GA hybrid approach is a promising solution for text feature selection,offering superior performance compared to other existing feature selection algorithms. 展开更多
关键词 Metaheuristic algorithms text classification multi-univariate filter feature selection ensemble filter-wrapper techniques
下载PDF
Novel Machine Learning–Based Approach for Arabic Text Classification Using Stylistic and Semantic Features 被引量:1
4
作者 Fethi Fkih Mohammed Alsuhaibani +1 位作者 Delel Rhouma Ali Mustafa Qamar 《Computers, Materials & Continua》 SCIE EI 2023年第6期5871-5886,共16页
Text classification is an essential task for many applications related to the Natural Language Processing domain.It can be applied in many fields,such as Information Retrieval,Knowledge Extraction,and Knowledge modeli... Text classification is an essential task for many applications related to the Natural Language Processing domain.It can be applied in many fields,such as Information Retrieval,Knowledge Extraction,and Knowledge modeling.Even though the importance of this task,Arabic Text Classification tools still suffer from many problems and remain incapable of responding to the increasing volume of Arabic content that circulates on the web or resides in large databases.This paper introduces a novel machine learning-based approach that exclusively uses hybrid(stylistic and semantic)features.First,we clean the Arabic documents and translate them to English using translation tools.Consequently,the semantic features are automatically extracted from the translated documents using an existing database of English topics.Besides,the model automatically extracts from the textual content a set of stylistic features such as word and character frequencies and punctuation.Therefore,we obtain 3 types of features:semantic,stylistic and hybrid.Using each time,a different type of feature,we performed an in-depth comparison study of nine well-known Machine Learning models to evaluate our approach and used a standard Arabic corpus.The obtained results show that Neural Network outperforms other models and provides good performances using hybrid features(F1-score=0.88%). 展开更多
关键词 Arabic text classification machine learning stylistic features semantic features TOPICS
下载PDF
Automated Arabic Text Classification Using Hyperparameter Tuned Hybrid Deep Learning Model
5
作者 Badriyya B.Al-onazi Saud S.Alotaib +4 位作者 Saeed Masoud Alshahrani Najm Alotaibi Mrim M.Alnfiai Ahmed S.Salama Manar Ahmed Hamza 《Computers, Materials & Continua》 SCIE EI 2023年第3期5447-5465,共19页
The text classification process has been extensively investigated in various languages,especially English.Text classification models are vital in several Natural Language Processing(NLP)applications.The Arabic languag... The text classification process has been extensively investigated in various languages,especially English.Text classification models are vital in several Natural Language Processing(NLP)applications.The Arabic language has a lot of significance.For instance,it is the fourth mostly-used language on the internet and the sixth official language of theUnitedNations.However,there are few studies on the text classification process in Arabic.A few text classification studies have been published earlier in the Arabic language.In general,researchers face two challenges in the Arabic text classification process:low accuracy and high dimensionality of the features.In this study,an Automated Arabic Text Classification using Hyperparameter Tuned Hybrid Deep Learning(AATC-HTHDL)model is proposed.The major goal of the proposed AATC-HTHDL method is to identify different class labels for the Arabic text.The first step in the proposed model is to pre-process the input data to transform it into a useful format.The Term Frequency-Inverse Document Frequency(TF-IDF)model is applied to extract the feature vectors.Next,the Convolutional Neural Network with Recurrent Neural Network(CRNN)model is utilized to classify the Arabic text.In the final stage,the Crow Search Algorithm(CSA)is applied to fine-tune the CRNN model’s hyperparameters,showing the work’s novelty.The proposed AATCHTHDL model was experimentally validated under different parameters and the outcomes established the supremacy of the proposed AATC-HTHDL model over other approaches. 展开更多
关键词 Hybrid deep learning natural language processing arabic language text classification parameter tuning
下载PDF
Gate-Attention and Dual-End Enhancement Mechanism for Multi-Label Text Classification
6
作者 Jieren Cheng Xiaolong Chen +3 位作者 Wenghang Xu Shuai Hua Zhu Tang Victor S.Sheng 《Computers, Materials & Continua》 SCIE EI 2023年第11期1779-1793,共15页
In the realm of Multi-Label Text Classification(MLTC),the dual challenges of extracting rich semantic features from text and discerning inter-label relationships have spurred innovative approaches.Many studies in sema... In the realm of Multi-Label Text Classification(MLTC),the dual challenges of extracting rich semantic features from text and discerning inter-label relationships have spurred innovative approaches.Many studies in semantic feature extraction have turned to external knowledge to augment the model’s grasp of textual content,often overlooking intrinsic textual cues such as label statistical features.In contrast,these endogenous insights naturally align with the classification task.In our paper,to complement this focus on intrinsic knowledge,we introduce a novel Gate-Attention mechanism.This mechanism adeptly integrates statistical features from the text itself into the semantic fabric,enhancing the model’s capacity to understand and represent the data.Additionally,to address the intricate task of mining label correlations,we propose a Dual-end enhancement mechanism.This mechanism effectively mitigates the challenges of information loss and erroneous transmission inherent in traditional long short term memory propagation.We conducted an extensive battery of experiments on the AAPD and RCV1-2 datasets.These experiments serve the dual purpose of confirming the efficacy of both the Gate-Attention mechanism and the Dual-end enhancement mechanism.Our final model unequivocally outperforms the baseline model,attesting to its robustness.These findings emphatically underscore the imperativeness of taking into account not just external knowledge but also the inherent intricacies of textual data when crafting potent MLTC models. 展开更多
关键词 Multi-label text classification feature extraction label distribution information sequence generation
下载PDF
Convolutional Deep Belief Network Based Short Text Classification on Arabic Corpus
7
作者 Abdelwahed Motwakel Badriyya B.Al-onazi +5 位作者 Jaber S.Alzahrani Radwa Marzouk Amira Sayed A.Aziz Abu Sarwar Zamani Ishfaq Yaseen Amgad Atta Abdelmageed1 《Computer Systems Science & Engineering》 SCIE EI 2023年第6期3097-3113,共17页
With a population of 440 million,Arabic language users form the rapidly growing language group on the web in terms of the number of Internet users.11 million monthly Twitter users were active and posted nearly 27.4 mi... With a population of 440 million,Arabic language users form the rapidly growing language group on the web in terms of the number of Internet users.11 million monthly Twitter users were active and posted nearly 27.4 million tweets every day.In order to develop a classification system for the Arabic lan-guage there comes a need of understanding the syntactic framework of the words thereby manipulating and representing the words for making their classification effective.In this view,this article introduces a Dolphin Swarm Optimization with Convolutional Deep Belief Network for Short Text Classification(DSOCDBN-STC)model on Arabic Corpus.The presented DSOCDBN-STC model majorly aims to classify Arabic short text in social media.The presented DSOCDBN-STC model encompasses preprocessing and word2vec word embedding at the preliminary stage.Besides,the DSOCDBN-STC model involves CDBN based classification model for Arabic short text.At last,the DSO technique can be exploited for optimal modification of the hyperparameters related to the CDBN method.To establish the enhanced performance of the DSOCDBN-STC model,a wide range of simulations have been performed.The simulation results con-firmed the supremacy of the DSOCDBN-STC model over existing models with improved accuracy of 99.26%. 展开更多
关键词 Arabic text short text classification dolphin swarm optimization deep learning
下载PDF
A Novel Efficient and Effective Preprocessing Algorithm for Text Classification
8
作者 Lijie Zhu Difan Luo 《Journal of Computer and Communications》 2023年第3期1-14,共14页
Text classification is an essential task of natural language processing. Preprocessing, which determines the representation of text features, is one of the key steps of text classification architecture. It proposed a ... Text classification is an essential task of natural language processing. Preprocessing, which determines the representation of text features, is one of the key steps of text classification architecture. It proposed a novel efficient and effective preprocessing algorithm with three methods for text classification combining the Orthogonal Matching Pursuit algorithm to perform the classification. The main idea of the novel preprocessing strategy is that it combined stopword removal and/or regular filtering with tokenization and lowercase conversion, which can effectively reduce the feature dimension and improve the text feature matrix quality. Simulation tests on the 20 newsgroups dataset show that compared with the existing state-of-the-art method, the new method reduces the number of features by 19.85%, 34.35%, 26.25% and 38.67%, improves accuracy by 7.36%, 8.8%, 5.71% and 7.73%, and increases the speed of text classification by 17.38%, 25.64%, 23.76% and 33.38% on the four data, respectively. 展开更多
关键词 text classification PREPROCESSING Feature Dimension Orthogonal Matching Pursuit
下载PDF
Parallel naive Bayes algorithm for large-scale Chinese text classification based on spark 被引量:21
9
作者 LIU Peng ZHAO Hui-han +3 位作者 TENG Jia-yu YANG Yan-yan LIU Ya-feng ZHU Zong-wei 《Journal of Central South University》 SCIE EI CAS CSCD 2019年第1期1-12,共12页
The sharp increase of the amount of Internet Chinese text data has significantly prolonged the processing time of classification on these data.In order to solve this problem,this paper proposes and implements a parall... The sharp increase of the amount of Internet Chinese text data has significantly prolonged the processing time of classification on these data.In order to solve this problem,this paper proposes and implements a parallel naive Bayes algorithm(PNBA)for Chinese text classification based on Spark,a parallel memory computing platform for big data.This algorithm has implemented parallel operation throughout the entire training and prediction process of naive Bayes classifier mainly by adopting the programming model of resilient distributed datasets(RDD).For comparison,a PNBA based on Hadoop is also implemented.The test results show that in the same computing environment and for the same text sets,the Spark PNBA is obviously superior to the Hadoop PNBA in terms of key indicators such as speedup ratio and scalability.Therefore,Spark-based parallel algorithms can better meet the requirement of large-scale Chinese text data mining. 展开更多
关键词 Chinese text classification naive Bayes SPARK HADOOP resilient distributed dataset PARALLELIZATION
下载PDF
Review of Text Classification Methods on Deep Learning 被引量:13
10
作者 Hongping Wu Yuling Liu Jingwen Wang 《Computers, Materials & Continua》 SCIE EI 2020年第6期1309-1321,共13页
Text classification has always been an increasingly crucial topic in natural language processing.Traditional text classification methods based on machine learning have many disadvantages such as dimension explosion,da... Text classification has always been an increasingly crucial topic in natural language processing.Traditional text classification methods based on machine learning have many disadvantages such as dimension explosion,data sparsity,limited generalization ability and so on.Based on deep learning text classification,this paper presents an extensive study on the text classification models including Convolutional Neural Network-Based(CNN-Based),Recurrent Neural Network-Based(RNN-based),Attention Mechanisms-Based and so on.Many studies have proved that text classification methods based on deep learning outperform the traditional methods when processing large-scale and complex datasets.The main reasons are text classification methods based on deep learning can avoid cumbersome feature extraction process and have higher prediction accuracy for a large set of unstructured data.In this paper,we also summarize the shortcomings of traditional text classification methods and introduce the text classification process based on deep learning including text preprocessing,distributed representation of text,text classification model construction based on deep learning and performance evaluation. 展开更多
关键词 text classification deep learning distributed representation CNN RNN attention mechanism
下载PDF
An improved TF-IDF approach for text classification 被引量:5
11
作者 张云涛 龚玲 王永成 《Journal of Zhejiang University-Science A(Applied Physics & Engineering)》 SCIE EI CAS CSCD 2005年第1期49-55,共7页
This paper presents a new improved term frequency/inverse document frequency (TF-IDF) approach which uses confidence, support and characteristic words to enhance the recall and precision of text classification. Synony... This paper presents a new improved term frequency/inverse document frequency (TF-IDF) approach which uses confidence, support and characteristic words to enhance the recall and precision of text classification. Synonyms defined by a lexicon are processed in the improved TF-IDF approach. We detailedly discuss and analyze the relationship among confidence, recall and precision. The experiments based on science and technology gave promising results that the new TF-IDF approach improves the precision and recall of text classification compared with the conventional TF-IDF approach. 展开更多
关键词 Term frequency/inverse document frequency (TF-IDF) text classification CONFIDENCE SUPPORT Characteristic words
下载PDF
MII:A Novel Text Classification Model Combining Deep Active Learning with BERT 被引量:6
12
作者 Anman Zhang Bohan Li +2 位作者 Wenhuan Wang Shuo Wan Weitong Chen 《Computers, Materials & Continua》 SCIE EI 2020年第6期1499-1514,共16页
Active learning has been widely utilized to reduce the labeling cost of supervised learning.By selecting specific instances to train the model,the performance of the model was improved within limited steps.However,rar... Active learning has been widely utilized to reduce the labeling cost of supervised learning.By selecting specific instances to train the model,the performance of the model was improved within limited steps.However,rare work paid attention to the effectiveness of active learning on it.In this paper,we proposed a deep active learning model with bidirectional encoder representations from transformers(BERT)for text classification.BERT takes advantage of the self-attention mechanism to integrate contextual information,which is beneficial to accelerate the convergence of training.As for the process of active learning,we design an instance selection strategy based on posterior probabilities Margin,Intra-correlation and Inter-correlation(MII).Selected instances are characterized by small margin,low intra-cohesion and high inter-cohesion.We conduct extensive experiments and analytics with our methods.The effect of learner is compared while the effect of sampling strategy and text classification is assessed from three real datasets.The results show that our method outperforms the baselines in terms of accuracy. 展开更多
关键词 Active learning instance selection deep neural network text classification
下载PDF
Dimensionality Reduction by Mutual Information for Text Classification 被引量:2
13
作者 刘丽珍 宋瀚涛 陆玉昌 《Journal of Beijing Institute of Technology》 EI CAS 2005年第1期32-36,共5页
The frame of text classification system was presented. The high dimensionality in feature space for text classification was studied. The mutual information is a widely used information theoretic measure, in a descript... The frame of text classification system was presented. The high dimensionality in feature space for text classification was studied. The mutual information is a widely used information theoretic measure, in a descriptive way, to measure the stochastic dependency of discrete random variables. The measure method was used as a criterion to reduce high dimensionality of feature vectors in text classification on Web. Feature selections or conversions were performed by using maximum mutual information including linear and non-linear feature conversions. Entropy was used and extended to find right features commendably in pattern recognition systems. Favorable foundation would be established for text classification mining. 展开更多
关键词 text classification mutual information dimensionality reduction
下载PDF
Long Text Classification Algorithm Using a Hybrid Model of Bidirectional Encoder Representation from Transformers-Hierarchical Attention Networks-Dilated Convolutions Network 被引量:1
14
作者 ZHAO Yuanyuan GAO Shining +1 位作者 LIU Yang GONG Xiaohui 《Journal of Donghua University(English Edition)》 CAS 2021年第4期341-350,共10页
Text format information is full of most of the resources of Internet,which puts forward higher and higher requirements for the accuracy of text classification.Therefore,in this manuscript,firstly,we design a hybrid mo... Text format information is full of most of the resources of Internet,which puts forward higher and higher requirements for the accuracy of text classification.Therefore,in this manuscript,firstly,we design a hybrid model of bidirectional encoder representation from transformers-hierarchical attention networks-dilated convolutions networks(BERT_HAN_DCN)which based on BERT pre-trained model with superior ability of extracting characteristic.The advantages of HAN model and DCN model are taken into account which can help gain abundant semantic information,fusing context semantic features and hierarchical characteristics.Secondly,the traditional softmax algorithm increases the learning difficulty of the same kind of samples,making it more difficult to distinguish similar features.Based on this,AM-softmax is introduced to replace the traditional softmax.Finally,the fused model is validated,which shows superior performance in the accuracy rate and F1-score of this hybrid model on two datasets and the experimental analysis shows the general single models such as HAN,DCN,based on BERT pre-trained model.Besides,the improved AM-softmax network model is superior to the general softmax network model. 展开更多
关键词 long text classification dilated convolution BERT fusing context semantic features hierarchical characteristics BERT_HAN_DCN AM-softmax
下载PDF
Feature selection algorithm for text classification based on improved mutual information 被引量:1
15
作者 丛帅 张积宾 +1 位作者 徐志明 王宇颖 《Journal of Harbin Institute of Technology(New Series)》 EI CAS 2011年第3期144-148,共5页
In order to solve the poor performance in text classification when using traditional formula of mutual information (MI) , a feature selection algorithm were proposed based on improved mutual information. The improve... In order to solve the poor performance in text classification when using traditional formula of mutual information (MI) , a feature selection algorithm were proposed based on improved mutual information. The improved mutual information algorithm, which is on the basis of traditional improved mutual information methods that enbance the MI value of negative characteristics and feature' s frequency, supports the concept of concentration degree and dispersion degree. In accordance with the concept of concentration degree and dispersion degree, formulas which embody concentration degree and dispersion degree were constructed and the improved mutual information was implemented based on these. In this paper, the feature selection algorithm was applied based on improved mutual information to a text classifier based on Biomimetic Pattern Recognition and it was compared with several other feature selection methods. The experimental results showed that the improved mutu- al information feature selection method greatly enhances the performance compared with traditional mutual information feature selection methods and the performance is better than that of information gain. Through the introduction of the concept of concentration degree and dispersion degree, the improved mutual information feature selection method greatly improves the performance of text classification system. 展开更多
关键词 text classification feature selection improved mutual information: Biomimetie Pattern Recognition
下载PDF
Chinese News Text Classification Based on Convolutional Neural Network 被引量:1
16
作者 Hanxu Wang Xin Li 《Journal on Big Data》 2022年第1期41-60,共20页
With the explosive growth of Internet text information,the task of text classification is more important.As a part of text classification,Chinese news text classification also plays an important role.In public securit... With the explosive growth of Internet text information,the task of text classification is more important.As a part of text classification,Chinese news text classification also plays an important role.In public security work,public opinion news classification is an important topic.Effective and accurate classification of public opinion news is a necessary prerequisite for relevant departments to grasp the situation of public opinion and control the trend of public opinion in time.This paper introduces a combinedconvolutional neural network text classification model based on word2vec and improved TF-IDF:firstly,the word vector is trained through word2vec model,then the weight of each word is calculated by using the improved TFIDF algorithm based on class frequency variance,and the word vector and weight are combined to construct the text vector representation.Finally,the combined-convolutional neural network is used to train and test the Thucnews data set.The results show that the classification effect of this model is better than the traditional Text-RNN model,the traditional Text-CNN model and word2vec-CNN model.The test accuracy is 97.56%,the accuracy rate is 97%,the recall rate is 97%,and the F1-score is 97%. 展开更多
关键词 Chinese news text classification word2vec model improved TF-IDF combined-convolutional neural network public opinion news
下载PDF
An Effective Concept Extraction Method for Improving Text Classification Performance
17
作者 ZHANGYuntao GONGLing +1 位作者 WANGYongcheng YINZhonghang 《Geo-Spatial Information Science》 2003年第4期66-72,共7页
This paper presents anew way to extract concept that can beused to improve text classification per-formance (precision and recall). Thecomputational measure will be dividedinto two layers. The bottom layercalled docum... This paper presents anew way to extract concept that can beused to improve text classification per-formance (precision and recall). Thecomputational measure will be dividedinto two layers. The bottom layercalled document layer is concernedwith extracting the concepts of parti-cular document and the upper layercalled category layer is with findingthe description and subject concepts ofparticular category. The relevant im-plementation algorithm that dramatic-ally decreases the search space is dis-cussed in detail. The experiment basedon real-world data collected from Info-Bank shows that the approach is supe-rior to the traditional ones. 展开更多
关键词 text classification concept extraction characteristic term associationrule ALGORITHM
下载PDF
A New Model for Automatic Text Classification
18
作者 Hekmatullah Mumivand Rasool Seidi Piri Fatemeh Kheiraei 《Electrical Science & Engineering》 2021年第1期10-15,共6页
In this paper,a new method for automatic classification of texts is present­ed.This system includes two phases;text processing and text categoriza­tion.In the first phase,various indexing criteria such as bi... In this paper,a new method for automatic classification of texts is present­ed.This system includes two phases;text processing and text categoriza­tion.In the first phase,various indexing criteria such as bigram,trigram and quad-gram are presented to extract the properties.Then,in the second phase,the W-SMO machine learning algorithm is used to train the system.In order to evaluate and compare the results of the two criteria of accuracy and readability,Macro-F1 and Micro-F1 have been calculated for different indexing methods.The results of experiments performed on 7676 standard text documents of Reuters showed that the best performance is related to w-smo bigram criteria with accuracy of 95.17 micro and 79.86 macro.Also,the results indicated that our proposed method has the best performance compared to the W-j48,Naïve Bayes,K-NN and Decision Tree algorithms. 展开更多
关键词 text classification Machine learning W-SMO N-GRAM
下载PDF
Application of the probability-based covering algorithm model in text classification
19
作者 ZHOU Ying 《Chinese Journal of Library and Information Science》 2009年第4期1-17,共17页
The probability-based covering algorithm(PBCA) is a new algorithm based on probability distribution. It decides, by voting, the class of the tested samples on the border of the coverage area, based on the probability ... The probability-based covering algorithm(PBCA) is a new algorithm based on probability distribution. It decides, by voting, the class of the tested samples on the border of the coverage area, based on the probability of training samples. When using the original covering algorithm(CA), many tested samples that are located on the border of the coverage cannot be classified by the spherical neighborhood gained. The network structure of PBCA is a mixed structure composed of both a feed-forward network and a feedback network. By using this method of adding some heterogeneous samples and enlarging the coverage radius,it is possible to decrease the number of rejected samples and improve the rate of recognition accuracy. Relevant computer experiments indicate that the algorithm improves the study precision and achieves reasonably good results in text classification. 展开更多
关键词 Probability-based covering algorithm Structural training algorithm PROBABILITY text classification
下载PDF
Naïve Bayes Algorithm for Large Scale Text Classification
20
作者 Pirunthavi SIVAKUMAR Jayalath EKANAYAKE 《Instrumentation》 2021年第4期55-62,共8页
This paper proposed an improved Naïve Bayes Classifier for sentimental analysis from a large-scale dataset such as in YouTube.YouTube contains large unstructured and unorganized comments and reactions,which carry... This paper proposed an improved Naïve Bayes Classifier for sentimental analysis from a large-scale dataset such as in YouTube.YouTube contains large unstructured and unorganized comments and reactions,which carry important information.Organizing large amounts of data and extracting useful information is a challenging task.The extracted information can be considered as new knowledge and can be used for deci sion-making.We extract comments from YouTube on videos and categorized them in domain-specific,and then apply the Naïve Bayes classifier with improved techniques.Our method provided a decent 80%accuracy in classifying those comments.This experiment shows that the proposed method provides excellent adaptability for large-scale text classification. 展开更多
关键词 Naïve Bayes text classification YOUTUBE Sentimental Analysis
下载PDF
上一页 1 2 5 下一页 到第
使用帮助 返回顶部