72 articles found
Relational Turkish Text Classification Using Distant Supervised Entities and Relations
1
Authors: Halil Ibrahim Okur, Kadir Tohma, Ahmet Sertbas. Source: Computers, Materials & Continua (SCIE, EI), 2024, No. 5, pp. 2209-2228, 20 pages.
Text classification, by automatically categorizing texts, is one of the foundational elements of natural language processing applications. This study investigates how text classification performance can be improved through the integration of entity-relation information obtained from the Wikidata (Wikipedia database) and BERT-based pre-trained Named Entity Recognition (NER) models. Focusing on a significant challenge in the field of natural language processing (NLP), the research evaluates the potential of using entity and relational information to extract deeper meaning from texts. The adopted methodology encompasses a comprehensive approach that includes text preprocessing, entity detection, and the integration of relational information. Experiments conducted on text datasets in both Turkish and English assess the performance of various classification algorithms, such as Support Vector Machine, Logistic Regression, Deep Neural Network, and Convolutional Neural Network. The results indicate that the integration of entity-relation information can significantly enhance algorithm performance in text classification tasks and offer new perspectives for information extraction and semantic analysis in NLP applications. Contributions of this work include the utilization of distant supervised entity-relation information in Turkish text classification, the development of a Turkish relational text classification approach, and the creation of a relational database. By demonstrating potential performance improvements through the integration of distant supervised entity-relation information into Turkish text classification, this research aims to support the effectiveness of text-based artificial intelligence (AI) tools. Additionally, it makes significant contributions to the development of multilingual text classification systems by adding deeper meaning to text content, thereby providing a valuable addition to current NLP studies and setting an important reference point for future research.
Keywords: text classification; relation extraction; NER; distant supervision; deep learning; machine learning
Novel Machine Learning–Based Approach for Arabic Text Classification Using Stylistic and Semantic Features (cited by: 1)
2
Authors: Fethi Fkih, Mohammed Alsuhaibani, Delel Rhouma, Ali Mustafa Qamar. Source: Computers, Materials & Continua (SCIE, EI), 2023, No. 6, pp. 5871-5886, 16 pages.
Text classification is an essential task for many applications related to the Natural Language Processing domain. It can be applied in many fields, such as Information Retrieval, Knowledge Extraction, and Knowledge Modeling. Despite the importance of this task, Arabic text classification tools still suffer from many problems and remain incapable of responding to the increasing volume of Arabic content that circulates on the web or resides in large databases. This paper introduces a novel machine learning-based approach that exclusively uses hybrid (stylistic and semantic) features. First, we clean the Arabic documents and translate them to English using translation tools. Then, the semantic features are automatically extracted from the translated documents using an existing database of English topics. In addition, the model automatically extracts from the textual content a set of stylistic features such as word and character frequencies and punctuation. We thus obtain three types of features: semantic, stylistic, and hybrid. Using a different type of feature each time, we performed an in-depth comparison of nine well-known machine learning models on a standard Arabic corpus to evaluate our approach. The obtained results show that the Neural Network outperforms other models and provides good performance using hybrid features (F1-score = 0.88%).
Keywords: Arabic text classification; machine learning; stylistic features; semantic features; topics
Automated Arabic Text Classification Using Hyperparameter Tuned Hybrid Deep Learning Model
3
Authors: Badriyya B. Al-onazi, Saud S. Alotaib, Saeed Masoud Alshahrani, Najm Alotaibi, Mrim M. Alnfiai, Ahmed S. Salama, Manar Ahmed Hamza. Source: Computers, Materials & Continua (SCIE, EI), 2023, No. 3, pp. 5447-5465, 19 pages.
The text classification process has been extensively investigated in various languages, especially English. Text classification models are vital in several Natural Language Processing (NLP) applications. The Arabic language is highly significant: for instance, it is the fourth most-used language on the internet and the sixth official language of the United Nations. However, only a few studies on text classification in Arabic have been published. In general, researchers face two challenges in the Arabic text classification process: low accuracy and high dimensionality of the features. In this study, an Automated Arabic Text Classification using Hyperparameter Tuned Hybrid Deep Learning (AATC-HTHDL) model is proposed. The major goal of the proposed AATC-HTHDL method is to identify different class labels for the Arabic text. The first step in the proposed model is to pre-process the input data to transform it into a useful format. The Term Frequency-Inverse Document Frequency (TF-IDF) model is applied to extract the feature vectors. Next, the Convolutional Neural Network with Recurrent Neural Network (CRNN) model is utilized to classify the Arabic text. In the final stage, the Crow Search Algorithm (CSA) is applied to fine-tune the CRNN model's hyperparameters, which constitutes the novelty of the work. The proposed AATC-HTHDL model was experimentally validated under different parameters, and the outcomes established the supremacy of the proposed AATC-HTHDL model over other approaches.
Keywords: hybrid deep learning; natural language processing; Arabic language; text classification; parameter tuning
Gate-Attention and Dual-End Enhancement Mechanism for Multi-Label Text Classification
4
Authors: Jieren Cheng, Xiaolong Chen, Wenghang Xu, Shuai Hua, Zhu Tang, Victor S. Sheng. Source: Computers, Materials & Continua (SCIE, EI), 2023, No. 11, pp. 1779-1793, 15 pages.
In the realm of Multi-Label Text Classification (MLTC), the dual challenges of extracting rich semantic features from text and discerning inter-label relationships have spurred innovative approaches. Many studies in semantic feature extraction have turned to external knowledge to augment the model's grasp of textual content, often overlooking intrinsic textual cues such as label statistical features. In contrast, these endogenous insights naturally align with the classification task. In our paper, to complement this focus on intrinsic knowledge, we introduce a novel Gate-Attention mechanism. This mechanism integrates statistical features from the text itself into the semantic representation, enhancing the model's capacity to understand and represent the data. Additionally, to address the intricate task of mining label correlations, we propose a Dual-end enhancement mechanism. This mechanism mitigates the challenges of information loss and erroneous transmission inherent in traditional long short-term memory propagation. We conducted extensive experiments on the AAPD and RCV1-2 datasets. These experiments confirm the efficacy of both the Gate-Attention mechanism and the Dual-end enhancement mechanism. Our final model outperforms the baseline model, attesting to its robustness. These findings underscore the importance of taking into account not just external knowledge but also the inherent intricacies of textual data when building MLTC models.
Keywords: multi-label text classification; feature extraction; label distribution information; sequence generation
Convolutional Deep Belief Network Based Short Text Classification on Arabic Corpus
5
Authors: Abdelwahed Motwakel, Badriyya B. Al-onazi, Jaber S. Alzahrani, Radwa Marzouk, Amira Sayed A. Aziz, Abu Sarwar Zamani, Ishfaq Yaseen, Amgad Atta Abdelmageed. Source: Computer Systems Science & Engineering (SCIE, EI), 2023, No. 6, pp. 3097-3113, 17 pages.
With a population of 440 million, Arabic language users form a rapidly growing language group on the web in terms of the number of Internet users. Eleven million monthly Twitter users were active and posted nearly 27.4 million tweets every day. Developing a classification system for the Arabic language requires understanding the syntactic framework of the words so that they can be manipulated and represented in a way that makes their classification effective. In this view, this article introduces a Dolphin Swarm Optimization with Convolutional Deep Belief Network for Short Text Classification (DSOCDBN-STC) model on an Arabic corpus. The presented DSOCDBN-STC model mainly aims to classify Arabic short text in social media. The model encompasses preprocessing and word2vec word embedding at the preliminary stage. In addition, it involves a CDBN-based classification model for Arabic short text. Finally, the DSO technique is used for optimal modification of the hyperparameters related to the CDBN method. To establish the enhanced performance of the DSOCDBN-STC model, a wide range of simulations have been performed. The simulation results confirmed the supremacy of the DSOCDBN-STC model over existing models, with an improved accuracy of 99.26%.
Keywords: Arabic text; short text classification; dolphin swarm optimization; deep learning
A Novel Efficient and Effective Preprocessing Algorithm for Text Classification
6
Authors: Lijie Zhu, Difan Luo. Source: Journal of Computer and Communications, 2023, No. 3, pp. 1-14, 14 pages.
Text classification is an essential task of natural language processing. Preprocessing, which determines the representation of text features, is one of the key steps of a text classification architecture. This paper proposes a novel, efficient, and effective preprocessing algorithm with three methods for text classification, combined with the Orthogonal Matching Pursuit algorithm to perform the classification. The main idea of the preprocessing strategy is to combine stopword removal and/or regular filtering with tokenization and lowercase conversion, which can effectively reduce the feature dimension and improve the quality of the text feature matrix. Simulation tests on the 20 newsgroups dataset show that, compared with the existing state-of-the-art method, the new method reduces the number of features by 19.85%, 34.35%, 26.25% and 38.67%, improves accuracy by 7.36%, 8.8%, 5.71% and 7.73%, and increases the speed of text classification by 17.38%, 25.64%, 23.76% and 33.38% on the four data sets, respectively.
Keywords: text classification; preprocessing; feature dimension; Orthogonal Matching Pursuit
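The preprocessing pipeline described in the abstract above (lowercase conversion, tokenization, stopword removal) can be illustrated with a minimal sketch. This is not the authors' algorithm: the stopword list, the regular expression used for tokenization, and the function names `preprocess` and `vocabulary` are all assumptions made for illustration. It only shows how such steps shrink the feature set.

```python
import re

# Hypothetical, tiny stopword list for illustration; the paper's list is not given.
STOPWORDS = {"the", "a", "an", "is", "of", "and", "to", "in"}

def preprocess(text, remove_stopwords=True):
    """Lowercase the text, tokenize on alphabetic runs, optionally drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if remove_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    return tokens

def vocabulary(docs, remove_stopwords=True):
    """Set of distinct features (terms) across all documents."""
    return {tok for doc in docs for tok in preprocess(doc, remove_stopwords)}

docs = ["The cat is in the hat.", "A Cat and a dog!"]
full = vocabulary(docs, remove_stopwords=False)
reduced = vocabulary(docs)
print(len(full), len(reduced))  # feature count before and after stopword removal
```

On a real corpus the reduction would depend heavily on the stopword list and filtering rules chosen, so the percentages reported in the abstract should not be expected from this toy version.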
Review of Text Classification Methods on Deep Learning (cited by: 11)
7
Authors: Hongping Wu, Yuling Liu, Jingwen Wang. Source: Computers, Materials & Continua (SCIE, EI), 2020, No. 6, pp. 1309-1321, 13 pages.
Text classification has become an increasingly crucial topic in natural language processing. Traditional text classification methods based on machine learning have many disadvantages, such as dimension explosion, data sparsity, and limited generalization ability. Focusing on deep learning-based text classification, this paper presents an extensive study of text classification models, including Convolutional Neural Network-based (CNN-based), Recurrent Neural Network-based (RNN-based), and attention mechanism-based models. Many studies have shown that text classification methods based on deep learning outperform the traditional methods when processing large-scale and complex datasets. The main reasons are that such methods avoid a cumbersome feature extraction process and have higher prediction accuracy for a large set of unstructured data. In this paper, we also summarize the shortcomings of traditional text classification methods and introduce the text classification process based on deep learning, including text preprocessing, distributed representation of text, construction of deep learning-based text classification models, and performance evaluation.
Keywords: text classification; deep learning; distributed representation; CNN; RNN; attention mechanism
MII: A Novel Text Classification Model Combining Deep Active Learning with BERT (cited by: 6)
8
Authors: Anman Zhang, Bohan Li, Wenhuan Wang, Shuo Wan, Weitong Chen. Source: Computers, Materials & Continua (SCIE, EI), 2020, No. 6, pp. 1499-1514, 16 pages.
Active learning has been widely utilized to reduce the labeling cost of supervised learning. By selecting specific instances to train the model, the performance of the model is improved within limited steps. However, little work has paid attention to the effectiveness of active learning on it. In this paper, we propose a deep active learning model with bidirectional encoder representations from transformers (BERT) for text classification. BERT takes advantage of the self-attention mechanism to integrate contextual information, which helps accelerate the convergence of training. For the active learning process, we design an instance selection strategy based on posterior probabilities: Margin, Intra-correlation and Inter-correlation (MII). Selected instances are characterized by small margin, low intra-cohesion and high inter-cohesion. We conduct extensive experiments and analytics with our methods. The effect of the learner is compared, while the effects of the sampling strategy and text classification are assessed on three real datasets. The results show that our method outperforms the baselines in terms of accuracy.
Keywords: active learning; instance selection; deep neural network; text classification
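The "small margin" criterion mentioned in the abstract above corresponds to standard margin sampling, which can be sketched as follows. This covers only the margin component; the intra-correlation and inter-correlation terms of MII are not specified in the abstract and are not modeled here. The function names and the example posteriors are assumptions.

```python
def margin(probs):
    """Difference between the two largest posterior probabilities."""
    top, second = sorted(probs, reverse=True)[:2]
    return top - second

def select_by_margin(posteriors, k):
    """Return indices of the k unlabeled instances with the smallest margin."""
    ranked = sorted(range(len(posteriors)), key=lambda i: margin(posteriors[i]))
    return ranked[:k]

# Hypothetical classifier outputs for three unlabeled instances.
posteriors = [
    [0.90, 0.05, 0.05],  # confident prediction, large margin
    [0.40, 0.35, 0.25],  # uncertain
    [0.50, 0.49, 0.01],  # most uncertain, smallest margin
]
print(select_by_margin(posteriors, 2))
```

Instances with a small margin are those where the model hesitates between its top two classes, which is why they are sent to the annotator first.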
Dimensionality Reduction by Mutual Information for Text Classification (cited by: 2)
9
Authors: 刘丽珍, 宋瀚涛, 陆玉昌. Source: Journal of Beijing Institute of Technology (EI, CAS), 2005, No. 1, pp. 32-36, 5 pages.
The framework of a text classification system was presented. The high dimensionality of the feature space for text classification was studied. Mutual information is a widely used information-theoretic measure that describes the stochastic dependency of discrete random variables. This measure was used as a criterion to reduce the high dimensionality of feature vectors in text classification on the Web. Feature selections or conversions were performed by using maximum mutual information, including linear and non-linear feature conversions. Entropy was used and extended to find suitable features in pattern recognition systems. This lays a favorable foundation for text classification mining.
Keywords: text classification; mutual information; dimensionality reduction
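As an illustration of mutual information as a feature selection criterion, the sketch below computes I(T; C) between a binary term-occurrence indicator T and the class label C, then keeps the highest-scoring terms. This is the textbook definition, not the specific maximum-mutual-information procedure of the paper; `mutual_information`, `top_terms`, and the toy documents are assumptions.

```python
import math
from collections import Counter

def mutual_information(term_present, labels):
    """I(T; C) in bits for a binary term indicator T and class labels C."""
    n = len(labels)
    joint = Counter(zip(term_present, labels))
    count_t = Counter(term_present)
    count_c = Counter(labels)
    mi = 0.0
    for (t, c), count in joint.items():
        p_tc = count / n
        mi += p_tc * math.log2(p_tc / ((count_t[t] / n) * (count_c[c] / n)))
    return mi

def top_terms(docs, labels, k):
    """Rank vocabulary terms by mutual information with the labels; keep k."""
    vocab = {w for d in docs for w in d}
    scored = [(mutual_information([int(w in d) for d in docs], labels), w)
              for w in vocab]
    scored.sort(key=lambda x: (-x[0], x[1]))  # ties broken alphabetically
    return [w for _, w in scored[:k]]

# Toy corpus: each document is a set of tokens.
docs = [{"goal", "match"}, {"goal", "team"}, {"vote", "match"}, {"vote", "party"}]
labels = ["sport", "sport", "politics", "politics"]
print(top_terms(docs, labels, 2))
```

Here "goal" and "vote" each determine the class exactly (1 bit), while "match" appears equally in both classes and carries no information, so it would be dropped first.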
Feature selection algorithm for text classification based on improved mutual information (cited by: 1)
10
Authors: 丛帅, 张积宾, 徐志明, 王宇颖. Source: Journal of Harbin Institute of Technology (New Series) (EI, CAS), 2011, No. 3, pp. 144-148, 5 pages.
In order to address the poor text classification performance obtained with the traditional formula of mutual information (MI), a feature selection algorithm based on improved mutual information was proposed. The improved mutual information algorithm, which builds on earlier improved mutual information methods that enhance the MI value of negative features and account for feature frequency, incorporates the concepts of concentration degree and dispersion degree. Formulas embodying concentration degree and dispersion degree were constructed, and the improved mutual information was implemented based on them. In this paper, the feature selection algorithm based on improved mutual information was applied to a text classifier based on Biomimetic Pattern Recognition and compared with several other feature selection methods. The experimental results showed that the improved mutual information feature selection method greatly enhances performance compared with traditional mutual information feature selection methods and performs better than information gain. Through the introduction of concentration degree and dispersion degree, the improved mutual information feature selection method greatly improves the performance of the text classification system.
Keywords: text classification; feature selection; improved mutual information; Biomimetic Pattern Recognition
Long Text Classification Algorithm Using a Hybrid Model of Bidirectional Encoder Representation from Transformers-Hierarchical Attention Networks-Dilated Convolutions Network (cited by: 1)
11
Authors: 赵媛媛, 高世宁, 刘洋, 宫晓蕙. Source: Journal of Donghua University (English Edition) (CAS), 2021, No. 4, pp. 341-350, 10 pages.
Text makes up most of the resources on the Internet, which places ever higher requirements on the accuracy of text classification. Therefore, in this manuscript, we first design a hybrid model of bidirectional encoder representation from transformers-hierarchical attention networks-dilated convolutions networks (BERT_HAN_DCN), which is based on the BERT pre-trained model and its superior feature extraction ability. The advantages of the HAN model and the DCN model are both taken into account, which helps obtain abundant semantic information by fusing context semantic features and hierarchical characteristics. Secondly, the traditional softmax algorithm increases the learning difficulty of samples of the same kind, making it more difficult to distinguish similar features. For this reason, AM-softmax is introduced to replace the traditional softmax. Finally, the fused model is validated: it shows superior accuracy and F1-score on two datasets, and the experimental analysis shows that it outperforms general single models such as HAN and DCN built on the BERT pre-trained model. In addition, the improved AM-softmax network model is superior to the general softmax network model.
Keywords: long text classification; dilated convolution; BERT; fusing context semantic features; hierarchical characteristics; BERT_HAN_DCN; AM-softmax
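For readers unfamiliar with AM-softmax (additive margin softmax), the sketch below computes its loss for a single sample from precomputed cosine similarities. It follows the usual published form of the loss, in which a margin m is subtracted from the target-class cosine before scaling by s. The values s = 30 and m = 0.35 are commonly used defaults and are an assumption here, as the abstract does not state the settings used with BERT_HAN_DCN.

```python
import math

def am_softmax_loss(cosines, target, s=30.0, m=0.35):
    """Additive-margin softmax loss for one sample.

    cosines: cosine similarity between the normalized feature vector and each
             normalized class weight vector.
    target:  index of the true class.
    s, m:    scale and margin (assumed values, not taken from the paper).
    """
    logits = [s * (c - m) if j == target else s * c
              for j, c in enumerate(cosines)]
    peak = max(logits)  # subtract the maximum for numerical stability
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[target]

cosines = [0.8, 0.3, -0.1]          # hypothetical similarities, true class is 0
plain = am_softmax_loss(cosines, 0, m=0.0)   # reduces to scaled softmax
with_margin = am_softmax_loss(cosines, 0)    # margin makes the task harder
print(plain, with_margin)
```

With m = 0 the function reduces to ordinary scaled softmax cross-entropy; a positive margin raises the loss for the same input, which pushes training toward larger separation between classes.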
Chinese News Text Classification Based on Convolutional Neural Network (cited by: 1)
12
Authors: Hanxu Wang, Xin Li. Source: Journal on Big Data, 2022, No. 1, pp. 41-60, 20 pages.
With the explosive growth of Internet text information, the task of text classification has become more important. As a part of text classification, Chinese news text classification also plays an important role. In public security work, public opinion news classification is an important topic. Effective and accurate classification of public opinion news is a necessary prerequisite for relevant departments to grasp the situation of public opinion and control its trend in time. This paper introduces a combined-convolutional neural network text classification model based on word2vec and improved TF-IDF. First, the word vectors are trained with a word2vec model; then the weight of each word is calculated using the improved TF-IDF algorithm based on class frequency variance, and the word vectors and weights are combined to construct the text vector representation. Finally, the combined-convolutional neural network is trained and tested on the THUCNews data set. The results show that the classification effect of this model is better than that of the traditional Text-RNN model, the traditional Text-CNN model and the word2vec-CNN model. The test accuracy is 97.56%, the accuracy rate is 97%, the recall rate is 97%, and the F1-score is 97%.
Keywords: Chinese news text classification; word2vec model; improved TF-IDF; combined-convolutional neural network; public opinion news
A New Model for Automatic Text Classification
13
Authors: Hekmatullah Mumivand, Rasool Seidi Piri, Fatemeh Kheiraei. Source: Electrical Science & Engineering, 2021, No. 1, pp. 10-15, 6 pages.
In this paper, a new method for automatic classification of texts is presented. This system includes two phases: text processing and text categorization. In the first phase, various indexing criteria such as bigram, trigram and quad-gram are presented to extract the properties. Then, in the second phase, the W-SMO machine learning algorithm is used to train the system. In order to evaluate and compare the results of the two criteria of accuracy and readability, Macro-F1 and Micro-F1 have been calculated for different indexing methods. The results of experiments performed on 7676 standard text documents of Reuters showed that the best performance is obtained with the W-SMO bigram criterion, with an accuracy of 95.17 micro and 79.86 macro. The results also indicated that the proposed method has the best performance compared to the W-J48, Naïve Bayes, K-NN and Decision Tree algorithms.
Keywords: text classification; machine learning; W-SMO; n-gram
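The gap between the micro (95.17) and macro (79.86) figures reported above is typical of imbalanced corpora. The sketch below shows how the two averages are computed for single-label predictions; the label names and predictions are invented for illustration and are not the paper's data.

```python
from collections import Counter

def f1_scores(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label multi-class predictions."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    classes = set(y_true) | set(y_pred)
    # Micro: pool counts over all classes, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: compute F1 per class, then take the unweighted mean.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in classes) / len(classes)
    return micro, macro

# Hypothetical imbalanced example: 8 documents of one class, 2 of another.
y_true = ["earn"] * 8 + ["grain"] * 2
y_pred = ["earn"] * 8 + ["earn", "grain"]
micro, macro = f1_scores(y_true, y_pred)
print(round(micro, 4), round(macro, 4))
```

Micro-F1 is dominated by the frequent class, while Macro-F1 weights every class equally, so a single error on a rare class lowers the macro score far more than the micro score.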
Application of the probability-based covering algorithm model in text classification
14
Author: ZHOU Ying. Source: Chinese Journal of Library and Information Science, 2009, No. 4, pp. 1-17, 17 pages.
The probability-based covering algorithm (PBCA) is a new algorithm based on probability distribution. It decides, by voting, the class of the tested samples on the border of the coverage area, based on the probability of training samples. When using the original covering algorithm (CA), many tested samples that are located on the border of the coverage cannot be classified by the spherical neighborhood gained. The network structure of PBCA is a mixed structure composed of both a feed-forward network and a feedback network. By adding some heterogeneous samples and enlarging the coverage radius, it is possible to decrease the number of rejected samples and improve the rate of recognition accuracy. Relevant computer experiments indicate that the algorithm improves the study precision and achieves reasonably good results in text classification.
Keywords: probability-based covering algorithm; structural training algorithm; probability; text classification
Naïve Bayes Algorithm for Large Scale Text Classification
15
Authors: Pirunthavi SIVAKUMAR, Jayalath EKANAYAKE. Source: Instrumentation, 2021, No. 4, pp. 55-62, 8 pages.
This paper proposes an improved Naïve Bayes classifier for sentiment analysis of a large-scale dataset such as YouTube. YouTube contains a large volume of unstructured and unorganized comments and reactions, which carry important information. Organizing large amounts of data and extracting useful information is a challenging task. The extracted information can be considered as new knowledge and can be used for decision-making. We extract comments on YouTube videos, categorize them into domain-specific groups, and then apply the Naïve Bayes classifier with improved techniques. Our method achieved 80% accuracy in classifying those comments. This experiment shows that the proposed method provides excellent adaptability for large-scale text classification.
Keywords: Naïve Bayes; text classification; YouTube; sentiment analysis
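For context, a plain multinomial Naïve Bayes classifier with Laplace (add-one) smoothing is sketched below. This is the standard baseline only; the "improved techniques" of the paper are not described in the abstract and are not reproduced. The class name, the choice to skip out-of-vocabulary words, and the toy comments are assumptions.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (baseline sketch)."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, tokens):
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            total = sum(self.word_counts[label].values())
            score = math.log(n / n_docs)  # log prior
            for w in tokens:
                if w in self.vocab:  # words never seen in training are skipped
                    count = self.word_counts[label][w]
                    score += math.log((count + 1) / (total + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Invented toy comments, already tokenized.
train = [["great", "video"], ["love", "this"], ["terrible", "audio"], ["hate", "this"]]
labels = ["pos", "pos", "neg", "neg"]
model = NaiveBayes().fit(train, labels)
print(model.predict(["love", "video"]), model.predict(["terrible"]))
```

Working in log space avoids numerical underflow when many word probabilities are multiplied, which matters for long comments.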
Adapter Based on Pre-Trained Language Models for Classification of Medical Text
16
Author: Quan Li. Source: Journal of Electronic Research and Application, 2024, No. 3, pp. 129-134, 6 pages.
We present an approach to automatically classify medical text at the sentence level. Given the inherent complexity of medical text classification, we employ adapters based on pre-trained language models to extract information from medical text, facilitating more accurate classification while minimizing the number of trainable parameters. Extensive experiments conducted on various datasets demonstrate the effectiveness of our approach.
Keywords: classification of medical text; adapter; pre-trained language model
Supervised Contrastive Learning with Term Weighting for Improving Chinese Text Classification
17
Authors: Jiabao Guo, Bo Zhao, Hui Liu, Yifan Liu, Qian Zhong. Source: Tsinghua Science and Technology (SCIE, EI, CAS, CSCD), 2023, No. 1, pp. 59-68, 10 pages.
With the rapid growth of information retrieval technology, Chinese text classification, which is the basis of information content security, has become a widely discussed topic. Because Chinese differs greatly from English, Chinese text tasks are more complex in terms of semantic information representation. However, most existing Chinese text classification approaches regard feature representation and feature selection as the key points, but fail to take into account a learning strategy that adapts to the task. In addition, these approaches compress a Chinese word into a representation vector without considering the distribution of the term among the categories of interest. In order to improve Chinese text classification, a unified method called Supervised Contrastive Learning with Term Weighting (SCL-TW) is proposed in this paper. Supervised contrastive learning makes full use of a large amount of unlabeled data to improve model stability. In SCL-TW, we calculate a term weighting score to optimize the data augmentation process for Chinese text. Subsequently, the transformed features are fed into a temporal convolution network to conduct feature representation. Experimental verifications are conducted on two Chinese benchmark datasets. The results demonstrate that SCL-TW outperforms other advanced Chinese text classification approaches by a large margin.
Keywords: Chinese text classification; Supervised Contrastive Learning (SCL); Term Weighting (TW); Temporal Convolution Network (TCN)
Identifying multidisciplinary problems from scientific publications based on a text generation method
18
Authors: Ziyan Xu, Hongqi Han, Linna Li, Junsheng Zhang, Zexu Zhou. Source: Journal of Data and Information Science (CSCD), 2024, No. 3, pp. 213-237, 25 pages.
Purpose: A text generation based multidisciplinary problem identification method is proposed, which does not rely on a large amount of data annotation. Design/methodology/approach: The proposed method first identifies the research objective types and disciplinary labels of papers using a text classification technique; second, it generates abstractive titles for each paper based on the abstract and research objective types using a generative pre-trained language model; third, it extracts problem phrases from the generated titles according to regular expression rules; fourth, it creates problem relation networks and identifies identical problems by exploiting a weighted community detection algorithm; finally, it identifies multidisciplinary problems based on the disciplinary labels of papers. Findings: Experiments in the "Carbon Peaking and Carbon Neutrality" field show that the proposed method can effectively identify multidisciplinary research problems. The disciplinary distribution of the identified problems is consistent with our understanding of multidisciplinary collaboration in the field. Research limitations: It is necessary to use the proposed method in other multidisciplinary fields to validate its effectiveness. Practical implications: Multidisciplinary problem identification helps governments gather multidisciplinary forces to solve complex real-world problems, helps research management authorities fund valuable multidisciplinary problems, and helps researchers borrow ideas from other disciplines. Originality/value: This study proposes a novel multidisciplinary problem identification method based on text generation, which identifies multidisciplinary problems from generated abstractive titles of papers without the data annotation required by standard sequence labeling techniques.
Keywords: problem identification; multidisciplinary; text generation; text classification
Smart Approaches to Efficient Text Mining for Categorizing Sexual Reproductive Health Short Messages into Key Themes
19
Authors: Tobias Makai, Mayumbo Nyirenda. Source: Open Journal of Applied Sciences, 2024, No. 2, pp. 511-532, 22 pages.
To promote behavioral change among adolescents in Zambia, the National HIV/AIDS/STI/TB Council, in collaboration with UNICEF, developed the Zambia U-Report platform. This platform provides young people with improved access to information on various Sexual Reproductive Health topics through Short Messaging Service (SMS) messages. Over the years, the platform has accumulated millions of incoming and outgoing messages, which need to be categorized into key thematic areas for better tracking of sexual reproductive health knowledge gaps among young people. The current manual categorization process for these text messages is inefficient and time-consuming, and this study aims to automate the process for improved analysis using text-mining techniques. Firstly, the study investigates the current text message categorization process and identifies a list of categories adopted by counselors over time, which are then used to build and train a categorization model. Secondly, the study presents a proof-of-concept tool that automates the categorization of U-Report messages into key thematic areas using the developed categorization model. Finally, it compares the performance and effectiveness of the developed proof-of-concept tool against the manual system. The study used a dataset comprising 206,625 text messages. The current process would take roughly 2.82 years to categorize this dataset, whereas the trained SVM model would require only 6.4 minutes while achieving an accuracy of 70.4%, demonstrating that the automated method is significantly faster, more scalable, and more consistent than the current manual categorization. These advantages make the SVM model a more efficient and effective tool for categorizing large unstructured text datasets. These results and the proof-of-concept tool demonstrate the potential for enhancing the efficiency and accuracy of message categorization on the Zambia U-Report platform and other similar text message-based platforms.
Keywords: Knowledge Discovery in Text (KDT); Sexual Reproductive Health (SRH); text categorization; text classification; text extraction; text mining; feature extraction; automated classification; process performance; stemming and lemmatization; Natural Language Processing (NLP)
Higher-Order Smoothing: A Novel Semantic Smoothing Method for Text Classification (cited by: 3)
20
Authors: Mitat Poyraz, Zeynep Hilal Kilimci, Murat Can Ganiz. Source: Journal of Computer Science & Technology (SCIE, EI, CSCD), 2014, No. 3, pp. 376-391, 16 pages.
It is known that latent semantic indexing (LSI) takes advantage of implicit higher-order (or latent) structure in the association of terms and documents. Higher-order relations in LSI capture "latent semantics". These findings have inspired a novel Bayesian framework for classification named Higher-Order Naive Bayes (HONB), which was introduced previously, that can explicitly make use of these higher-order relations. In this paper, we present a novel semantic smoothing method named Higher-Order Smoothing (HOS) for the Naive Bayes algorithm. HOS is built on a graph-based data representation similar to that of HONB, which allows semantics in higher-order paths to be exploited. We take the concept one step further in HOS and exploit the relationships between instances of different classes. As a result, we move beyond not only instance boundaries but also class boundaries to exploit the latent information in higher-order paths. This approach improves the parameter estimation when dealing with insufficient labeled data. Results of our extensive experiments demonstrate the value of HOS on several benchmark datasets.
Keywords: Naive Bayes; semantic smoothing; higher-order Naive Bayes; higher-order smoothing; text classification