期刊文献+
共找到73篇文章
< 1 2 4 >
每页显示 20 50 100
Leveraging Uncertainty for Depth-Aware Hierarchical Text Classification
1
作者 Zixuan Wu Ye Wang +2 位作者 Lifeng Shen Feng Hu Hong Yu 《Computers, Materials & Continua》 SCIE EI 2024年第9期4111-4127,共17页
Hierarchical Text Classification(HTC)aims to match text to hierarchical labels.Existing methods overlook two critical issues:first,some texts cannot be fully matched to leaf node labels and need to be classified to th... Hierarchical Text Classification(HTC)aims to match text to hierarchical labels.Existing methods overlook two critical issues:first,some texts cannot be fully matched to leaf node labels and need to be classified to the correct parent node instead of treating leaf nodes as the final classification target.Second,error propagation occurs when a misclassification at a parent node propagates down the hierarchy,ultimately leading to inaccurate predictions at the leaf nodes.To address these limitations,we propose an uncertainty-guided HTC depth-aware model called DepthMatch.Specifically,we design an early stopping strategy with uncertainty to identify incomplete matching between text and labels,classifying them into the corresponding parent node labels.This approach allows us to dynamically determine the classification depth by leveraging evidence to quantify and accumulate uncertainty.Experimental results show that the proposed DepthMatch outperforms recent strong baselines on four commonly used public datasets:WOS(Web of Science),RCV1-V2(Reuters Corpus Volume I),AAPD(Arxiv Academic Paper Dataset),and BGC.Notably,on the BGC dataset,it improvesMicro-F1 andMacro-F1 scores by at least 1.09%and 1.74%,respectively. 展开更多
关键词 hierarchical text classification incomplete text-label matching UNCERTAINTY depth-aware early stopping strategy
下载PDF
Long Text Classification Algorithm Using a Hybrid Model of Bidirectional Encoder Representation from Transformers-Hierarchical Attention Networks-Dilated Convolutions Network 被引量:1
2
作者 ZHAO Yuanyuan GAO Shining +1 位作者 LIU Yang GONG Xiaohui 《Journal of Donghua University(English Edition)》 CAS 2021年第4期341-350,共10页
Text format information is full of most of the resources of Internet,which puts forward higher and higher requirements for the accuracy of text classification.Therefore,in this manuscript,firstly,we design a hybrid mo... Text format information is full of most of the resources of Internet,which puts forward higher and higher requirements for the accuracy of text classification.Therefore,in this manuscript,firstly,we design a hybrid model of bidirectional encoder representation from transformers-hierarchical attention networks-dilated convolutions networks(BERT_HAN_DCN)which based on BERT pre-trained model with superior ability of extracting characteristic.The advantages of HAN model and DCN model are taken into account which can help gain abundant semantic information,fusing context semantic features and hierarchical characteristics.Secondly,the traditional softmax algorithm increases the learning difficulty of the same kind of samples,making it more difficult to distinguish similar features.Based on this,AM-softmax is introduced to replace the traditional softmax.Finally,the fused model is validated,which shows superior performance in the accuracy rate and F1-score of this hybrid model on two datasets and the experimental analysis shows the general single models such as HAN,DCN,based on BERT pre-trained model.Besides,the improved AM-softmax network model is superior to the general softmax network model. 展开更多
关键词 long text classification dilated convolution BERT fusing context semantic features hierarchical characteristics BERT_HAN_DCN AM-softmax
下载PDF
Concept Association and Hierarchical Hamming Clustering Model in Text Classification
3
作者 SuGui-yang LiJian-hua MaYing-hua LiSheng-hong YinZhong-hang 《Wuhan University Journal of Natural Sciences》 EI CAS 2004年第3期339-342,共4页
We propose two models in this paper. The concept of association model is put forward to obtain the co-occurrence relationships among keywords in the documents and the hierarchical Hamming clustering model is used to r... We propose two models in this paper. The concept of association model is put forward to obtain the co-occurrence relationships among keywords in the documents and the hierarchical Hamming clustering model is used to reduce the dimensionality of the category feature vector space which can solve the problem of the extremely high dimensionality of the documents' feature space. The results of experiment indicate that it can obtain the co-occurrence relations among key-words in the documents which promote the recall of classification system effectively. The hierarchical Hamming clustering model can reduce the dimensionality of the category feature vector efficiently, the size of the vector space is only about 10% of the primary dimensionality. Key words text classification - concept association - hierarchical clustering - hamming clustering CLC number TN 915. 08 Foundation item: Supporteded by the National 863 Project of China (2001AA142160, 2002AA145090)Biography: Su Gui-yang (1974-), male, Ph. D candidate, research direction: information filter and text classification. 展开更多
关键词 text classification concept association hierarchical clustering hamming clustering
下载PDF
Performance evaluation of seven multi-label classification methods on real-world patent and publication datasets
4
作者 Shuo Xu Yuefu Zhang +1 位作者 Xin An Sainan Pi 《Journal of Data and Information Science》 CSCD 2024年第2期81-103,共23页
Purpose:Many science,technology and innovation(STI)resources are attached with several different labels.To assign automatically the resulting labels to an interested instance,many approaches with good performance on t... Purpose:Many science,technology and innovation(STI)resources are attached with several different labels.To assign automatically the resulting labels to an interested instance,many approaches with good performance on the benchmark datasets have been proposed for multi-label classification task in the literature.Furthermore,several open-source tools implementing these approaches have also been developed.However,the characteristics of real-world multi-label patent and publication datasets are not completely in line with those of benchmark ones.Therefore,the main purpose of this paper is to evaluate comprehensively seven multi-label classification methods on real-world datasets.Research limitations:Three real-world datasets differ in the following aspects:statement,data quality,and purposes.Additionally,open-source tools designed for multi-label classification also have intrinsic differences in their approaches for data processing and feature selection,which in turn impacts the performance of a multi-label classification approach.In the near future,we will enhance experimental precision and reinforce the validity of conclusions by employing more rigorous control over variables through introducing expanded parameter settings.Practical implications:The observed Macro F1 and Micro F1 scores on real-world datasets typically fall short of those achieved on benchmark datasets,underscoring the complexity of real-world multi-label classification tasks.Approaches leveraging deep learning techniques offer promising solutions by accommodating the hierarchical relationships and interdependencies among labels.With ongoing enhancements in deep learning algorithms and large-scale models,it is expected that the efficacy of multi-label classification tasks will be significantly improved,reaching a level of practical utility in the foreseeable future.Originality/value:(1)Seven multi-label classification methods are comprehensively compared on three real-world datasets.(2)The TextCNN and TextRCNN models perform better on small-scale datasets with more complex hierarchical structure of labels and more balanced document-label distribution.(3)The MLkNN method works better on the larger-scale dataset with more unbalanced document-label distribution. 展开更多
关键词 multi-label classification Real-World datasets hierarchical structure classification system Label correlation Machine learning
下载PDF
Hierarchical Classification of Chinese Documents Based on N grams 被引量:1
5
作者 Zhou Shui geng 1, Guan Ji hong 2, He Yan xiang 2 1. State Key Laboratory of Software Engineering, Wuhan University, Wuhan 430072, China 2. School of Computer Science, Wuhan University, Wuhan 430072, China 《Wuhan University Journal of Natural Sciences》 CAS 2001年第Z1期416-422,共7页
We explore the techniques of utilizing N gram information to categorize Chinese text documents hierarchically so that the classifier can shake off the burden of large dictionaries and complex segmentation process... We explore the techniques of utilizing N gram information to categorize Chinese text documents hierarchically so that the classifier can shake off the burden of large dictionaries and complex segmentation processing, and subsequently be domain and time independent. A hierarchical Chinese text classifier is implemented. Experimental results show that hierarchically classifying Chinese text documents based N grams can achieve satisfactory performance and outperforms the other traditional Chinese text classifiers. 展开更多
关键词 Chinese text classification N grams feature selection hierarchical classification
下载PDF
Gate-Attention and Dual-End Enhancement Mechanism for Multi-Label Text Classification
6
作者 Jieren Cheng Xiaolong Chen +3 位作者 Wenghang Xu Shuai Hua Zhu Tang Victor S.Sheng 《Computers, Materials & Continua》 SCIE EI 2023年第11期1779-1793,共15页
In the realm of Multi-Label Text Classification(MLTC),the dual challenges of extracting rich semantic features from text and discerning inter-label relationships have spurred innovative approaches.Many studies in sema... In the realm of Multi-Label Text Classification(MLTC),the dual challenges of extracting rich semantic features from text and discerning inter-label relationships have spurred innovative approaches.Many studies in semantic feature extraction have turned to external knowledge to augment the model’s grasp of textual content,often overlooking intrinsic textual cues such as label statistical features.In contrast,these endogenous insights naturally align with the classification task.In our paper,to complement this focus on intrinsic knowledge,we introduce a novel Gate-Attention mechanism.This mechanism adeptly integrates statistical features from the text itself into the semantic fabric,enhancing the model’s capacity to understand and represent the data.Additionally,to address the intricate task of mining label correlations,we propose a Dual-end enhancement mechanism.This mechanism effectively mitigates the challenges of information loss and erroneous transmission inherent in traditional long short term memory propagation.We conducted an extensive battery of experiments on the AAPD and RCV1-2 datasets.These experiments serve the dual purpose of confirming the efficacy of both the Gate-Attention mechanism and the Dual-end enhancement mechanism.Our final model unequivocally outperforms the baseline model,attesting to its robustness.These findings emphatically underscore the imperativeness of taking into account not just external knowledge but also the inherent intricacies of textual data when crafting potent MLTC models. 展开更多
关键词 multi-label text classification feature extraction label distribution information sequence generation
下载PDF
A Hierarchical Two-Level Feature Fusion Approach for SMS Spam Filtering
7
作者 Hussein Alaa Al-Kabbi Mohammad-Reza Feizi-Derakhshi Saeed Pashazadeh 《Intelligent Automation & Soft Computing》 2024年第4期665-682,共18页
SMS spam poses a significant challenge to maintaining user privacy and security.Recently,spammers have employed fraudulent writing styles to bypass spam detection systems.This paper introduces a novel two-level detect... SMS spam poses a significant challenge to maintaining user privacy and security.Recently,spammers have employed fraudulent writing styles to bypass spam detection systems.This paper introduces a novel two-level detection system that utilizes deep learning techniques for effective spam identification to address the challenge of sophisticated SMS spam.The system comprises five steps,beginning with the preprocessing of SMS data.RoBERTa word embedding is then applied to convert text into a numerical format for deep learning analysis.Feature extraction is performed using a Convolutional Neural Network(CNN)for word-level analysis and a Bidirectional Long Short-Term Memory(BiLSTM)for sentence-level analysis.The two-level feature extraction enables a complete understanding of individual words and sentence structure.The novel part of the proposed approach is the Hierarchical Attention Network(HAN),which fuses and selects features at two levels through an attention mechanism.The HAN can deal with words and sentences to focus on the most pertinent aspects of messages for spam detection.This network is productive in capturing meaningful features,considering both word-level and sentence-level semantics.In the classification step,the model classifies the messages into spam and ham.This hybrid deep learning method improve the feature representation,and enhancing the model’s spam detection capabilities.By significantly reducing the incidence of SMS spam,our model contributes to a safer mobile communication environment,protecting users against potential phishing attacks and scams,and aiding in compliance with privacy and security regulations.This model’s performance was evaluated using the SMS Spam Collection Dataset from the UCI Machine Learning Repository.Cross-validation is employed to consider the dataset’s imbalanced nature,ensuring a reliable evaluation.The proposed model achieved a good accuracy of 99.48%,underscoring its efficiency in identifying SMS spam. 展开更多
关键词 SMS spam detection hierarchical attention network text classification natural language processing
下载PDF
基于改进分层注意网络和TextCNN联合建模的暴力犯罪分级算法
8
作者 张家伟 高冠东 +1 位作者 肖珂 宋胜尊 《计算机应用》 CSCD 北大核心 2024年第2期403-410,共8页
为了科学、智能地对服刑人员的暴力倾向分级,将自然语言处理(NLP)中的文本分类方法引入犯罪心理学领域,提出一种基于改进分层注意网络(HAN)与TextCNN(Text Convolutional Neural Network)两通道联合建模的犯罪语义卷积分层注意网络(CCHA... 为了科学、智能地对服刑人员的暴力倾向分级,将自然语言处理(NLP)中的文本分类方法引入犯罪心理学领域,提出一种基于改进分层注意网络(HAN)与TextCNN(Text Convolutional Neural Network)两通道联合建模的犯罪语义卷积分层注意网络(CCHA-Net),通过分别挖掘犯罪事实与服刑人员基本情况的语义信息,完成暴力犯罪气质分级。首先,采用Focal Loss同时替代两通道中的Cross-Entropy函数,优化样本数量不均衡问题。其次,在两通道输入层中,同时引入位置编码,改进对位置信息的感知能力;改进HAN通道,采用最大池化构建显著向量。最后,输出层都采用全局平均池化替代全连接方法,以避免过拟合。实验结果表明,与AC-BiLSTM(Attention-based Bidirectional Long Short-Term Memory with Convolution layer)、支持向量机(SVM)等17种相关基线模型相比,CCHA-Net各项指标均最优,微平均F1(Micro_F1)为99.57%,宏平均和微平均下的曲线下面积(AUC)分别为99.45%和99.89%,相较于次优的AC-BiLSTM提高了4.08、5.59和0.74个百分点,验证了CCHA-Net能有效胜任暴力犯罪气质分级任务。 展开更多
关键词 深度学习 文本分类 卷积神经网络 分层注意网络 暴力犯罪分级 气质类型
下载PDF
Multi-granularity sequence generation for hierarchical image classification
9
作者 Xinda Liu Lili Wang 《Computational Visual Media》 SCIE EI CSCD 2024年第2期243-260,共18页
Hierarchical multi-granularity image classification is a challenging task that aims to tag each given image with multiple granularity labels simultaneously.Existing methods tend to overlook that different image region... Hierarchical multi-granularity image classification is a challenging task that aims to tag each given image with multiple granularity labels simultaneously.Existing methods tend to overlook that different image regions contribute differently to label prediction at different granularities,and also insufficiently consider relationships between the hierarchical multi-granularity labels.We introduce a sequence-to-sequence mechanism to overcome these two problems and propose a multi-granularity sequence generation(MGSG)approach for the hierarchical multi-granularity image classification task.Specifically,we introduce a transformer architecture to encode the image into visual representation sequences.Next,we traverse the taxonomic tree and organize the multi-granularity labels into sequences,and vectorize them and add positional information.The proposed multi-granularity sequence generation method builds a decoder that takes visual representation sequences and semantic label embedding as inputs,and outputs the predicted multi-granularity label sequence.The decoder models dependencies and correlations between multi-granularity labels through a masked multi-head self-attention mechanism,and relates visual information to the semantic label information through a crossmodality attention mechanism.In this way,the proposed method preserves the relationships between labels at different granularity levels and takes into account the influence of different image regions on labels with different granularities.Evaluations on six public benchmarks qualitatively and quantitatively demonstrate the advantages of the proposed method.Our project is available at https://github.com/liuxindazz/mgs. 展开更多
关键词 hierarchical multi-granularity classification vision and text transformer sequence generation fine-grained image recognition cross-modality attenti
原文传递
基于多尺度特征提取的层次多标签文本分类方法
10
作者 武子轩 王烨 于洪 《郑州大学学报(理学版)》 CAS 北大核心 2025年第2期24-30,共7页
针对现有的特征提取方法忽略文本局部和全局联系的问题,提出了基于多尺度特征提取的层次多标签文本分类方法。首先,设计了多尺度特征提取模块,对不同尺度特征进行捕捉,更好地表示文本语义。其次,将层次特征嵌入文本表示中,得到具有标签... 针对现有的特征提取方法忽略文本局部和全局联系的问题,提出了基于多尺度特征提取的层次多标签文本分类方法。首先,设计了多尺度特征提取模块,对不同尺度特征进行捕捉,更好地表示文本语义。其次,将层次特征嵌入文本表示中,得到具有标签特征的文本语义表示。最后,在标签层次结构的指导下对输入文本构建正负样本,进行对比学习,提高分类效果。在WOS、RCV1-V2、NYT和AAPD数据集上进行对比实验,结果表明,所提模型在评价指标上表现出色,超过了其他主流模型。此外,针对层次分类提出层次Micro-F 1和层次Macro-F 1指标,并对模型效果进行了评价。 展开更多
关键词 层次多标签文本分类 多尺度特征提取 对比学习 层次Micro-F 1 层次Macro-F 1
下载PDF
Multi-label text classification model based on semantic embedding 被引量:3
11
作者 Yan Danfeng Ke Nan +2 位作者 Gu Chao Cui Jianfei Ding Yiqi 《The Journal of China Universities of Posts and Telecommunications》 EI CSCD 2019年第1期95-104,共10页
Text classification means to assign a document to one or more classes or categories according to content. Text classification provides convenience for users to obtain data. Because of the polysemy of text data, multi-... Text classification means to assign a document to one or more classes or categories according to content. Text classification provides convenience for users to obtain data. Because of the polysemy of text data, multi-label classification can handle text data more comprehensively. Multi-label text classification become the key problem in the data mining. To improve the performances of multi-label text classification, semantic analysis is embedded into the classification model to complete label correlation analysis, and the structure, objective function and optimization strategy of this model is designed. Then, the convolution neural network(CNN) model based on semantic embedding is introduced. In the end, Zhihu dataset is used for evaluation. The result shows that this model outperforms the related work in terms of recall and area under curve(AUC) metrics. 展开更多
关键词 multi-label text classification CONVOLUTION NEURAL network SEMANTIC analysis
原文传递
Multi-Label Chinese Comments Categorization: Comparison of Multi-Label Learning Algorithms 被引量:4
12
作者 Jiahui He Chaozhi Wang +2 位作者 Hongyu Wu Leiming Yan Christian Lu 《Journal of New Media》 2019年第2期51-61,共11页
Multi-label text categorization refers to the problem of categorizing text througha multi-label learning algorithm. Text classification for Asian languages such as Chinese isdifferent from work for other languages suc... Multi-label text categorization refers to the problem of categorizing text througha multi-label learning algorithm. Text classification for Asian languages such as Chinese isdifferent from work for other languages such as English which use spaces to separate words.Before classifying text, it is necessary to perform a word segmentation operation to converta continuous language into a list of separate words and then convert it into a vector of acertain dimension. Generally, multi-label learning algorithms can be divided into twocategories, problem transformation methods and adapted algorithms. This work will usecustomer's comments about some hotels as a training data set, which contains labels for allaspects of the hotel evaluation, aiming to analyze and compare the performance of variousmulti-label learning algorithms on Chinese text classification. The experiment involves threebasic methods of problem transformation methods: Support Vector Machine, Random Forest,k-Nearest-Neighbor;and one adapted algorithm of Convolutional Neural Network. Theexperimental results show that the Support Vector Machine has better performance. 展开更多
关键词 multi-label classification Chinese text classification problem transformation adapted algorithms
下载PDF
Text GCN-SW-KNN:a novel collaborative training multi-label classification method for WMS application themes by considering geographic semantics 被引量:1
13
作者 Zhengyang Wei Zhipeng Gui +5 位作者 Min Zhang Zelong Yang Yuao Mei Huayi Wu Hongbo Liu Jing Yu 《Big Earth Data》 EI 2021年第1期66-89,共24页
Without explicit description of map application themes,it is difficult for users to discover desired map resources from massive online Web Map Services(WMS).However,metadata-based map application theme extraction is a... Without explicit description of map application themes,it is difficult for users to discover desired map resources from massive online Web Map Services(WMS).However,metadata-based map application theme extraction is a challenging multi-label text classification task due to limited training samples,mixed vocabularies,variable length and content arbitrariness of text fields.In this paper,we propose a novel multi-label text classification method,Text GCN-SW-KNN,based on geographic semantics and collaborative training to improve classifica-tion accuracy.The semi-supervised collaborative training adopts two base models,i.e.a modified Text Graph Convolutional Network(Text GCN)by utilizing Semantic Web,named Text GCN-SW,and widely-used Multi-Label K-Nearest Neighbor(ML-KNN).Text GCN-SW is improved from Text GCN by adjusting the adjacency matrix of the heterogeneous word document graph with the shortest semantic distances between themes and words in metadata text.The distances are calculated with the Semantic Web of Earth and Environmental Terminology(SWEET)and WordNet dictionaries.Experiments on both the WMS and layer metadata show that the proposed methods can achieve higher F1-score and accuracy than state-of-the-art baselines,and demonstrate better stability in repeating experiments and robustness to less training data.Text GCN-SW-KNN can be extended to other multi-label text classification scenario for better supporting metadata enhancement and geospatial resource discovery in Earth Science domain. 展开更多
关键词 Web map service multi-label text classification semantic distance text graph convolutional network collaborative training MLKNN application theme extraction
原文传递
融合对比学习和BERT的层级多标签文本分类模型
14
作者 代林林 张超群 +2 位作者 汤卫东 刘成星 张龙昊 《计算机工程与设计》 北大核心 2024年第10期3111-3119,共9页
为有效解决现有文本分类模型难以建模标签语义关系的问题,提出一种融合对比学习和自注意力机制的层级多标签文本分类模型,命名为SampleHCT。设计一个标签特征提取模块,能有效提取标签的语义和层次结构特征。采用自注意力机制构建具有混... 为有效解决现有文本分类模型难以建模标签语义关系的问题,提出一种融合对比学习和自注意力机制的层级多标签文本分类模型,命名为SampleHCT。设计一个标签特征提取模块,能有效提取标签的语义和层次结构特征。采用自注意力机制构建具有混合标签信息的阳性样本。使用对比学习训练文本编码器的标签意识。实验结果表明,SampleHCT相较于19个基准模型,取得了更高的分类分数,验证了其具有更有效的标签信息建模方式。 展开更多
关键词 文本分类 对比学习 自注意力机制 层级结构 多标签 标签信息 全局特征
下载PDF
基于《中国图书馆分类法》的文献自动化深层分类的研究和实现 被引量:1
15
作者 张雨卉 《图书馆杂志》 CSSCI 北大核心 2024年第3期61-74,共14页
基于《中国图书馆分类法》(下简称《中图法》)的文献深层分类蕴含着两个经典的自然语言处理问题:极限多标签文本分类(Extreme Multi-label Text Classification,XMC)和层次文本分类(Hierarchical Text Classification,HTC)。然而目前基... 基于《中国图书馆分类法》(下简称《中图法》)的文献深层分类蕴含着两个经典的自然语言处理问题:极限多标签文本分类(Extreme Multi-label Text Classification,XMC)和层次文本分类(Hierarchical Text Classification,HTC)。然而目前基于《中图法》的文献分类研究普遍将其视为普通的文本分类问题,由于没有充分挖掘问题的核心特点,这些研究在深层分类上的效果普遍不理想甚至不可行。相较于同类研究,本文基于对《中图法》文献分类特点和难点的深入分析,从XMC和HTC两个角度对基于《中图法》的文献深层分类和相关的解决方案进行了考察和研究,并针对该场景下的特点进行应用和创新,不仅提高了分类的准确度,还扩展了分类的深度和广度。本文模型首先通过适用于XMC问题的轻量深度学习模型提取了文本的语义特征作为分类的基础依据,而后针对《中图法》分类中的HTC问题,利用LTR(Learning to Rank)框架融入包括层级结构信息等多元特征作为分类的辅助依据,极大化地挖掘了蕴含在文本语义及分类体系中的信息和知识。本模型兼具深度学习模型强大的语义理解能力与机器学习模型的可解释性,同时具备良好的可扩展性,后期可较为便捷地融入专家定制的新特征进行提高,并且模型较为轻量,可在有限计算资源下轻松应对数万级别的分类标签,为基于《中图法》的全深度分类奠定良好的基础。 展开更多
关键词 极限多标签文本分类 层次文本分类 深度学习 《中国图书馆分类法》
下载PDF
融合语义解释和DeBERTa的极短文本层次分类
16
作者 陈昊飏 张雷 《计算机科学》 CSCD 北大核心 2024年第5期250-257,共8页
文本层次分类在社交评论主题分类、搜索词分类等场景中有重要应用,这些场景的数据往往具有极短文本特征,体现在信息的稀疏性、敏感性等中,这对模型特征表示和分类性能带来了很大挑战,而层次标签空间的复杂性和关联性使得难度进一步加剧... 文本层次分类在社交评论主题分类、搜索词分类等场景中有重要应用,这些场景的数据往往具有极短文本特征,体现在信息的稀疏性、敏感性等中,这对模型特征表示和分类性能带来了很大挑战,而层次标签空间的复杂性和关联性使得难度进一步加剧。基于此,提出了一种融合语义解释和DeBERTa模型的方法,该方法的核心思想在于:引入具体语境下各个字词或词组的语义解释,补充优化模型获取的内容信息;结合DeBERTa模型的注意力解耦机制与增强掩码解码器,以更好地把握位置信息、提高特征提取能力。所提方法首先对训练文本进行语法分词、词性标注,再构造GlossDeBERTa模型进行高准确率的语义消歧,获得语义解释序列;然后利用SimCSE框架使解释序列向量化,以更好地表征解释序列中的句子信息;最后训练文本经过DeBERTa模型神经网络后,得到原始文本的特征向量表示,再与解释序列中的对应特征向量相加,传入多分类器。实验遴选短文本层次分类数据集TREC中的极短文本部分,并进行数据扩充,最终得到的数据集平均长度为12词。多组对比实验表明,所提出的融合语义解释的DeBERTa模型性能最为优秀,在验证集和测试集上的Accuracy值、F1-micro值、F1-macro值相比其他算法模型有较大的提升,能够很好地应对极短文本层次分类任务。 展开更多
关键词 极短文本 层次分类 语义解释 DeBERTa GlossDeBERTa SimCSE
下载PDF
学术论文学科领域层次标签分类方法
17
作者 贾启龙 张仰森 +2 位作者 刘帅康 朱思文 高强 《北京信息科技大学学报(自然科学版)》 2024年第1期42-48,54,共8页
针对学术论文在学科领域内进行层次标签分类问题,提出了一种基于知识增强的语义表示与图注意力网络的文本层次标签分类(text hierarchical label classification based on enhanced representation through knowledge integration and g... 针对学术论文在学科领域内进行层次标签分类问题,提出了一种基于知识增强的语义表示与图注意力网络的文本层次标签分类(text hierarchical label classification based on enhanced representation through knowledge integration and graph attention networks, GETHLC)模型。首先,通过层次标签抽取模块提取学科领域下层次标签的结构特征,并通过预训练模型对学术论文的摘要、标题和抽取后的层次标签结构特征进行嵌入;然后,在分类阶段基于层次标签的结构分层构造层次分类器,将学术论文逐层分类至最符合的类别中。在大规模中文科学文献数据集CSL上进行的实验结果表明,与基准的ERNIE模型相比,GETHLC模型的准确率、召回率和F1值分别提升了5.78、4.31和5.02百分点。 展开更多
关键词 层次标签 文本分类 图注意力机制 知识增强的语义表示 预训练
下载PDF
基于K-BERT-LDA的层级多标签招标标段分类方法
18
作者 侯继辉 吴小忠 +4 位作者 刘晖 夏卓群 梁涤青 邱涵 徐嘉慧 《软件导刊》 2024年第12期66-74,共9页
传统人工招标分标效率准确率低,针对语义特征稀疏且标签具有明显层级结构特点的物资招标文本,提出了一种基于K-BERT-LDA的层级多标签文本分类方法。首先,通过混合模型提取文本特征,K-BERT模型提取具有知识注入的文本特征以弥补语义信息... 传统人工招标分标效率准确率低,针对语义特征稀疏且标签具有明显层级结构特点的物资招标文本,提出了一种基于K-BERT-LDA的层级多标签文本分类方法。首先,通过混合模型提取文本特征,K-BERT模型提取具有知识注入的文本特征以弥补语义信息缺失,LDA主题模型提取主题分布特征,并通过特征融合进一步丰富文本特征表示。其次,联合嵌入类别标签,即上层标签预测结果能引导下层分类,并充分利用标签间的树形结构关系提升多标签文本分类准确性。最后,提出一种基于文本相似度算法的智能处理策略,通过合并预投资金额不足的标段以保障招标成功率并得到分标结果。实验表明,所提方法相较于其他分类方法及单一模型而言分类性能更好,准确率、精确度和F1值分别达到95.45%、92.57%和91.88%,能高效、准确地实现智能分标目的。 展开更多
关键词 招标分标 层级多标签文本分类 知识注入 主题分布 特征融合 文本相似度
下载PDF
深层次标签辅助分类任务的层次标签文本分类方法
19
作者 曹渝昆 魏子越 +2 位作者 唐艺嘉 金成坤 李云峰 《计算机工程与应用》 CSCD 北大核心 2024年第10期105-112,共8页
层次标签文本分类是自然语言处理领域中一项具有挑战性的任务,每个文档需要被正确分类到对应具有层次结构的多个标签中。然而在标签集中,由于标签包含的语义信息不充分,同时被归类到深层次标签的文档数量过少,深层次标签训练不充分,导... 层次标签文本分类是自然语言处理领域中一项具有挑战性的任务,每个文档需要被正确分类到对应具有层次结构的多个标签中。然而在标签集中,由于标签包含的语义信息不充分,同时被归类到深层次标签的文档数量过少,深层次标签训练不充分,导致显著的标签训练不平衡问题。基于此,提出了深层次标签辅助分类任务的层次标签文本分类方法(DLAC)。该方法提出了一种深层次标签辅助分类器,在标签语义增强的基础上有效利用文本特征与深层次标签对应的父标签结点(即浅层次标签的丰富特征)来提升深层次标签的分类性能。与11种算法在三个数据集上的对比实验结果表明,模型能够有效提升深层次标签的分类性能,并取得良好效果。 展开更多
关键词 层次标签文本分类 标签层次结构 全局标签分类通道 深层次标签辅助分类通道
下载PDF
层次多标签文本分类方法 被引量:9
20
作者 赵海燕 曹杰 +1 位作者 陈庆奎 曹健 《小型微型计算机系统》 CSCD 北大核心 2022年第4期673-683,共11页
现实世界的大量应用,比如文档归类、网页分类、专利分类等,其类别信息(标签)是一个具有层次关系的体系,对它们进行自动分类涉及到在此层次标签体系中选择多个正确的标签,因此形成了一类层次多标签文本分类问题.如何学习和利用这些不同... 现实世界的大量应用,比如文档归类、网页分类、专利分类等,其类别信息(标签)是一个具有层次关系的体系,对它们进行自动分类涉及到在此层次标签体系中选择多个正确的标签,因此形成了一类层次多标签文本分类问题.如何学习和利用这些不同层级的关系、并对分类结果从层级关系遵循性的角度进行评价成为层次多标签分类问题的难点和挑战.本文对层次多标签文本分类的研究现状进行了系统化的总结.目前的方法从是否使用层次结构可以分为平面方法和层次方法,而层次方法又可以分为局部方法、全局方法和混合方法.这些方法包含了使用不同技术的多种模型.文中还分析了层次多标签文本分类任务的挑战和难点,并对本领域未来的研究方向进行了展望. 展开更多
关键词 层次多标签 文本分类 层次结构 文本表示
下载PDF
上一页 1 2 4 下一页 到第
使用帮助 返回顶部