786 articles found
A Study on Short Text Matching Method Based on KS-BERT Algorithm
1
Authors: YANG Hao-wen, SUN Mei-feng 《印刷与数字媒体技术研究》 CAS, PKU Core, 2024, No. 5, pp. 164-173 (10 pages)
To improve the accuracy of short text matching, a short text matching method with knowledge and structure enhancement for BERT (KS-BERT) was proposed in this study. This method first introduced external knowledge to the input text, and then sent the expanded text to both the context encoder BERT and the structure encoder GAT to capture the contextual relationship features and structural features of the input text. Finally, the match was determined based on the fusion result of the two features. Experimental results on the public datasets BQ_corpus and LCQMC showed that KS-BERT outperforms advanced models such as ERNIE 2.0. This study showed that knowledge enhancement and structure enhancement are two effective ways to improve BERT in short text matching. In BQ_corpus, ACC was improved by 0.2% and 0.3%, respectively, while in LCQMC, ACC was improved by 0.4% and 0.9%, respectively.
Keywords: deep learning; short text matching; graph attention network; knowledge enhancement
Convolutional Deep Belief Network Based Short Text Classification on Arabic Corpus
2
Authors: Abdelwahed Motwakel, Badriyya B. Al-onazi, Jaber S. Alzahrani, Radwa Marzouk, Amira Sayed A. Aziz, Abu Sarwar Zamani, Ishfaq Yaseen, Amgad Atta Abdelmageed 《Computer Systems Science & Engineering》 SCIE, EI, 2023, No. 6, pp. 3097-3113 (17 pages)
With a population of 440 million, Arabic language users form a rapidly growing language group on the web in terms of the number of Internet users. 11 million monthly Twitter users were active and posted nearly 27.4 million tweets every day. To develop a classification system for the Arabic language, it is necessary to understand the syntactic framework of the words and thereby manipulate and represent the words so that they can be classified effectively. In this view, this article introduces a Dolphin Swarm Optimization with Convolutional Deep Belief Network for Short Text Classification (DSOCDBN-STC) model on Arabic Corpus. The presented DSOCDBN-STC model mainly aims to classify Arabic short text in social media. The model encompasses preprocessing and word2vec word embedding at the preliminary stage. Besides, the DSOCDBN-STC model involves a CDBN-based classification model for Arabic short text. Finally, the DSO technique is used to tune the hyperparameters of the CDBN method. To establish the enhanced performance of the DSOCDBN-STC model, a wide range of simulations have been performed. The simulation results confirmed the superiority of the DSOCDBN-STC model over existing models, with an improved accuracy of 99.26%.
Keywords: Arabic text; short text classification; dolphin swarm optimization; deep learning
Effective short text classification via the fusion of hybrid features for IoT social data (Cited by: 3)
3
Authors: Xiong Luo, Zhijian Yu, Zhigang Zhao, Wenbing Zhao, Jenq-Haur Wang 《Digital Communications and Networks》 SCIE, CSCD, 2022, No. 6, pp. 942-954 (13 pages)
Nowadays short texts can be widely found in various social data in relation to the 5G-enabled Internet of Things (IoT). Short text classification is a challenging task due to its sparsity and the lack of context. Previous studies mainly tackle these problems by enhancing the semantic information or the statistical information individually. However, the improvement achieved by a single type of information is limited, while fusing various information may help to improve the classification accuracy more effectively. To fuse various information for short text classification, this article proposes a feature fusion method that integrates the statistical feature and the comprehensive semantic feature together by using the weighting mechanism and deep learning models. In the proposed method, we apply Bidirectional Encoder Representations from Transformers (BERT) to generate word vectors on the sentence level automatically, and then obtain the statistical feature, the local semantic feature and the overall semantic feature using the Term Frequency-Inverse Document Frequency (TF-IDF) weighting approach, a Convolutional Neural Network (CNN) and a Bidirectional Gated Recurrent Unit (BiGRU). Then, the fusion feature is accordingly obtained for classification. Experiments are conducted on five popular short text classification datasets and a 5G-enabled IoT social dataset, and the results show that our proposed method effectively improves the classification performance.
Keywords: information fusion; short text classification; BERT; Bidirectional encoder representations from transformers; deep learning; social data
Sentiment Analysis of Short Texts Based on Parallel DenseNet (Cited by: 1)
4
Authors: Luqi Yan, Jin Han, Yishi Yue, Liu Zhang, Yannan Qian 《Computers, Materials & Continua》 SCIE, EI, 2021, No. 10, pp. 51-65 (15 pages)
Text sentiment analysis is a common problem in the field of natural language processing that is often resolved by using convolutional neural networks (CNNs). However, most of these CNN models focus only on learning local features while ignoring global features. In this paper, based on traditional densely connected convolutional networks (DenseNet), a parallel DenseNet is proposed to realize sentiment analysis of short texts. First, this paper proposes two novel feature extraction blocks that are based on DenseNet and a multiscale convolutional neural network. Second, this paper solves the problem of ignoring global features in traditional CNN models by combining the original features with features extracted by the parallel feature extraction block, and then sending the combined features into the final classifier. Last, the proposed model based on parallel DenseNet, which learns both local and global features of short texts simultaneously, shows better performance on six different databases than other basic models.
Keywords: sentiment analysis; short texts; parallel DenseNet
Research of Collaborative Filtering Recommendation Algorithm for Short Text (Cited by: 2)
5
Authors: Chunxu Chao, Shouning Qu, Tao Du 《Journal of Computer and Communications》 2014, No. 14, pp. 59-66 (8 pages)
Short text, based on the platform of Web 2.0, has developed rapidly in a relatively short time. Recommendation systems that analyze users' interests from short texts are becoming more and more important. Collaborative filtering is one of the most promising recommendation technologies. However, the existing collaborative filtering methods do not consider the drifting of users' interests. This often leads to a big difference between the recommendation results and users' real demands. In this paper, building on the traditional collaborative filtering algorithm, a new personalized recommendation algorithm is proposed. It traces users' interests by using the Ebbinghaus Forgetting Curve. Experiments demonstrate that the new algorithm helps discard users' outdated interests and discover their real-time interests for more accurate recommendation.
Keywords: short text; personalized recommendation; time weight function
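The time-weighting idea in the abstract above can be illustrated with a minimal sketch. The abstract does not state the weight function, so the exponential retention form exp(-t/S), the `stability_days` parameter, and both function names are assumptions made here for illustration; this is not the authors' algorithm.

```python
import math


def retention_weight(age_days: float, stability_days: float = 30.0) -> float:
    """Assumed forgetting-curve weight exp(-t/S); equals 1.0 for a brand-new item."""
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    return math.exp(-age_days / stability_days)


def time_weighted_score(ratings, stability_days: float = 30.0) -> float:
    """Weighted mean of (rating, age_days) pairs, so older ratings count less."""
    weights = [retention_weight(age, stability_days) for _, age in ratings]
    total = sum(weights)
    if total == 0:
        return 0.0
    return sum(r * w for (r, _), w in zip(ratings, weights)) / total
```

With such a weight, a recent rating dominates an old one: a fresh 5-star rating and a 300-day-old 1-star rating average to a value close to 5 rather than 3.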
Short Text Classification Based on Improved ITC (Cited by: 1)
6
Authors: Liangliang Li, Shouning Qu 《Journal of Computer and Communications》 2013, No. 4, pp. 22-27 (6 pages)
Long text classification has achieved a great deal, but short text classification still needs improvement. In this paper, we first explain why we select the ITC feature selection algorithm rather than the conventional TFIDF, and describe the advantages of ITC over TFIDF. We then summarize the flaws of the conventional ITC algorithm and present an improved ITC feature selection algorithm based on the characteristics of short text classification, combining the concepts of Documents Distribution Entropy and Position Distribution Weight. The improved ITC algorithm conforms to the actual situation of short text classification. The experimental results show that performance with the new algorithm is much better than with the traditional TFIDF and ITC.
Keywords: ITC; text classification; short text
Falcon: A Novel Chinese Short Text Classification Method
7
Authors: Haiming Li, Haining Huang, Xiang Cao, Jingu Qian 《Journal of Computer and Communications》 2018, No. 11, pp. 216-226 (11 pages)
In natural language processing, short text classification remains a hot research topic, with obvious problems of sparse features, high-dimensional text data, and feature representation. In order to express text directly, a simple but new variation of one-hot encoding with low dimension was proposed. In this paper, a DenseNet-based model was proposed for short text classification. Furthermore, feature diversity and reuse were implemented by the concat and average shuffle operation between ResNet and DenseNet to enlarge short text feature selection. Finally, some benchmarks were introduced to evaluate Falcon. In our experiments, Falcon obtained significant improvements over state-of-the-art models on most of the benchmarks in all respects, especially in the error rate of the first experiment. To sum up, Falcon is an efficient and economical model that requires less computation to achieve high performance.
Keywords: short text classification; word vector representation; one-hot; DenseNet networks; convolutional neural networks
A Short Text Classification Model Based on Chinese Part-of-Speech Information and Mutual Learning
8
Authors: Yihe Deng, Zuxu Dai 《国际计算机前沿大会会议论文集》 EI, 2023, No. 2, pp. 330-343 (14 pages)
Short text classification is one of the common tasks in natural language processing. Short text contains less information, and there is still much room for improvement in the performance of short text classification models. This paper proposes a new short text classification model, ML-BERT, based on the idea of mutual learning. ML-BERT includes a BERT that uses only word vector information and a BERT that fuses word information and part-of-speech information, and introduces a transmission flag to control the information transfer between the two BERTs, simulating the mutual learning process between the two models. Experimental results show that the ML-BERT model obtains a MAF1 score of 93.79% on the THUCNews dataset. Compared with the representative models Text-CNN, Text-RNN and BERT, the MAF1 score improves by 8.11%, 6.69% and 1.69%, respectively.
Keywords: natural language processing; neural network; Chinese short text classification; BERT; mutual deep learning
Research on Tibetan Short Text Classification Based on DAN and FastText
9
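Authors: 李果, 陈晨, 杨进, 群诺 《计算机科学》 CSCD, PKU Core, 2024, No. S01, pp. 103-107 (5 pages)
As Tibetan-language information becomes increasingly integrated into social life, more and more Tibetan short text data exist on online platforms. To address the low classification performance of traditional methods on Tibetan short texts, this paper proposes a Tibetan short text classification model based on DAN-FastText. The model uses a FastText network for unsupervised training on a relatively large Tibetan corpus to obtain a set of pre-trained Tibetan syllable vectors, uses this set to convert Tibetan short texts into syllable vectors, feeds the syllable vectors into a DAN (Deep Averaging Networks) network, fuses the sentence-vector features trained by the FastText network at the output stage, and finally completes classification through a fully connected layer and a softmax layer. On the public TNCC (Tibetan News Classification Corpus) news title dataset, the proposed model achieves a Macro-F1 of 64.53%, which is 2.81% higher than that of the TiBERT model, the best previously reported result, and 6.14% higher than that of the GCN model. The fused model performs well on Tibetan short text classification.
Keywords: Tibetan short text classification; feature fusion; deep averaging network; FastText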
Short-Term Memory Capacity across Time and Language Estimated from Ancient and Modern Literary Texts. Study-Case: New Testament Translations
10
Authors: Emilio Matricciani 《Open Journal of Statistics》 2023, No. 3, pp. 379-403 (25 pages)
We study the short-term memory capacity of ancient readers of the original New Testament written in Greek, and of its translations into Latin and modern languages. To model it, we consider the number of words between any two contiguous interpunctions, I_P, because this parameter can model how the human mind memorizes "chunks" of information. Since I_P can be calculated for any alphabetical text, we can perform experiments (otherwise impossible) with ancient readers by studying the literary works they used to read. The "experiments" compare the I_P of texts of a language/translation to those of another language/translation by measuring the minimum average probability of finding joint readers (those who can read both texts because of similar short-term memory capacity) and by defining an "overlap index". We also define the population of universal readers, people who can read any New Testament text in any language. Future work is vast, with many research tracks, because alphabetical literatures are very large and allow many experiments, such as comparing authors, translations or even texts written by artificial intelligence tools.
Keywords: alphabetical languages; artificial intelligence writing; Greek; Latin; New Testament; readers overlap probability; short-term memory capacity; texts; translation; words interval
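The quantity used in the abstract above, the number of words between two contiguous interpunctions, can be sketched in a few lines. The set of marks counted as interpunctions, the function name, and the handling of trailing text without closing punctuation (counted as one more chunk) are assumptions of this sketch, not definitions taken from the paper.

```python
import re

# Assumed set of punctuation marks that count as interpunctions.
INTERPUNCTIONS = ".,;:!?"


def words_per_interpunction(text: str) -> float:
    """Average number of words in the chunks delimited by interpunctions."""
    chunks = re.split(f"[{re.escape(INTERPUNCTIONS)}]", text)
    # Ignore empty chunks, e.g. the one after a final full stop.
    counts = [len(chunk.split()) for chunk in chunks if chunk.strip()]
    if not counts:
        return 0.0
    return sum(counts) / len(counts)
```

For example, "In the beginning was the Word, and the Word was with God." splits into two chunks of six words each, so the sketch returns 6.0.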
Enriching short text representation in microblog for clustering (Cited by: 14)
11
Authors: Jiliang TANG, Xufei WANG, Huiji GAO, Xia HU, Huan LIU 《Frontiers of Computer Science》 SCIE, EI, CSCD, 2012, No. 1, pp. 88-101 (14 pages)
Social media websites allow users to exchange short texts such as tweets via microblogs and user status in friendship networks. Their limited length, pervasive abbreviations, and coined acronyms and words exacerbate the problems of synonymy and polysemy, and bring about new challenges to data mining applications such as text clustering and classification. To address these issues, we dissect some potential causes and devise an efficient approach that enriches data representation by employing machine translation to increase the number of features from different languages. Then we propose a novel framework which performs multi-language knowledge integration and feature reduction simultaneously through matrix factorization techniques. The proposed approach is evaluated extensively in terms of effectiveness on two social media datasets from Facebook and Twitter. Given its significant performance improvement, we further investigate potential factors that contribute to the improved performance.
Keywords: short texts; text representation; multi-language knowledge; matrix factorization; social media
Short text classification based on strong feature thesaurus (Cited by: 7)
12
Authors: Bing-kun WANG, Yong-feng HUANG, Wan-xia YANG, Xing LI 《Journal of Zhejiang University-Science C(Computers and Electronics)》 SCIE, EI, 2012, No. 9, pp. 649-659 (11 pages)
Data sparseness, the evident characteristic of short text, has always been regarded as the main cause of the low accuracy in the classification of short texts using statistical methods. Intensive research has been conducted in this area during the past decade. However, most researchers failed to notice that ignoring the semantic importance of certain feature terms might also contribute to low classification accuracy. In this paper we present a new method to tackle the problem by building a strong feature thesaurus (SFT) based on latent Dirichlet allocation (LDA) and information gain (IG) models. By giving larger weights to feature terms in SFT, the classification accuracy can be improved. Specifically, our method appeared to be more effective with more detailed classification. Experiments on two short text datasets demonstrate that our approach achieved improvement compared with the state-of-the-art methods including support vector machine (SVM) and Naive Bayes Multinomial.
Keywords: short text; classification; data sparseness; semantic; strong feature thesaurus (SFT); latent Dirichlet allocation (LDA)
Short Text Mining Framework with Specific Design for Operation and Maintenance of Power Equipment (Cited by: 3)
13
Authors: Huifang Wang, Ziquan Liu, Yongjin Xu, Xiaoxiong Wei, Lixin Wang 《CSEE Journal of Power and Energy Systems》 SCIE, CSCD, 2021, No. 6, pp. 1267-1277 (11 pages)
In order to recover the value of short texts in the operation and maintenance of power equipment, a short text mining framework with specific design is proposed. First, the process of the short text mining framework is summarized, in which the functions of all the processing modules are introduced. Then, according to the characteristics of short texts in the operation and maintenance of power equipment, the specific design for each module is proposed, which adapts the short text mining framework to a practical application. Finally, based on the framework with the specifically designed modules, two examples in terms of defect texts are given to illustrate the application of short text mining in the operation and maintenance of power equipment. The results of the examples show that the short text mining framework is suitable for operation and maintenance tasks for power equipment, and the specific design for each module is beneficial for the improvement of the application effect.
Keywords: machine learning; natural language processing; operation and maintenance; power equipment; short text mining
A Short Text Classification Model for Electrical Equipment Defects Based on Contextual Features (Cited by: 1)
14
Authors: LI Peipei, ZENG Guohui, HUANG Bo, YIN Ling, SHI Zhicai, HE Chuanpeng, LIU Wei, CHEN Yu 《Wuhan University Journal of Natural Sciences》 CAS, CSCD, 2022, No. 6, pp. 465-475 (11 pages)
Defect information of substation equipment is usually recorded in the form of text. Due to the irregular spoken expressions of equipment inspectors, the defect information lacks sufficient contextual information and becomes more ambiguous. To solve the problem of sparse data lacking semantic features in the classification process, a short text classification model for defects in electrical equipment that fuses contextual features is proposed. The model uses bidirectional long short-term memory in short text classification to obtain the contextual semantics of short text data. Also, the attention mechanism is introduced to assign weights to different information in the context. Meanwhile, this model optimizes the convolutional neural network parameters with the help of the genetic algorithm for extracting salient features. According to the experimental results, the model can effectively realize the classification of power equipment defect text. In addition, the model was tested on an automotive parts repair dataset provided by the project partners, thus enabling the effective application of the method in specific industrial scenarios.
Keywords: short text classification; genetic algorithm; convolutional neural network; attention mechanism
PSLDA:a novel supervised pseudo document-based topic model for short texts
15
Authors: Mingtao SUN, Xiaowei ZHAO, Jingjing LIN, Jian JING, Deqing WANG, Guozhu JIA 《Frontiers of Computer Science》 SCIE, EI, CSCD, 2022, No. 6, pp. 71-80 (10 pages)
Various kinds of online social media applications, such as Twitter and Weibo, have brought a huge volume of short texts. However, mining semantic topics from short texts efficiently is still a challenging problem because of the sparseness of word-occurrence and the diversity of topics. To address the above problems, we propose a novel supervised pseudo-document-based maximum entropy discrimination latent Dirichlet allocation model (PSLDA for short). Specifically, we first assume that short texts are generated from normal-size latent pseudo documents, and the topic distributions are sampled from the pseudo documents. In this way, the model will reduce the sparseness of word-occurrence and the diversity of topics because it implicitly aggregates short texts into longer and higher-level pseudo documents. To make full use of labeled information in training data, we introduce labels into the model, and further propose a supervised topic model to learn the reasonable distribution of topics. Extensive experiments demonstrate that our proposed method achieves better performance compared with some state-of-the-art methods.
Keywords: supervised topic model; short text; pseudo-document
Deep Neural Semantic Network for Keywords Extraction on Short Text
16
Authors: Chundong She, Huanying You, Changhai Lin, Shaohua Liu, Boxiang Liang, Juan Jia, Xinglei Zhang, Yanming Qi 《国际计算机前沿大会会议论文集》 2020, No. 2, pp. 101-112 (12 pages)
Keyword extraction is a branch of natural language processing that plays an important role in many tasks, such as long text classification, automatic summarization, machine translation, and dialogue systems, all of which need high-quality keywords as a starting point. In this paper, we propose a deep learning network called deep neural semantic network (DNSN) to solve the problem of short text keyword extraction. It maps short text and words to the same semantic space, obtains their semantic vectors at the same time, and then computes the similarity between short text and words to extract the top-ranked words as keywords. Bidirectional Encoder Representations from Transformers is first used to obtain the initial semantic feature vectors of short text and words, which are then fed to a residual network to obtain the final semantic vectors of short text and words in the same vector space. Finally, the keywords are extracted by calculating the similarity between short text and words. Compared with existing baseline models including Frequency, Term Frequency Inverse Document Frequency (TF-IDF) and Text-Rank, the proposed model is superior in Precision, Recall, and F-score on the same batch of test data. In the best case, the precision, recall, and F-score are 6.79%, 5.67%, and 11.08% higher than the baseline model, respectively.
Keywords: semantic similarity; semantic network; short text; keywords extraction
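The final ranking step described in the abstract above, scoring each candidate word by its similarity to the short text and keeping the top-ranked words, might look like the following sketch. Cosine similarity, the function names, and the plain Python lists standing in for vectors are assumptions for illustration; in DNSN the vectors come from BERT followed by a residual network, which this sketch does not reproduce.

```python
import math


def cosine(u, v) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_keywords(text_vec, word_vecs, k=3):
    """Return the k words whose vectors are most similar to the text vector."""
    ranked = sorted(
        word_vecs,
        key=lambda word: cosine(text_vec, word_vecs[word]),
        reverse=True,
    )
    return ranked[:k]
```

Because the text and the words are assumed to live in the same vector space, a single similarity function is enough to rank the candidates.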
Short Text Classification for Civil Aviation Smart Supervision Based on Character-Word Vector Fusion (Cited by: 1)
17
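Authors: 王欣, 干镞锐, 许雅玺, 史珂, 郑涛 《中国安全科学学报》 CAS, CSCD, PKU Core, 2024, No. 2, pp. 37-44 (8 pages)
To address the low efficiency of classifying and analyzing the inspection records generated by civil aviation supervision items purely by hand, a short text classification model is proposed that uses dual-channel feature extraction based on data augmentation and character-word vector fusion. It studies the classification of civil aviation supervision items, covering issues related to personnel, equipment, facilities and environment, systems and procedures, and institutional responsibilities. To handle class imbalance, a data augmentation algorithm transforms the original texts to generate new samples so that the number of samples in each class is more balanced. Character vectors and word vectors are fused and concatenated character by character, yielding character vectors that carry word-level feature information. The fused vectors are fed into a text convolutional neural network (TextCNN) and a bidirectional long short-term memory (BiLSTM) model to extract features of different dimensions, from the local and the global perspective respectively, and experiments are carried out on a dataset of civil aviation supervision inspection records. The results show that the model reaches an accuracy of 0.9837 and an F1 score of 0.9836. Compared with several character-embedding and word-embedding models, accuracy improves by 0.4%. Compared with several commonly used single-channel models, accuracy improves by 3%, which verifies that the features extracted by the dual-channel model are comprehensive and effective.
Keywords: character-word vector fusion; civil aviation supervision; short text; text convolutional neural network (TextCNN); bidirectional long short-term memory (BiLSTM)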
The Impact of Supply Chain Finance on Firm Value under FinTech Empowerment (Cited by: 2)
18
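Authors: 成程, 杨胜刚, 田轩 《管理科学学报》 CSSCI, CSCD, PKU Core, 2024, No. 2, pp. 95-119 (25 pages)
Developing supply chain finance is of strategic importance for deepening the supply-side structural reform of the financial sector and strengthening the ability of finance to serve the real economy. By collecting and organizing the text data of all 3.208 million announcements issued by Chinese A-share listed companies from 2007 to 2019, this paper empirically tests the effect of listed companies' development of supply chain finance business on firm value. The results show that developing supply chain finance raises a firm's stock price in the short term and increases firm value in the long term, and that the more supply chain finance businesses a listed company adopts and the more frequently supply chain finance terms appear in its announcements, the more significant this effect is. Developing supply chain finance business can raise firm value through a signaling effect, a risk-taking effect, and a system management effect. The higher the frequency of FinTech terms in announcements related to supply chain finance, the stronger the increase in firm value obtained by the listed company. FinTech has a stronger empowering effect on the development of supply chain finance business by firms with lower stock liquidity and weaker risk-taking. These conclusions still hold after a series of robustness tests. This paper not only provides large-sample empirical evidence, based on public market data, that Chinese companies can raise firm value by developing supply chain finance business, but also offers practical implications for national and local governments in supporting the development of supply chain finance.
Keywords: supply chain finance; firm value; text analysis; short-term market reaction; FinTech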
An Enhanced Context Neural Topic Model for Short Texts
19
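Authors: 刘刚, 王同礼, 唐宏伟, 战凯, 杨雯莉 《计算机工程与应用》 CSCD, PKU Core, 2024, No. 1, pp. 154-164 (11 pages)
Most current topic models are built on the word co-occurrence information of the texts themselves and do not introduce a sparsity constraint on topics to improve topic extraction. In addition, short texts inherently suffer from sparse word co-occurrence, which seriously affects the accuracy of short text topic modeling. To address these problems, an enhanced context neural topic model (ECNTM) is proposed. ECNTM imposes a sparsity constraint on topics through a topic controller, filtering out irrelevant topics. The model input becomes the concatenation of the BOW vector and the SBERT sentence embedding, and in the Gaussian decoder the topic distribution over words is treated as a multivariate Gaussian distribution or a Gaussian mixture distribution in the embedding space, which explicitly enriches the limited context of short texts and addresses the sparsity of their word co-occurrence features. Experimental results on four public datasets, WS, Reuters, KOS, and 20 NewsGroups, show that the model clearly improves on the baseline models in perplexity, topic coherence, and text classification accuracy, demonstrating the effectiveness of introducing topic sparsity constraints and rich context information into short text topic modeling.
Keywords: neural topic model; short text; sparsity constraint; variational autoencoder; topic modeling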
A Short Text Classification Method Based on a Word-Topic-Document Heterogeneous Network
20
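Authors: 徐涛, 赵星甲, 卢敏 《计算机应用与软件》 PKU Core, 2024, No. 1, pp. 146-152, 182 (8 pages)
To address the problem that existing classification methods consider neither the semantic relatedness of long-distance words nor the sharing of latent topics among texts, a short text classification method based on a word-topic-document heterogeneous network (WTDHN) is proposed. Contextual semantic vectors of words are trained with Word2vec; a word relatedness matrix is constructed so that sufficient word co-occurrence information strengthens the semantics of short texts at every level; and a heterogeneous network with words, topics, and texts as nodes is built, on which graph convolution learns high-order neighborhood information between nodes to enrich short text semantics. Compared with baseline classification models, the method improves classification accuracy by an average of 1.56% on five public short text datasets.
Keywords: word-topic-document heterogeneous network; word co-occurrence; document-topic distribution; short text classification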