Fund News Classification Based on Clustering Annotation and Multi-Granularity Feature Fusion (聚类标注和多粒度特征融合的基金新闻分类)
Abstract Manual category labeling is time-consuming, laborious, and inefficient, and existing text classification methods ignore the relationships between words and sentences and do not assign higher weights to the features that matter most for classification. To address these problems, this paper proposes a fund news classification method based on cluster-weighted labeling and multi-granularity feature fusion. The cluster-weighted labeling algorithm computes a weighted combination of K-Means and DBSCAN clustering results and uses it to label fund text data automatically; a small amount of manual proofreading then yields the labeled data used for fund news classification. The multi-granularity feature fusion classification algorithm first works at word granularity, constructing a stop-word list and an extended dictionary; it then works at sentence granularity, extracting news summaries to capture more semantically related text; finally, it embeds a multi-head attention mechanism in the BERT model to assign higher weights to key features and improve classification accuracy. Experiments from multiple perspectives show that the method is efficient and effective, reaching a classification precision of 95.21% and outperforming existing methods.
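The abstract does not state how the K-Means and DBSCAN results are weighted, so the following is only a minimal sketch of one possible scheme, not the authors' algorithm. It assumes each clusterer's cluster ids have already been mapped to category names, and the function name `combine_labels`, the weights `w_kmeans` and `w_dbscan`, and the `review_threshold` are hypothetical. Documents whose winning score falls below the threshold are flagged for the manual proofreading step the abstract mentions.

```python
from collections import defaultdict

NOISE = -1  # label DBSCAN conventionally gives to points outside every cluster


def combine_labels(kmeans_labels, dbscan_labels,
                   w_kmeans=0.6, w_dbscan=0.4, review_threshold=0.75):
    """Weighted vote between two clusterers' per-document category labels.

    Hypothetical scheme (the weights and threshold are assumptions):
    each clusterer adds its weight to the category it proposes, DBSCAN
    noise points cast no vote, and the highest-scoring category wins.
    Returns a list of (label, confidence, needs_review) tuples.
    """
    if len(kmeans_labels) != len(dbscan_labels):
        raise ValueError("both label lists must cover the same documents")

    total_weight = w_kmeans + w_dbscan
    results = []
    for k_label, d_label in zip(kmeans_labels, dbscan_labels):
        scores = defaultdict(float)
        scores[k_label] += w_kmeans
        if d_label != NOISE:
            scores[d_label] += w_dbscan
        # On a tie, max() keeps the first-inserted key, i.e. the K-Means label.
        label = max(scores, key=scores.get)
        confidence = scores[label] / total_weight
        # Low-confidence documents go to a human proofreader.
        results.append((label, confidence, confidence < review_threshold))
    return results


if __name__ == "__main__":
    kmeans = ["equity", "bond", "bond", "money"]
    dbscan = ["equity", "equity", NOISE, "money"]
    for doc_id, row in enumerate(combine_labels(kmeans, dbscan)):
        print(doc_id, row)
```

With only two voters and fixed weights, the heavier clusterer always wins a disagreement, so the practical value of this sketch lies in the confidence score: agreement between the two clusterings is accepted automatically, while disagreement or DBSCAN noise is routed to manual review.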
Authors HU Juxiang (胡菊香); LÜ Xueqiang (吕学强); YOU Xindong (游新冬); ZHOU Jianshe (周建设) (Research Center for Language Intelligence of China, Capital Normal University, Beijing 100048, China; Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science and Technology University, Beijing 100101, China)
Source Journal of Chinese Computer Systems (《小型微型计算机系统》), indexed in CSCD and the Peking University Core Journals list (北大核心), 2024, No. 2, pp. 257-264 (8 pages)
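Funding Supported by the National Natural Science Foundation of China (62171043), the Beijing Natural Science Foundation (4212020), State Language Commission projects (ZDI145-10, YB145-3), and the Scientific Research Program of the Beijing Municipal Education Commission (KM202111232001).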
Keywords multi-granularity; feature fusion; text classification; deep learning
