Research on improved TF-IDF algorithm in text classification (cited by: 14)
Abstract: In enterprise digitalization, processing the large volume of text produced by daily business activities is usually a multi-task problem: information extraction and text classification must both be performed on the same text data. For this scenario, to achieve more accurate classification, the paper proposes an improved TF-IDF algorithm that also uses the results of text information extraction as important features for distinguishing categories. The information gain method is introduced to derive an improved weight calculation formula, which yields an improved vector space representation of text features, on which a text classification model is then built. The experiments use Chinese text from the petroleum industry as an example, with 2006 test texts selected for a comparative classification experiment. According to the Chinese abstract, the improved TF-IDF algorithm reaches a precision P of 99.3% and a recall R of 98.7% (the English abstract of the same record gives 99.99% and 99.87%), a significant improvement in classification performance over the traditional TF-IDF algorithm.
Authors: Zhang Wei; Shi Qian; He Xiao; Wang Chen; Li Hexiang; Li Jiran (Beijing Petroleum Machinery Co., Ltd., China Petroleum Engineering Technology Research Institute, Beijing 102206, China; School of Information, Renmin University of China, Beijing 100872, China)
Source: Information Technology and Network Security (《信息技术与网络安全》), 2021, No. 7, pp. 72-76, 83 (6 pages in total)
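Funding: China National Petroleum Corporation project "Research and Development of a Near-Bit Gamma and Resistivity Imaging Logging-While-Drilling System" (2018E-2107); CNPC Engineering Technology Research Institute Co., Ltd. project "Technical Research on a Cloud Management Platform for Maintenance and Service of CGDS Near-Bit Geosteering Drilling" (CPETQ202003).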
Keywords: text classification; VSM; TF-IDF; petroleum; support vector machine
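The abstract describes weighting TF-IDF features with information gain but does not give the improved formula itself. The sketch below is therefore only an illustration under stated assumptions: it multiplies a standard TF-IDF weight by each term's information gain with respect to the class labels, which is one common way to combine the two. The function names, the smoothed IDF, and the toy documents are hypothetical and are not taken from the paper. The paper's use of information-extraction results as extra features and its SVM classifier are not modeled here.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (base 2) of a list of class counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, labels, term):
    """IG(term) = H(C) - H(C | term present/absent), using document-level presence."""
    n = len(docs)
    with_term = Counter(l for d, l in zip(docs, labels) if term in d)
    without_term = Counter(l for d, l in zip(docs, labels) if term not in d)
    h_cond = 0.0
    for part in (with_term, without_term):
        size = sum(part.values())
        if size:  # skip an empty partition
            h_cond += (size / n) * entropy(list(part.values()))
    return entropy(list(Counter(labels).values())) - h_cond


def tfidf_ig_vectors(docs, labels):
    """Return (vocab, vectors) with weight = tf * idf * IG (assumed combination).

    docs is a list of non-empty token lists; labels is the parallel class list.
    """
    vocab = sorted({t for d in docs for t in d})
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    idf = {t: math.log(n / df[t]) + 1.0 for t in vocab}  # smoothed IDF (assumption)
    ig = {t: information_gain(docs, labels, t) for t in vocab}
    vectors = []
    for d in docs:
        tf = Counter(d)
        vectors.append([tf[t] / len(d) * idf[t] * ig[t] for t in vocab])
    return vocab, vectors


# Toy, made-up corpus: already tokenized documents with class labels.
docs = [
    ["drill", "bit", "wear"],
    ["drill", "pump", "pressure"],
    ["invoice", "payment", "due"],
    ["invoice", "contract", "payment"],
]
labels = ["equipment", "equipment", "finance", "finance"]

vocab, vectors = tfidf_ig_vectors(docs, labels)
for term, weight in zip(vocab, vectors[0]):
    if weight > 0:
        print(f"{term}: {weight:.4f}")
```

In this toy corpus "drill" separates the two classes perfectly, so its information gain is maximal and it outweighs the rarer but less discriminative "bit" in the first document. The resulting vectors could then be passed to any classifier, such as the support vector machine named in the keywords.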