
Text categorization method with enhanced domain features in power audit field

Original title: 增强领域特征的电力审计文本分类方法 (cited by: 3)
Abstract: Texts in the power audit field show strong industry characteristics, high similarity between text features, and fuzzy classification boundaries. To address this, a power audit text classification method with enhanced domain features is proposed. First, a professional dictionary for power audit is built. Next, the EF-Doc2VecC model is proposed and combined with the professional dictionary to enhance the text features. Finally, the enhanced features are fed into a BiLSTM classifier to classify the domain texts. Experimental results show that, for highly specialized power audit texts, the EF-Doc2VecC model is 4, 2, 2, and 2 percentage points higher than the comparison model Doc2VecC in recall, specificity, precision, and F1 value, respectively. For general-domain texts, EF-Doc2VecC is 3, 3, 4, and 4 percentage points higher than Doc2VecC on the same four metrics. In addition, the classification performance of EF-Doc2VecC on power audit texts is 4, 5, 3, and 3 percentage points higher than on general-domain texts, respectively. Therefore, the proposed text vector representation and text classification method not only improve text classification performance in the general domain but also significantly improve fine-grained classification performance in the vertical domain.
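The abstract reports its results as percentage-point differences in recall, specificity, precision, and F1 value. As a minimal sketch of how such figures are usually derived, the Python below computes the four metrics from binary confusion-matrix counts and the percentage-point gap between two models. The abstract does not give the paper's exact definitions or any raw counts, so the standard binary definitions are assumed here, and the names `Confusion`, `classification_metrics`, and `percentage_point_gap` as well as all numbers are hypothetical illustrations, not data from the paper.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Binary confusion-matrix counts (hypothetical container)."""
    tp: int  # true positives
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives


def _safe_div(num: float, den: float) -> float:
    # Return 0.0 instead of raising when a denominator is zero.
    return num / den if den else 0.0


def classification_metrics(c: Confusion) -> dict:
    """Standard binary definitions (assumed, not taken from the paper)."""
    recall = _safe_div(c.tp, c.tp + c.fn)        # a.k.a. sensitivity
    specificity = _safe_div(c.tn, c.tn + c.fp)   # true negative rate
    precision = _safe_div(c.tp, c.tp + c.fp)
    f1 = _safe_div(2 * precision * recall, precision + recall)
    return {
        "recall": recall,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


def percentage_point_gap(model: dict, baseline: dict) -> dict:
    """Difference between two metric dicts, in percentage points."""
    return {k: (model[k] - baseline[k]) * 100 for k in model}


# Made-up counts for two models evaluated on the same 18 test samples.
baseline = classification_metrics(Confusion(tp=7, fp=3, tn=5, fn=3))
enhanced = classification_metrics(Confusion(tp=8, fp=2, tn=6, fn=2))
gap = percentage_point_gap(enhanced, baseline)
print({k: round(v, 1) for k, v in gap.items()})
```

A gap of 4 percentage points in recall means, for example, a move from 0.80 to 0.84, which is different from a 4 percent relative improvement. For the multi-class setting the paper addresses, these per-class values would typically be averaged across classes; the abstract does not say which averaging scheme was used.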
Authors: CHEN Ping (陈平); KUANG Yao (匡尧); HU Jingyi (胡景懿); WANG Xiangyang (王向阳); CAI Jing (蔡静). Affiliations: Department of Construction and Management, Wuhan Electric Power Technical College, Wuhan, Hubei 430079, China; Department of Audit, State Grid Hubei Electric Power Company Limited, Wuhan, Hubei 430072, China
Source: Journal of Computer Applications (《计算机应用》), CSCD, Peking University Core Journal, 2020, Issue S01, pp. 109-112 (4 pages)
Funding: Science and Technology Project of State Grid Hubei Electric Power Company Limited (SGHBJP00JGJS1900026).
Keywords: power audit; text categorization; enhanced feature; Doc2VecC; Bidirectional Long Short-Term Memory (BiLSTM) model
