
A Transfer Method of Emotional Instances for Unbalanced Interactive Texts Based on Hyperplane Distance (cited by: 2)
Abstract: To address the poor generalization of sentiment classification models trained on unbalanced interactive text datasets that lack minority-class instances, a transfer method of emotional instances for unbalanced interactive texts based on hyperplane distance is proposed. The method takes the source-dataset instances that lie between the support vectors of the minority class and those of the majority class as transferable instances, and constructs an offset hyperplane based on the classification hyperplane of the target dataset. Following the principle of optimal information utility, it selects the instances to transfer by shortest distance to the offset hyperplane, and uses a migration ratio to control the number of transferred instances and generate a synthetic dataset. Experimental results show that as the number of transferred instances increases, the synthetic dataset deviates further from the original distribution, and the generalized classification performance of the trained sequential minimal optimization (SMO) model first rises and then falls, similar to the Wundt curve of information utility. Compared with three data-level processing methods (SMOTE, Subsampling, and Oversampling), five classification models (SMO, LibSVM, random forest, cost-sensitive, and CNN) trained with the proposed method obtain an average increase of 11% in the F-value for minority-class recognition, and the optimal range of the migration ratio is 20% to 30%. The proposed method thus effectively alleviates the imbalance while improving the generalized classification performance on the minority class.
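The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a linear hyperplane `w·x + b = 0` already trained on the target dataset, treats "between the support vectors" as decision values strictly inside the margin (-1, 1), takes the offset hyperplane to be `w·x + b = offset` with a caller-supplied `offset`, and interprets the migration ratio as a fraction of the target dataset size. The function names and all of these choices are hypothetical, since the abstract does not specify them.

```python
import math


def decision_value(w, b, x):
    """Signed decision value w.x + b of instance x for a linear hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def select_transfer_instances(source, w, b, offset, ratio, target_size):
    """Pick source instances closest to the offset hyperplane.

    Assumptions (not stated in the abstract):
    - candidates are instances whose decision value lies in (-1, 1),
      i.e. between the two margin hyperplanes of the target classifier;
    - the offset hyperplane is w.x + b = offset;
    - the number transferred is ratio * target_size, rounded.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    candidates = []
    for x in source:
        f = decision_value(w, b, x)
        if -1.0 < f < 1.0:
            # Geometric distance from x to the offset hyperplane.
            candidates.append((abs(f - offset) / norm, x))
    # Shortest distance first; sorting is stable, so ties keep source order.
    candidates.sort(key=lambda pair: pair[0])
    k = round(ratio * target_size)
    return [x for _, x in candidates[:k]]


if __name__ == "__main__":
    # Toy 2-D data; the hyperplane and offset are made-up illustration values.
    w, b = [1.0, 0.0], 0.0
    source = [[0.5, 0.0], [0.2, 3.0], [-0.4, 1.0], [2.0, 0.0], [0.9, 1.0]]
    print(select_transfer_instances(source, w, b, offset=0.3, ratio=0.2, target_size=10))
```

Raising `ratio` in this sketch admits candidates that lie progressively farther from the offset hyperplane, which is the knob the abstract reports as working best in the 20% to 30% range.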
Authors: TIAN Feng; WANG Yuanyuan; WU Fan; ZHENG Qinghua (Shaanxi Key Laboratory of Satellite and Terrestrial Network Technology Research and Development, Xi'an Jiaotong University, Xi'an 710049, China; School of Electronics and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China)
Source: Journal of Xi'an Jiaotong University (indexed in EI, CAS, CSCD, and the Peking University Core Journals list), 2018, No. 10, pp. 1-7 (7 pages)
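Funding: National Natural Science Foundation of China (61472315); National Natural Science Foundation of China, Innovative Research Group Program (61721002); National Key Research and Development Program of China (2016YFB1000903); Ministry of Education "Innovation Team" Program (IRT17R86)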
Keywords: instance transfer; information utility; unbalanced classification; hyperplane