Research on a Short-Text Feature Weight Calculation Method Based on MIDF(t) (cited by: 1)
Abstract: With the rapid growth of the Internet, traditional text classification no longer meets the demands placed on information service systems, and high-accuracy classification algorithms have become a research focus in recent years as a way to make effective use of massive volumes of information. Online film reviews are typically short texts: they contain few informative words to extract, while stop words, which contribute nothing to classification, make up a relatively large share. This leads to two major problems, high vector dimensionality and feature sparsity, which make short texts harder to study. To address the sparse features and highly imbalanced samples of short texts, this paper proposes MIDF(t) as a feature weight calculation method for short text. It takes into account both the distribution of a feature term within individual samples and the class features of the text, and it improves the precision and recall of short text classification. Experimental results indicate that the proposed method is better suited to short text classification than traditional feature weight calculation methods.
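Author: Xia Bing (夏冰)
Source: Heilongjiang Science (《黑龙江科学》), 2016, No. 16, pp. 28-29 (2 pages)
Funding: Heilongjiang Province Philosophy and Social Science Research Planning Project, "Sentiment analysis of English discourse based on fuzzy support vector machines" (13E024)
Keywords: short text; text classification; feature weight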
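
The abstract compares MIDF(t) against "traditional feature weight calculation methods" but gives neither the MIDF(t) formula nor the name of the baseline. The sketch below therefore does not implement MIDF(t). It only illustrates classic TF-IDF, assumed here as a representative traditional weighting, on a few made-up tokenized short reviews. The function name `tf_idf` and the sample data are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumption: plain TF-IDF, with tf = count / document length and
    idf = log(N / df). This is a stand-in for a "traditional" method,
    not the MIDF(t) weighting proposed in the paper.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical tokenized short reviews (illustrative data only).
reviews = [
    ["great", "film"],
    ["boring", "film"],
    ["great", "acting"],
]
weights = tf_idf(reviews)
```

With documents this short, each vector has only a couple of non-zero entries, which is the feature sparsity the abstract describes. Plain TF-IDF also ignores class labels, whereas the abstract states that MIDF(t) additionally considers the class features of the text.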