Constructing Maximum Entropy Language Models for Movie Review Subjectivity Analysis (Cited by: 1)

Abstract: Document subjectivity analysis has become an important aspect of web text content mining. The problem is similar to traditional text categorization, so many related classification techniques can be adapted to it. However, there is one significant difference: more linguistic and semantic information is required to estimate the subjectivity of a document well. This paper therefore focuses on two aspects: how to extract useful and meaningful language features, and how to construct appropriate language models efficiently for this task. For the first issue, we apply a Global-Filtering and Local-Weighting strategy to select and evaluate language features drawn from n-grams of different orders and within various distance windows. For the second issue, we adopt Maximum Entropy (MaxEnt) modeling methods to construct our language model framework. Besides the classical MaxEnt models, we also construct two kinds of improved models, with Gaussian and exponential priors respectively. Detailed experiments show that, with well-selected and well-weighted language features, MaxEnt models with exponential priors are significantly more suitable for the text subjectivity analysis task.
Source: Journal of Computer Science & Technology (计算机科学技术学报, English edition), SCIE/EI/CSCD, 2008, No. 2, pp. 231-239 (9 pages).
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 60475007 and 60675001, the Key Project of the Chinese Ministry of Education under Grant No. 02029, and the Foundation of the Chinese Ministry of Education for Century-Spanning Talent.
Keywords: exponential prior, language model, maximum entropy, n-gram, subjectivity analysis
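
For readers who want a concrete picture of the modeling described in the abstract, here is a minimal sketch in generic notation; the paper's exact equations may differ. The conditional MaxEnt model is

$$p(c \mid d) = \frac{1}{Z(d)} \exp\Big(\sum_i \lambda_i f_i(d, c)\Big), \qquad Z(d) = \sum_{c'} \exp\Big(\sum_i \lambda_i f_i(d, c')\Big).$$

Under the usual MAP reading, a zero-mean Gaussian prior on the weights adds the penalty $-\sum_i \lambda_i^2 / (2\sigma_i^2)$ to the training log-likelihood (equivalent to L2 regularization), while a one-sided exponential prior (as in Goodman's exponential-prior formulation for MaxEnt) adds $-\sum_i \alpha_i \lambda_i$ with the constraint $\lambda_i \ge 0$, which both shrinks weights and drives many of them exactly to zero.

The Python sketch below trains such a classifier on toy data with scikit-learn. It is an illustration under stated assumptions, not the paper's implementation: L2-regularized logistic regression stands in for the Gaussian-prior MaxEnt model, plain L1 regularization is used as a rough (symmetric, unconstrained) stand-in for the exponential prior, and a simple n-gram count vectorizer stands in for the paper's Global-Filtering and Local-Weighting feature strategy. All data, names, and parameters are illustrative.

```python
# Hypothetical sketch, not the paper's code: a MaxEnt (logistic regression)
# subjectivity classifier over n-gram count features, comparing an L2 penalty
# (Gaussian-prior analogue) with an L1 penalty (rough stand-in for the
# exponential prior). Data and parameters are illustrative placeholders.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Toy labeled sentences: 1 = subjective, 0 = objective.
texts = [
    "an utterly moving and brilliant performance",
    "the film was released in 2003 by the studio",
    "i hated every predictable minute of this movie",
    "the director previously worked on two documentaries",
]
labels = [1, 0, 1, 0]

def build_model(penalty: str) -> Pipeline:
    """Build an n-gram MaxEnt pipeline with the given penalty ('l1' or 'l2')."""
    return Pipeline([
        # Unigram + bigram counts with a document-frequency cutoff stand in
        # for the paper's Global-Filtering step; its Local-Weighting of
        # features inside distance windows is not reproduced here.
        ("ngrams", CountVectorizer(ngram_range=(1, 2), min_df=1)),
        # liblinear supports both L1 and L2 penalties; C is the inverse
        # regularization strength, i.e. it loosely controls the prior width.
        ("maxent", LogisticRegression(penalty=penalty, solver="liblinear", C=1.0)),
    ])

if __name__ == "__main__":
    for penalty in ("l2", "l1"):
        model = build_model(penalty)
        model.fit(texts, labels)
        pred = model.predict(["a gorgeous, heartfelt piece of filmmaking"])
        print(f"{penalty}: predicted label = {pred[0]}")
```

Swapping penalty="l2" for penalty="l1" in this setup typically yields a sparser weight vector, which mirrors the feature-pruning effect the exponential prior is intended to provide.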