Constructing Maximum Entropy Language Models for Movie Review Subjectivity Analysis (cited by: 1)


Abstract: Document subjectivity analysis has become an important aspect of web text content mining. The problem resembles traditional text categorization, so many related classification techniques can be adapted to it. One significant difference, however, is that estimating the subjectivity of a document well requires more linguistic or semantic information. This paper therefore focuses on two aspects: how to extract useful and meaningful language features, and how to construct appropriate language models efficiently for this task. For the first, we apply a Global-Filtering and Local-Weighting strategy to select and evaluate language features drawn from n-grams of different orders and within various distance windows. For the second, we adopt Maximum Entropy (MaxEnt) modeling to construct our language model framework. Besides the classical MaxEnt models, we also construct two kinds of improved models, with Gaussian and exponential priors respectively. Detailed experiments show that, with well-selected and well-weighted language features, MaxEnt models with exponential priors are significantly more suitable for the text subjectivity analysis task.
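The abstract contrasts classical MaxEnt models with variants that place a Gaussian or an exponential prior on the feature weights. As a rough illustration only, the sketch below scores a binary MaxEnt (logistic) model under the standard textbook form of each prior: a zero-mean Gaussian prior subtracts a squared penalty, while an exponential prior restricts weights to be non-negative and subtracts a linear penalty. The function names, the feature name, and the parameters `sigma2` and `alpha` are assumptions made for this example; the paper's exact formulation is not given in this record.

```python
import math


def log_sigmoid(z):
    """Numerically stable log(1 / (1 + exp(-z)))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def log_posterior(weights, data, prior="none", sigma2=1.0, alpha=1.0):
    """Penalized conditional log-likelihood of a binary MaxEnt model.

    weights: dict mapping feature name -> weight (hypothetical names).
    data: list of (features, label) pairs, where features is a dict
          mapping feature name -> value and label is 1 (subjective)
          or 0 (objective).
    prior: "none", "gaussian" (zero mean, variance sigma2), or
           "exponential" (rate alpha, weights constrained to be >= 0).
    Constant terms of the log-prior are dropped.
    """
    total = 0.0
    for features, label in data:
        score = sum(weights.get(name, 0.0) * value
                    for name, value in features.items())
        # p(label = 1 | features) = sigmoid(score)
        total += log_sigmoid(score if label == 1 else -score)

    if prior == "gaussian":
        total -= sum(w * w for w in weights.values()) / (2.0 * sigma2)
    elif prior == "exponential":
        if any(w < 0 for w in weights.values()):
            return float("-inf")  # outside the support of the prior
        total -= alpha * sum(weights.values())
    return total


if __name__ == "__main__":
    # Toy data with a made-up bigram feature.
    toy_data = [({"bigram:really_great": 1.0}, 1)]
    toy_weights = {"bigram:really_great": 1.0}
    for name in ("none", "gaussian", "exponential"):
        print(name, log_posterior(toy_weights, toy_data, prior=name))
```

Maximizing this quantity over the weights gives the MAP estimate under the chosen prior. Because the exponential penalty is linear rather than quadratic, it tends to push more weights exactly to zero, which is one commonly cited reason it can suit sparse n-gram feature sets.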

Authors: 陈博, 何慧, 郭军

Affiliation: Pattern Recognition and Intelligent System Laboratory

Source: Journal of Computer Science & Technology (计算机科学技术学报（英文版）), indexed in SCIE, EI, CSCD; 2008, No. 2, pp. 231-239 (9 pages)

Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 60475007 and 60675001, the Key Project of Chinese Ministry of Education under Grant No. 02029, and the Foundation of Chinese Ministry of Education for Century Spanning Talent.

Keywords: exponential prior, language model, maximum entropy, n-gram, subjectivity analysis

Classification: TP391.4 [Automation and Computer Technology: Computer Application Technology]


