Method for New Sentiment Word Extraction Based on Boundary Features

Abstract: The sentiment dictionary is a basic resource for sentiment analysis and an important basis for opinion mining and sentiment polarity judgment. New Internet words now appear in large numbers, so extracting new sentiment words has become a pressing problem. To address it, this paper proposes a method for extracting new sentiment words based on boundary features. The method applies the skip-gram model to existing sentiment words to mine their boundary features and build a boundary feature set, then uses this set to extract a candidate set of new sentiment words. The candidates are filtered with bigram collocation and sequential pattern methods. Each remaining candidate string is then scored by its frequency and by how the boundary features that co-occur with it are distributed in the corpus. Experiments on a microblog corpus show that precision in identifying new sentiment words is positively correlated with the candidate score: at a candidate score of 11, precision is 83.33%. The experiments show that the boundary-feature-based method can effectively identify new sentiment words in large-scale corpora.
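The abstract only outlines the pipeline, so the Python sketch below is an illustrative reconstruction under stated assumptions. It is not the authors' implementation. The assumptions are as follows:

- The corpus is already segmented into token lists.
- A boundary feature is a hypothetical (side, distance, token) triple, collected skip-gram style within a window around seed sentiment words.
- A simple seed/stopword filter stands in for the paper's bigram-collocation and sequential-pattern filters.
- The score (occurrence count plus number of distinct boundary features matched) is hypothetical, because the paper's exact formula is not given here.

The function names `boundary_features`, `matched_features` and `extract_candidates` are likewise hypothetical.

```python
from collections import Counter, defaultdict


def boundary_features(sentences, seeds, window=2, min_count=2):
    """Collect skip-gram-style contexts of seed sentiment words.

    A feature is a hypothetical (side, distance, token) triple, e.g.
    ("L", 1, "很") means "很" appears immediately left of a seed word.
    Features seen fewer than `min_count` times are discarded.
    """
    counts = Counter()
    for toks in sentences:
        for i, tok in enumerate(toks):
            if tok not in seeds:
                continue
            for d in range(1, window + 1):
                if i - d >= 0:
                    counts[("L", d, toks[i - d])] += 1
                if i + d < len(toks):
                    counts[("R", d, toks[i + d])] += 1
    return {f for f, c in counts.items() if c >= min_count}


def matched_features(toks, start, end, features, window):
    """Return the boundary features observed around the span toks[start:end]."""
    hits = set()
    for d in range(1, window + 1):
        if start - d >= 0 and ("L", d, toks[start - d]) in features:
            hits.add(("L", d, toks[start - d]))
        if end - 1 + d < len(toks) and ("R", d, toks[end - 1 + d]) in features:
            hits.add(("R", d, toks[end - 1 + d]))
    return hits


def extract_candidates(sentences, seeds, features, window=2, max_len=3,
                       stopwords=frozenset(), min_freq=2):
    """Find token n-grams framed by boundary features and rank them.

    Returns (candidate, score) pairs sorted by descending score.
    """
    freq = Counter()
    spread = defaultdict(set)  # candidate -> distinct boundary features seen
    for toks in sentences:
        for start in range(len(toks)):
            for end in range(start + 1, min(start + max_len, len(toks)) + 1):
                span = toks[start:end]
                cand = "".join(span)
                # Stand-in filter (NOT the paper's bigram / sequential-pattern
                # filters): drop known sentiment words and spans that begin or
                # end with a stopword.
                if cand in seeds or any(t in seeds for t in span):
                    continue
                if span[0] in stopwords or span[-1] in stopwords:
                    continue
                hits = matched_features(toks, start, end, features, window)
                # Count an occurrence only if both boundaries are supported.
                if any(h[0] == "L" for h in hits) and any(h[0] == "R" for h in hits):
                    freq[cand] += 1
                    spread[cand] |= hits
    # Hypothetical score: frequency plus number of distinct boundary features.
    scored = [(c, freq[c] + len(spread[c])) for c in freq if freq[c] >= min_freq]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    # Toy, pre-segmented corpus invented for illustration; "给力" is split.
    corpus = [
        ["这", "个", "很", "好", "啊"],
        ["真", "的", "很", "棒", "啊"],
        ["今天", "很", "好", "啊"],
        ["这", "个", "很", "给", "力", "啊"],
        ["真", "的", "很", "给", "力", "啊"],
    ]
    seeds = {"好", "棒"}
    feats = boundary_features(corpus, seeds, window=1, min_count=2)
    print(extract_candidates(corpus, seeds, feats, window=1, max_len=2))
    # prints [('给力', 4)]
```

The toy corpus is invented for illustration. On it, the seeds 好 and 棒 yield the boundary features 很 (left) and 啊 (right), and the string the segmenter split into 给 / 力 is recovered as the candidate 给力. This sketch counts an occurrence only when both of its boundaries match a feature. That design choice favors complete spans such as 给力 over fragments such as 给 or 力.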

Authors: 朱波 (ZHU Bo), 侯敏 (HOU Min)

Affiliation: National Language Resources Monitoring and Research Center (Broadcast Media), Communication University of China

Source: Journal of Chongqing University of Posts and Telecommunications (Natural Science Edition), 2014, no. 6, pp. 796-802 (7 pages). Indexed in CSCD and the Peking University Core Journals list.

Keywords: new sentiment word; boundary feature; skip-gram; sequential pattern

Classification number: TP18 [Automation and Computer Technology / Control Theory and Control Engineering]
