
Lexicon and Machine Learning Based Sentiment Analysis of Tibetan Microblogs (cited by: 4)
Abstract With the rise of internet self-media, more and more Tibetans have begun using microblogs to express their opinions and views, and Tibetan information processing research related to microblogs has drawn wide academic attention. Based on the writing characteristics of Tibetan microblogs, this paper proposes a multi-feature Tibetan sentiment classification method that fuses a sentiment lexicon with three machine learning algorithms. For feature selection, it uses Tibetan and Chinese sentiment words, morphological sequences, emoticons, and other features. Experiments showed that classification performance was not ideal because the constructed sentiment lexicon had insufficient coverage. To improve the results, the paper introduces information gain feature selection, and experiments show that it classifies considerably better than manually selected features. For a specific domain (film topics), experiments show that the fused classifier performs somewhat better than a single classifier.
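Author YANG Zhi (杨志) (Qinghai University for Nationalities, Xining 810007)
Source Software (《软件》), 2017, No. 11, pp. 46-48, 94 (4 pages)
Funding Qinghai University for Nationalities university-level science and engineering project (2016XJQ06)
Keywords natural language processing (NLP); sentiment classification; microblog; machine learning; feature selection; term weight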
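
The information gain feature selection mentioned in the abstract can be illustrated with a minimal sketch. This is not the author's implementation: the function names, the binary "term occurs in the post" feature model, and the toy English tokens are all assumptions made for illustration. In practice the tokens would be segmented Tibetan words, sentiment lexicon entries, and emoticons.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(docs, labels, term):
    """Information gain of the binary feature 'term occurs in the document'.

    docs is a list of token sets; labels is the parallel list of classes.
    """
    present = [y for d, y in zip(docs, labels) if term in d]
    absent = [y for d, y in zip(docs, labels) if term not in d]
    total = len(labels)
    conditional = ((len(present) / total) * entropy(present)
                   + (len(absent) / total) * entropy(absent))
    return entropy(labels) - conditional


def select_features(docs, labels, k):
    """Return the k terms with the highest information gain.

    Ties are broken alphabetically so the result is deterministic.
    """
    vocab = sorted({t for d in docs for t in d})
    scored = [(information_gain(docs, labels, t), t) for t in vocab]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [t for _, t in scored[:k]]


# Hypothetical toy corpus: placeholder tokens standing in for real features.
toy_docs = [{"good", "film"}, {"good", "plot"}, {"bad", "film"}, {"bad", "plot"}]
toy_labels = ["pos", "pos", "neg", "neg"]
top_terms = select_features(toy_docs, toy_labels, k=2)
```

In this toy corpus the polarity words separate the two classes perfectly, while the topic words occur equally in both, so the polarity words rank first. The selected terms would then serve as the feature set for whichever classifier is trained.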