Abstract
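With the rise of Internet self-media, more and more Tibetans have begun to use microblogs and to express their opinions and views there, and research on Tibetan information processing related to microblogs has accordingly drawn wide academic attention. Based on the writing characteristics of Tibetan microblogs, this paper proposes a Tibetan sentiment classification method that fuses multiple features drawn from a dictionary and from machine learning algorithms. For feature selection, Tibetan and Chinese sentiment words, emoticons, and similar items are used as features. Experiments show that classification performance was not ideal because the coverage of the constructed sentiment lexicon was insufficient. To improve the results, the paper introduces information gain feature selection, and experiments show that it gives considerably better classification results than manual feature selection. For a specific domain, experiments show that fusion improves classification performance to some extent.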
With the rise of the Web 2.0 era, more and more Tibetans have begun to express their opinions and views on microblogs, and Tibetan information processing research related to Tibetan microblogs has drawn wide attention from the academic community. Based on the expressive features of Tibetan microblogs, this paper proposes a multi-feature sentiment classification method based on three kinds of machine learning algorithms. For feature selection, it uses emotional words, morphological sequences, emojis, and other features. The experimental results indicate that classification performance was not ideal because the constructed sentiment dictionary had inadequate coverage. To address this problem, the paper introduces information gain feature selection, and the experiments show that it yields better classification results than manually selected features. On the film topic, the fused classifier is found to perform better than a single classifier.
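As an illustration of the information gain feature selection mentioned in the abstract, the following is a minimal sketch and not the paper's implementation. It assumes each microblog post is already segmented into a set of tokens and carries a single sentiment label. The function names (entropy, information_gain, select_features) and the toy English tokens are hypothetical and chosen only for readability.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels).values()
    return -sum((n / total) * math.log2(n / total) for n in counts)


def information_gain(docs, labels, term):
    """Reduction in label entropy from knowing whether `term` occurs in a post."""
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    total = len(labels)
    conditional = (
        (len(with_term) / total) * entropy(with_term)
        + (len(without_term) / total) * entropy(without_term)
    )
    return entropy(labels) - conditional


def select_features(docs, labels, k):
    """Return the k vocabulary terms with the highest information gain."""
    vocab = sorted({t for d in docs for t in d})
    ranked = sorted(
        vocab, key=lambda t: information_gain(docs, labels, t), reverse=True
    )
    return ranked[:k]


# Toy data (hypothetical): each post is a set of tokens with a sentiment label.
docs = [{"good", "film"}, {"good", "plot"}, {"bad", "film"}, {"bad", "plot"}]
labels = ["pos", "pos", "neg", "neg"]
top_terms = select_features(docs, labels, k=2)
```

In this toy data the sentiment-bearing tokens separate the two classes perfectly, while the topic tokens carry no class information, so ranking by information gain keeps the former. The selected terms would then serve as the feature set passed to a classifier.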
Author
杨志
YANG Zhi (Qinghai University for Nationalities, Xining 810007)
Source
《软件》
2017, No. 11, pp. 46-48, 94 (4 pages in total)
Software
Funding
University-level Science and Engineering Project of Qinghai University for Nationalities (2016XJQ06)