Application of an Improved Combination Algorithm in Chinese Short Text Categorization

Abstract: A maximum entropy model is combined with a sentiment classification dictionary to generate viewpoints from short-text user feedback (the "source voice", i.e. voice of the customer). The domain of a source voice is identified by the maximum entropy model, and the polarity of its evaluation (good or bad) is identified by the sentiment model; the final class is obtained by combining the domain with the evaluation. Notably, the objects to be classified have many features and many categories. When a source voice contains several viewpoints, it can be split at separators, and the short text is then classified by a hierarchically nested method that works from the inside outward along those separators, which identifies each viewpoint and prevents erroneous output. The results show that, for viewpoint classification of Chinese short text, classifier fusion is an efficient combination algorithm.

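Author: Fang Manlin (房满林)

Affiliation: School of Information Engineering, Wuyi University (五邑大学信息工程学院)

Source: Modern Industrial Economy and Informationization (《现代工业经济和信息化》), 2017, No. 3, pp. 95-97, 99 (4 pages)

Keywords: text hierarchical classification; maximum entropy model; sentiment dictionary

Classification number: TP391.1 [Automation and Computer Technology: Computer Application Technology]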
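
The pipeline outlined in the abstract (domain from a maximum entropy model, polarity from a sentiment dictionary, and inside-out classification of separator-delimited segments) can be sketched roughly as follows. This is only an illustrative reading of the abstract, not the author's implementation: the feature weights in DOMAIN_WEIGHTS are hand-set stand-ins for a trained model, and the domain names, dictionary entries, separator levels, confidence threshold, and all function names are assumptions made for the example.

```python
import math
import re

# Hypothetical, hand-set weights standing in for a trained maximum
# entropy model: DOMAIN_WEIGHTS[domain][keyword] = feature weight.
DOMAIN_WEIGHTS = {
    "battery": {"电池": 2.0, "续航": 2.0, "充电": 1.5},
    "screen": {"屏幕": 2.0, "显示": 1.5, "分辨率": 1.5},
}

# Hypothetical sentiment dictionary: word -> polarity score.
SENTIMENT_DICT = {"好": 1, "清晰": 1, "耐用": 1, "差": -1, "慢": -1, "模糊": -1}

# Assumed separator levels, ordered from outer (sentence) to inner (clause).
SEPARATOR_LEVELS = ["。！？!?", "，,；;"]


def domain_probs(text):
    """Softmax over summed feature weights (the maximum entropy model form)."""
    scores = {
        domain: sum(w for kw, w in feats.items() if kw in text)
        for domain, feats in DOMAIN_WEIGHTS.items()
    }
    z = sum(math.exp(s) for s in scores.values())
    return {domain: math.exp(s) / z for domain, s in scores.items()}


def polarity(text):
    """Dictionary lookup: sum the scores of the sentiment words found."""
    total = sum(score for word, score in SENTIMENT_DICT.items() if word in text)
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return None


def classify_segment(text, threshold=0.6):
    """Combine domain and polarity; return None when either is unclear."""
    probs = domain_probs(text)
    domain = max(probs, key=probs.get)
    pol = polarity(text)
    if probs[domain] < threshold or pol is None:
        return None
    return (domain, pol)


def split_on(text, separators):
    """Split text at any of the given separator characters."""
    pattern = "[" + re.escape(separators) + "]"
    return [part for part in re.split(pattern, text) if part.strip()]


def extract_viewpoints(text, level=0):
    """Classify the innermost segments first; fall back to the enclosing
    segment only when none of its inner segments yields a viewpoint."""
    results = []
    if level < len(SEPARATOR_LEVELS):
        for part in split_on(text, SEPARATOR_LEVELS[level]):
            results.extend(extract_viewpoints(part, level + 1))
    if not results:
        label = classify_segment(text)
        if label is not None:
            results.append(label)
    return results


print(extract_viewpoints("电池很耐用，屏幕有点模糊。"))
```

In this sketch a segment that cannot be assigned both a confident domain and a polarity produces no output on its own, which is one possible way to realise the "prevent erroneous output" behaviour the abstract mentions; the enclosing segment is then tried instead.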
