Semi-supervised Text Classification Algorithm Based on Ant Colony Aggregation Pheromone

Cited by: 4

Abstract: Many semi-supervised text classification algorithms rely on assumptions about the data distribution, and they may perform poorly when the labeled and unlabeled data follow different distributions. This paper presents a semi-supervised text classification algorithm based on the concentration of aggregation pheromone, the substance that real ants and other insects use for species aggregation. Because the method makes no assumption about the data distribution, it can be applied to any kind of data distribution. Aggregation pheromone is combined with a traditional text similarity measure, and a Top-k strategy selects the colonies to which an unlabeled ant may belong. A judgment rule then determines the confidence of each unlabeled ant, and a random selection strategy adds high-confidence unlabeled ants to the training colony that attracts them most. In comparison experiments against the Naive Bayes and EM algorithms on a benchmark dataset, the proposed algorithm achieves better precision, recall, and Macro F1.
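The abstract names the steps of the method but not its formulas, so the following Python sketch is only one possible reading, not the authors' implementation. Everything in it is an assumption made for illustration: the Gaussian-shaped pheromone deposit over cosine distance, the gap-based judgment rule, the per-round random sampling of confident ants, and all names and parameters (`cosine`, `attraction`, `self_train`, `delta`, `margin`, `sample_rate`).

```python
import math
import random
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors (dict-like)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def attraction(doc, colony, delta=0.5):
    """ASSUMED pheromone model: every labeled ant deposits a Gaussian-shaped
    amount that decays with cosine distance; the colony's pull is the mean."""
    deposits = (math.exp(-((1.0 - cosine(doc, ant)) ** 2) / (2 * delta ** 2))
                for ant in colony)
    return sum(deposits) / len(colony)

def self_train(colonies, unlabeled, k=2, margin=0.05, sample_rate=0.5,
               max_rounds=10, seed=0):
    """colonies: {label: [vector, ...]}, unlabeled: [vector, ...].
    Returns the enlarged colonies and the ants that stayed unlabeled."""
    rng = random.Random(seed)
    colonies = {label: list(ants) for label, ants in colonies.items()}
    pending = list(unlabeled)
    for _ in range(max_rounds):
        confident = []  # (index into pending, most attractive label)
        for i, doc in enumerate(pending):
            ranked = sorted(
                ((attraction(doc, ants), label) for label, ants in colonies.items()),
                key=lambda pair: pair[0], reverse=True)
            top = ranked[:k]  # Top-k candidate colonies
            # ASSUMED judgment rule: the best colony must beat the runner-up
            # by at least `margin` for the ant to count as confident.
            if len(top) == 1 or top[0][0] - top[1][0] >= margin:
                confident.append((i, top[0][1]))
        if not confident:
            break
        # ASSUMED random selection: only a random share of the confident ants
        # joins its most attractive colony in each round.
        picked = rng.sample(confident, max(1, int(len(confident) * sample_rate)))
        for i, label in picked:
            colonies[label].append(pending[i])
        picked_idx = {i for i, _ in picked}
        pending = [d for i, d in enumerate(pending) if i not in picked_idx]
        if not pending:
            break
    return colonies, pending

# Toy usage with bag-of-words vectors (made-up documents, not the paper's data).
sports = [Counter("goal match team goal".split()), Counter("team coach match".split())]
tech = [Counter("code compiler bug".split()), Counter("bug patch code release".split())]
unlabeled = [Counter("team goal coach".split()), Counter("compiler patch code".split())]
colonies, leftover = self_train({"sports": sports, "tech": tech}, unlabeled)
```

Adding only a random share of the confident ants per round keeps each colony from growing too quickly on early guesses. Whether this matches the paper's random selection strategy cannot be confirmed from the abstract alone.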

Authors: Du Fanghua (杜芳华), Ji Junzhong (冀俊忠), Wu Chensheng (吴晨生), Wu Jinyuan (吴金源)

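Affiliations: Beijing Municipal Key Laboratory of Multimedia and Intelligent Software Technology, College of Computer Science, Beijing University of Technology; Beijing Institute of Science and Technology Information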

Source: Computer Engineering (《计算机工程》), indexed in CAS and CSCD, 2014, Issue 11, pp. 167-171 (5 pages)

Funding: National Natural Science Foundation of China (61375059, 61332016)

Keywords: text classification; semi-supervised learning; aggregation pheromone; self-training; Top-k strategy; random selection strategy

Classification code: TP311.12 [Automation and Computer Technology: Computer Software and Theory]
