Text categorization based on resource allocating network and semantic feature selection (基于资源分配网络和语义特征选取的文本分类)

Cited by: 4

Abstract: The Resource Allocating Network (RAN) algorithm has two known weaknesses: its hidden-layer nodes depend heavily on the initial training data, and it converges slowly. To address them, a new RAN learning algorithm is proposed. The initial hidden-layer nodes are determined by the K-means algorithm, and an RMS window is added to the original novelty criterion to decide more reliably whether a hidden-layer node should be added. The network parameters are adjusted by combining the least mean squares (LMS) algorithm with the extended Kalman filter (EKF) algorithm, which speeds up learning. Because text models based on the word vector space struggle with the high dimensionality and semantic complexity of text, a semantic feature selection method is used to extract semantic features from the text input space and reduce its dimension. Experimental results show that the new RAN learning algorithm learns quickly, produces a compact network structure, and classifies well. In addition, semantic feature selection achieves dimension reduction at the same time, which substantially shortens text classification time and effectively improves classification accuracy.
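The abstract gives no formulas, so the following is only a minimal sketch of the kind of growth rule it describes, not the authors' implementation. It assumes the commonly described RAN formulation: Gaussian hidden units, a distance threshold and an error threshold as the novelty criterion, plus a sliding-window RMS error check. The names `kmeans`, `RanSketch`, `eps`, `e_min`, `rms_min`, `window`, `kappa` and `lr` are all hypothetical, and the EKF part of the parameter update is omitted; only an LMS step on the output weights is shown.

```python
import numpy as np
from collections import deque


def kmeans(X, k, iters=20, seed=0):
    """Plain Lloyd's k-means; the k centers serve as initial hidden nodes."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        # Assign every sample to its nearest center.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers


class RanSketch:
    """Growing RBF network with a windowed-RMS novelty check (assumed form)."""

    def __init__(self, centers, n_out, width=1.0, eps=0.5, e_min=0.1,
                 rms_min=0.1, window=10, lr=0.05, kappa=0.8):
        self.c = np.array(centers, dtype=float)    # hidden-unit centers
        self.s = np.full(len(self.c), width)       # hidden-unit widths
        self.w = np.zeros((len(self.c), n_out))    # output weights
        self.b = np.zeros(n_out)                   # output bias
        self.eps, self.e_min, self.rms_min = eps, e_min, rms_min
        self.lr, self.kappa = lr, kappa
        self.errs = deque(maxlen=window)           # recent squared errors

    def _phi(self, x):
        d2 = ((self.c - x) ** 2).sum(axis=1)
        return np.exp(-d2 / self.s ** 2)

    def predict(self, x):
        return self._phi(np.asarray(x, dtype=float)) @ self.w + self.b

    def observe(self, x, y):
        """Process one sample; return True if a hidden node was added."""
        x = np.asarray(x, dtype=float)
        y = np.asarray(y, dtype=float)
        phi = self._phi(x)
        e = y - (phi @ self.w + self.b)
        err = np.linalg.norm(e)
        self.errs.append(err ** 2)
        rms = np.sqrt(np.mean(self.errs))
        dist = np.linalg.norm(self.c - x, axis=1).min()
        # Grow only if the sample is far from every center, the current
        # error is large, and the windowed RMS error is also large.
        if dist > self.eps and err > self.e_min and rms > self.rms_min:
            self.c = np.vstack([self.c, x])
            self.s = np.append(self.s, self.kappa * dist)
            self.w = np.vstack([self.w, e])
            return True
        # Otherwise take an LMS step on the output weights and bias.
        self.w += self.lr * np.outer(phi, e)
        self.b += self.lr * e
        return False
```

In this sketch the RMS window acts as a guard against adding a node for a single noisy sample: a one-off large error does not grow the network unless the recent average error is also above `rms_min`.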

Authors: He Xiaoliang (何晓亮), Song Wei (宋威), Liang Jiuzhen (梁久祯)

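Affiliations: School of Internet of Things Engineering, Jiangnan University; Traffic Management Research Institute of the Ministry of Public Security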

Source: Computer Engineering & Science (《计算机工程与科学》), indexed in CSCD and the Peking University Core Journals list, 2014, No. 2, pp. 340-346 (7 pages)

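Funding: National Natural Science Foundation of China, Young Scientists Fund (61103129); Specialized Research Fund for the Doctoral Program of Higher Education, New Teachers category (20100093120004); Fundamental Research Funds for the Central Universities (JUSRP11130); Natural Science Foundation of Jiangsu Province (SBK201122266)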

Keywords: RAN learning algorithm; radial basis function; semantic feature selection; extended Kalman filter algorithm; least mean squares algorithm; text categorization

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]

