
Text categorization based on resource allocating network and semantic feature selection (cited by: 4)
Abstract The resource allocating network (RAN) learning algorithm has two weaknesses: its hidden-layer nodes depend heavily on the initial training data, and it converges slowly. This paper proposes a new RAN learning algorithm to address both. The initial hidden-layer nodes are determined with the K-means algorithm, and an RMS window is added to the original "novelty criterion" so that the decision whether to add a hidden-layer node is more reliable. Network parameters are adjusted by combining the least mean squares (LMS) algorithm with the extended Kalman filter (EKF) algorithm, which speeds up learning. Because text models based on the word vector space struggle with the high dimensionality and semantic complexity of text, a semantic feature selection method is used to extract semantic features from the text input space and reduce its dimensionality. Experimental results show that the new RAN learning algorithm learns quickly, produces a compact network structure, and classifies well. Semantic feature selection also reduces dimensionality, which substantially shortens text classification time and improves classification accuracy.
Source Computer Engineering & Science (《计算机工程与科学》), CSCD, Peking University Core Journal, 2014, No. 2, pp. 340-346 (7 pages)
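Funding National Natural Science Foundation of China, Young Scientists Fund (61103129); Specialized Research Fund for the Doctoral Program, New Teachers (20100093120004); Fundamental Research Funds for the Central Universities (JUSRP11130); Natural Science Foundation of Jiangsu Province (SBK201122266)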
Keywords RAN learning algorithm; radial basis function; semantic feature selection; extended Kalman filter algorithm; least mean squares algorithm; text categorization
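The abstract describes adding an RMS window to RAN's novelty criterion when deciding whether to grow the hidden layer. The record does not give the paper's exact formulation, so the following is only a minimal sketch of how such a test is commonly written: a new node is added only if the current error is large, the sample is far from every existing center, and the root mean square of recent errors is also large. The function name `should_add_node`, the thresholds `e_min`, `eps`, and `e_rms_min`, and the window length are illustrative assumptions, not values from the paper.

```python
import math
from collections import deque


def rms(errors):
    """Root mean square of a non-empty sequence of scalar errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def should_add_node(x, error, centers, window, eps, e_min, e_rms_min):
    """Decide whether to add a hidden node for sample x (hypothetical helper).

    x         -- input vector (sequence of floats)
    error     -- scalar output error for x under the current network
    centers   -- existing hidden-node centers (e.g. from K-means initialization)
    window    -- deque(maxlen=M) of recent errors; this call appends `error`
    eps       -- assumed distance threshold
    e_min     -- assumed threshold on the current error
    e_rms_min -- assumed threshold on the windowed RMS error
    """
    window.append(error)  # the oldest error drops out once the window is full
    # Distance to the nearest center; infinite if there are no centers yet.
    nearest = min((math.dist(x, c) for c in centers), default=math.inf)
    return (
        abs(error) > e_min             # the current error is large
        and nearest > eps              # x is novel relative to existing centers
        and rms(window) > e_rms_min    # the error has been large over the window
    )


centers = [(0.0, 0.0), (1.0, 1.0)]
window = deque([0.05, 0.04], maxlen=3)
grow = should_add_node((3.0, 3.0), 0.9, centers, window,
                       eps=0.5, e_min=0.1, e_rms_min=0.3)
```

The windowed condition is what keeps a single noisy sample from triggering growth: if the preceding errors in the window are small, the RMS stays below `e_rms_min` even when the current error and distance are both large, and the network would instead fall back to adjusting its existing parameters.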

