
Text Classification Based on Kernel Neighbor Algorithm on Statistical Manifold
Abstract: To describe text data more efficiently, text vectors are represented as points on a statistical manifold, and kernel methods are used to combine the generative and discriminative models of text. A diffusion kernel on the Dirichlet compound multinomial (DCM) statistical manifold serves as the distance measure on the text space, and a kernel nearest neighbor algorithm on the DCM manifold is proposed for text classification. Experiments on two corpora show that the precision and recall of the DCM-manifold kernel nearest neighbor algorithm are better than or comparable to those of the compared algorithms.
Source: Transactions of Beijing Institute of Technology (《北京理工大学学报》; indexed in EI, CAS, CSCD, Peking University Core), 2010, No. 3, pp. 315-319 (5 pages)
Funding: National ministry-level pre-research project (504-4)
Keywords: diffusion kernel; kernel nearest neighbor; Dirichlet compound multinomial; text classification
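
The abstract describes classifying a document by its nearest neighbors under a distance induced by a diffusion kernel. The sketch below illustrates that idea only. This record does not give the DCM diffusion kernel, so the code substitutes the approximate multinomial diffusion kernel from Lafferty and Lebanon, exp(-arccos²(Σ√(pᵢqᵢ))/t), as a stand-in. The function names, the add-one smoothing, the value of t, and the toy data are all assumptions made for illustration, not the authors' implementation.

```python
import math
from collections import Counter


def to_simplex(counts, smoothing=1.0):
    """Map a term-count vector to a point on the probability simplex.

    Additive smoothing is an assumption made for this sketch.
    """
    total = sum(counts) + smoothing * len(counts)
    return [(c + smoothing) / total for c in counts]


def diffusion_kernel(p, q, t=0.5):
    """Approximate multinomial diffusion kernel (stand-in for the DCM kernel).

    The constant normalization factor is dropped because it does not
    change the neighbor ranking.
    """
    affinity = sum(math.sqrt(a * b) for a, b in zip(p, q))
    affinity = min(1.0, max(-1.0, affinity))  # keep acos inside its domain
    return math.exp(-math.acos(affinity) ** 2 / t)


def kernel_distance_sq(p, q, kernel=diffusion_kernel):
    """Squared distance in the feature space induced by the kernel."""
    return kernel(p, p) - 2.0 * kernel(p, q) + kernel(q, q)


def kernel_knn_predict(query, train_points, train_labels, k=3,
                       kernel=diffusion_kernel):
    """Majority vote among the k training points closest to the query."""
    ranked = sorted(
        zip(train_points, train_labels),
        key=lambda pair: kernel_distance_sq(query, pair[0], kernel),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy term-count vectors over a hypothetical 4-word vocabulary.
train_counts = [[5, 4, 0, 0], [4, 6, 1, 0], [0, 1, 5, 4], [0, 0, 6, 5]]
train_labels = ["sports", "sports", "finance", "finance"]
train_points = [to_simplex(c) for c in train_counts]

prediction = kernel_knn_predict(to_simplex([3, 3, 0, 0]),
                                train_points, train_labels, k=3)
print(prediction)
```

Because this stand-in kernel gives k(p, p) = 1 for every point, the kernel distance is a monotone function of the geodesic distance on the simplex, so the neighbor ranking depends only on that geodesic distance. A kernel defined on the DCM manifold would replace `diffusion_kernel` without changing the rest of the sketch.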

References (12)

  • 1 Su Jinshu, Zhang Bofeng, Xu Xin. Advances in machine learning based text categorization [J]. Journal of Software, 2006, 17(9): 1848-1859. (in Chinese)
  • 2 Kondor R, Lafferty J. Diffusion kernels on graphs and other discrete input spaces [C] // Proceedings of the Nineteenth International Conference on Machine Learning. San Mateo, CA, USA: Morgan Kaufmann Press, 2002: 315-322.
  • 3 Lafferty J, Lebanon G. Diffusion kernels on statistical manifolds [J]. Journal of Machine Learning Research, 2005, 6: 129-163.
  • 4 Zhang D, Chen X, Lee W S. Text classification with kernels on the multinomial manifold [C] // Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Salvador, Brazil: ACM Press, 2005: 266-273.
  • 5 Madsen R E, Kauchak D, Elkan C. Modeling word burstiness using the Dirichlet distribution [C] // Proceedings of the 22nd International Conference on Machine Learning. New York, USA: Morgan Kaufmann Press, 2005: 545-552.
  • 6 Yu K, Ji L, Zhang X. Kernel nearest-neighbor algorithm [J]. Neural Processing Letters, 2002, 15: 147-156.
  • 7 Church K W, Gale W. Poisson mixtures [J]. Natural Language Engineering, 1995, 1(2): 163-190.
  • 8 Aitchison J. The statistical analysis of compositional data [M]. London: Chapman and Hall, 1986.
  • 9 Minka T. Estimating a Dirichlet distribution [EB/OL]. [2005-08-17]. http://research.microsoft.com/~minka.
  • 10 Amari S, Nagaoka H. Methods of information geometry [M]. Oxford: Oxford University Press, 2000.
