Text Segmentation Based on Hierarchical Dirichlet Processes (基于分层狄利克雷过程模型的文本分割)
Cited by: 2
Abstract: Text segmentation has important applications in many fields, including text summarization and information retrieval. Topic models are an important tool for this task, but existing topic-model-based methods generally depend on a manually set number of topics, which significantly influences the results. To address this problem, this paper proposes a text segmentation method based on the hierarchical Dirichlet process (HDP) model. First, the HDP model is used to obtain a vector representation of the text in topic space. The topic vectors are then passed to the C99 segmentation algorithm to segment the text. Finally, two optimization strategies are applied to refine the result. Experimental results show that the HDP-based method removes the dependence on manually setting the number of topics and effectively improves text segmentation performance.
Authors: Li Tiancai (李天彩), Wang Bo (王波), Xi Yaoyi (席耀一), Zhang Jiaming (张佳明) (Institute of Information and System Engineering, PLA Information Engineering University, Zhengzhou, 450002, China)
Source: Journal of Data Acquisition and Processing (《数据采集与处理》), CSCD, Peking University Core Journal, 2017, No. 2, pp. 408-416 (9 pages)
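Funding: Supported by the National High Technology Research and Development Program of China ("863" Program) (2011AA7032030D) and the All-Army Military Graduate Research Project (2011JY002-158)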
Keywords: topic model; text segmentation; hierarchical Dirichlet process; Chinese restaurant franchise (CRF) process
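
The pipeline described in the abstract (topic vectors per sentence, then C99-style segmentation) can be illustrated with a minimal sketch. This is not the authors' implementation. It makes several assumptions: the toy vectors in `toy_vectors` stand in for HDP output, the number of segments is passed in by hand, and the rank transform of C99 as well as the paper's two optimization strategies are omitted. All function names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two topic vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def similarity_matrix(vectors):
    """Pairwise sentence similarity in topic space."""
    return [[cosine(u, v) for v in vectors] for u in vectors]

def density(sim, boundaries):
    """Similarity summed over the diagonal blocks, divided by their total area."""
    total, area = 0.0, 0
    for start, end in zip(boundaries, boundaries[1:]):
        total += sum(sim[i][j] for i in range(start, end) for j in range(start, end))
        area += (end - start) ** 2
    return total / area

def segment(vectors, num_segments):
    """Divisive clustering: repeatedly add the boundary that maximizes density.

    Returns the sorted list of boundary positions (index of the first
    sentence of each new segment), excluding 0 and len(vectors).
    """
    n = len(vectors)
    sim = similarity_matrix(vectors)
    boundaries = [0, n]
    while len(boundaries) - 1 < num_segments:
        candidates = [b for b in range(1, n) if b not in boundaries]
        if not candidates:
            break
        best = max(candidates, key=lambda b: density(sim, sorted(boundaries + [b])))
        boundaries = sorted(boundaries + [best])
    return boundaries[1:-1]

# Assumed toy data: six sentences, three topics. The first three sentences
# load on topic 0, the last three on topic 2.
toy_vectors = [
    [0.90, 0.10, 0.00],
    [0.80, 0.20, 0.00],
    [0.85, 0.10, 0.05],
    [0.05, 0.10, 0.85],
    [0.00, 0.20, 0.80],
    [0.10, 0.10, 0.80],
]

print(segment(toy_vectors, 2))  # prints [3]
```

In this sketch the boundary falls before the fourth sentence, where the dominant topic changes. In a real setting the vectors would come from a trained HDP model rather than being written by hand, and the stopping criterion would be chosen by the algorithm instead of a fixed `num_segments`.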