News Topic Segmentation Based on LSA-HMM
Abstract: Topic segmentation is the basis for fast and effective retrieval and management of news story programs. Traditional topic segmentation based on the hidden Markov model (HMM) locates topic boundaries using only the transitions between topics, and does not consider the latent semantic relationships among the words within each topic. This paper proposes an improved HMM-based algorithm. The algorithm applies latent semantic analysis (LSA) to the word-frequency vectors for feature extraction and dimensionality reduction, which takes the contextual relationships among words into account. During training, class labels for the documents are obtained by K-means clustering. The LSA features serve as the observations of the HMM and the topic classes as its hidden states, so the model also accounts for the relationships between topics, and this finally yields the topic segmentation of the text. Experiments show that the algorithm achieves good segmentation performance.
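The pipeline described in the abstract (LSA features, clustering for class labels, HMM decoding) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy count vectors, the function names (`lsa_fit`, `kmeans`, `train_hmm`, `segment`), the deterministic K-means initialisation, the spherical Gaussian emission model, and the add-one smoothing of transitions. The paper's actual emission model and training procedure are not given in this record.

```python
import numpy as np

def lsa_fit(X, k):
    """LSA step: top-k right singular vectors of the count matrix X (units x terms)."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:k].T  # terms x k projection matrix

def kmeans(Z, k, iters=20):
    """Plain K-means; initial centers are evenly spaced rows (assumption, for determinism)."""
    centers = Z[np.linspace(0, len(Z) - 1, k).astype(int)].copy()
    for _ in range(iters):
        d2 = ((Z[:, None, :] - centers[None]) ** 2).sum(-1)
        labels = d2.argmin(1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = Z[labels == j].mean(0)
    return labels, centers

def train_hmm(Z, labels, centers):
    """Estimate HMM parameters with cluster labels as hidden states (hypothetical model)."""
    K = len(centers)
    counts = np.ones((K, K))  # add-one smoothing of transition counts
    for a, b in zip(labels[:-1], labels[1:]):
        counts[a, b] += 1
    log_trans = np.log(counts / counts.sum(1, keepdims=True))
    log_init = np.log(np.full(K, 1.0 / K))  # uniform initial distribution
    # Pooled per-coordinate variance of a spherical Gaussian emission model.
    sigma2 = max(((Z - centers[labels]) ** 2).sum() / Z.size, 1e-6)
    return log_trans, log_init, sigma2

def segment(Z, centers, log_trans, log_init, sigma2):
    """Viterbi decoding; a boundary is any position where the decoded topic changes."""
    d2 = ((Z[:, None, :] - centers[None]) ** 2).sum(-1)
    log_emit = -d2 / (2.0 * sigma2)  # Gaussian log-likelihood up to a constant
    T, K = log_emit.shape
    score = log_init + log_emit[0]
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + log_trans  # cand[i, j]: best path ending in i, then i -> j
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + log_emit[t]
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    path.reverse()
    boundaries = [t for t in range(1, T) if path[t] != path[t - 1]]
    return path, boundaries

# Toy word-count vectors over a 4-word vocabulary (made-up data):
# the first three training units use words 0-1, the last three use words 2-3.
X_train = np.array([[2, 1, 0, 0], [1, 2, 0, 0], [2, 2, 0, 0],
                    [0, 0, 1, 2], [0, 0, 2, 1], [0, 0, 2, 2]], dtype=float)
X_test = np.array([[1, 1, 0, 0], [2, 1, 0, 0], [0, 0, 1, 1],
                   [0, 0, 1, 2], [1, 2, 0, 0]], dtype=float)

V = lsa_fit(X_train, k=2)
Z_train = X_train @ V
train_labels, centers = kmeans(Z_train, k=2)
log_trans, log_init, sigma2 = train_hmm(Z_train, train_labels, centers)
path, boundaries = segment(X_test @ V, centers, log_trans, log_init, sigma2)
print(path, boundaries)
```

In this sketch a boundary index `t` means that a new topic starts at unit `t` of the test sequence. A practical system would likely replace the hand-written K-means and Viterbi routines with library implementations and fit the emission model on real LSA features.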
Author: Shi Qian (史倩)
Source: Computer and Modernization (《计算机与现代化》), 2012, No. 5, pp. 27-30, 34 (5 pages in total)
Keywords: topic segmentation; hidden Markov model; topic model; latent semantic analysis

References (14)

  • 1. Beeferman D, Berger A, Lafferty J. Statistical models for text segmentation[J]. Machine Learning, 1999, 34(1-3): 177-210.
  • 2. Manning C D. Rethinking Text Segmentation Models: An Information Extraction Case Study[R]. Technical Report, University of Sydney, 1998.
  • 3. Dharanipragada S, Franz M, McCarley J, et al. Story segmentation and topic detection in the broadcast news domain[C]//Proceedings of the DARPA Broadcast News Workshop. 1999: 1-4.
  • 4. Hearst M A. TextTiling: Segmenting text into multi-paragraph subtopic passages[J]. Computational Linguistics, 1997, 23(1): 33-64.
  • 5. Stokes N, Carthy J, Smeaton A F. SeLeCT: A lexical cohesion based news story segmentation system[J]. Journal of AI Communications, 2004, 17(1): 3-12.
  • 6. Yamron J P, Carp I, Gillick L, et al. A hidden Markov model approach to text segmentation and event tracking[C]//Proceedings of ICASSP. 1998: 333-336.
  • 7. Ponte J M, Croft W B. Text segmentation by topic[C]//Proceedings of the First European Conference on Research and Advanced Technology for Digital Libraries. 1997: 120-129.
  • 8. Hofmann T. Unsupervised learning by probabilistic latent semantic analysis[J]. Machine Learning Journal, 2001, 42(1): 177-196.
  • 9. Blei D M, Ng A Y, Jordan M I. Latent Dirichlet allocation[J]. Journal of Machine Learning Research, 2003, 3(3): 993-1022.
  • 10. Liu Yunzhong, Lin Yaping, Chen Zhiping. Text information extraction based on hidden Markov model[J]. Journal of System Simulation, 2004, 16(3): 507-510. Cited by: 51.
