
Applying Extended DPMM in Recognition of Short Text Topic
Abstract: Topic detection and tracking (TDT) has been widely studied in recent years. However, most of this research is based on conventional news, and extending it to short reports remains problematic. This paper proposes a topic recognition method based on the Dirichlet process mixture model (DPMM) that handles topic segmentation and TDT in a single unified model. It introduces the application of DPMM to topic recognition and discusses two approaches designed specifically to address the sparseness problem of short text. The first is an algorithm flow that changes the processing unit of topic recognition from a single short text to a session. The second is an extended DPMM that takes word dependency into account when estimating the distribution of words associated with each known topic. Topics in spontaneous text streams are then identified by performing topic segmentation and TDT simultaneously. The advantage of DPMM is that the number of mixture components does not need to be determined in advance and no prior preparation regarding the number or content of topics is required, which makes it better suited to topic recognition on streaming text. Experimental results show that DPMM is effective for topic recognition on short-text data.
Author: 汪海波 (Wang Haibo)
Source: Computer Applications and Software (《计算机应用与软件》), CSCD, Peking University Core Journal, 2014, No. 8, pp. 191-195 (5 pages)
Keywords: Topic recognition; Mixture model; Extended DPMM; Data streams; Static short text
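The abstract's central claim, that a Dirichlet process mixture does not need the number of topics fixed in advance, can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes a bag-of-words Dirichlet-multinomial likelihood (no word dependency), a Chinese restaurant process prior with concentration `alpha`, and a greedy one-pass assignment instead of full posterior inference. The names `assign_stream`, `log_predictive`, `alpha`, and `beta` are hypothetical and chosen only for this illustration. Each input item is one session, that is, a list of tokens pooled from several short texts.

```python
import math
from collections import Counter


def log_predictive(words, topic_counts, topic_total, vocab_size, beta):
    """Log Dirichlet-multinomial predictive probability of `words` given a topic's counts."""
    logp = 0.0
    seen = Counter()  # words of this session already accounted for
    for i, w in enumerate(words):
        num = topic_counts.get(w, 0) + seen[w] + beta
        den = topic_total + i + vocab_size * beta
        logp += math.log(num / den)
        seen[w] += 1
    return logp


def assign_stream(sessions, vocab_size, alpha=1.0, beta=0.1):
    """Greedily assign each session in a stream to an existing or a new topic.

    Assumption: existing topic k is scored by n_k * p(session | topic k) and a
    new topic by alpha * p(session | empty topic), as in a Chinese restaurant
    process. The shared normalizer is dropped because it does not affect argmax.
    """
    topics = []  # each topic: {"n": sessions assigned, "counts": Counter, "total": tokens}
    labels = []
    for words in sessions:
        scores = [
            math.log(t["n"])
            + log_predictive(words, t["counts"], t["total"], vocab_size, beta)
            for t in topics
        ]
        # Candidate new topic: prior weight alpha, no word counts yet.
        scores.append(math.log(alpha) + log_predictive(words, {}, 0, vocab_size, beta))
        k = max(range(len(scores)), key=scores.__getitem__)
        if k == len(topics):  # the new-topic option won, so open a topic
            topics.append({"n": 0, "counts": Counter(), "total": 0})
        topics[k]["n"] += 1
        topics[k]["counts"].update(words)
        topics[k]["total"] += len(words)
        labels.append(k)
    return labels


sessions = [
    ["goal", "match", "team", "goal"],
    ["stock", "market", "price", "stock"],
    ["team", "goal", "match"],
    ["price", "market", "stock"],
]
labels = assign_stream(sessions, vocab_size=6)
print(labels)
```

In this toy stream the topic count is never supplied. A new topic is opened whenever the new-topic score exceeds the score of every existing topic, which is the property that makes the model attractive for streaming text. Larger values of `alpha` make new topics more likely.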

