Applying Extended DPMM in Recognition of Short Text Topic (扩展DPMM模型在短文本主题识别中的应用)

Abstract: Topic detection and tracking (TDT) has been widely studied in recent years. However, most of this research is based on conventional news stories, and extending it to short texts remains problematic. This paper proposes a topic recognition method based on the Dirichlet process mixture model (DPMM) that handles topic segmentation and TDT within a single unified model. It introduces the application of DPMM to topic recognition and discusses two approaches designed specifically to address the sparseness problem of short texts. The first is an algorithm flow that changes the processing unit of topic recognition from a single short text to a session. The second is an extended DPMM that takes word dependency into account when estimating the distribution of words associated with each known topic. The topics of spontaneous text streams are then recognized by performing topic segmentation and TDT simultaneously. The advantage of DPMM is that the number of mixture components need not be fixed in advance and no prior knowledge of the number or content of topics is required, which makes it better suited to topic recognition in streaming text. Experimental results show that DPMM is effective for topic recognition on short text data.
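To make the abstract's central claim concrete (that the number of mixture components need not be fixed in advance), the following is a minimal sketch of how a Dirichlet process mixture can open new topics on the fly. It is not the paper's algorithm: it uses a plain bag-of-words Dirichlet-multinomial likelihood without the word-dependency extension, and a greedy single-pass assignment in place of proper posterior inference. The function names, the concentration parameter `alpha`, the smoothing parameter `beta`, and the toy sessions are all assumptions made for illustration.

```python
import math
from collections import Counter


def crp_prior(cluster_sizes, alpha):
    """Chinese restaurant process prior over [existing topics..., new topic]."""
    total = sum(cluster_sizes) + alpha
    return [n / total for n in cluster_sizes] + [alpha / total]


def log_predictive(words, topic_counts, vocab_size, beta):
    """Log Dirichlet-multinomial predictive probability of a word sequence,
    given the word counts already assigned to a topic (symmetric prior beta)."""
    counts = Counter(topic_counts)  # copy so the topic itself is not modified
    total = sum(counts.values())
    logp = 0.0
    for w in words:
        logp += math.log((counts[w] + beta) / (total + vocab_size * beta))
        counts[w] += 1
        total += 1
    return logp


def assign_stream(sessions, vocab_size, alpha=1.0, beta=0.1):
    """Greedy single-pass assignment of sessions (lists of words) to topics.

    Each session joins the topic with the highest prior-times-likelihood
    score, or opens a new topic when that option scores best.
    """
    topics = []  # one Counter of word counts per topic
    sizes = []   # number of sessions assigned to each topic
    labels = []
    for words in sessions:
        prior = crp_prior(sizes, alpha)
        scores = [
            math.log(prior[k]) + log_predictive(words, topics[k], vocab_size, beta)
            for k in range(len(topics))
        ]
        # Last candidate: a brand-new, empty topic.
        scores.append(
            math.log(prior[-1]) + log_predictive(words, Counter(), vocab_size, beta)
        )
        best = max(range(len(scores)), key=scores.__getitem__)
        if best == len(topics):
            topics.append(Counter())
            sizes.append(0)
        topics[best].update(words)
        sizes[best] += 1
        labels.append(best)
    return labels


if __name__ == "__main__":
    # Hypothetical toy sessions over an 8-word vocabulary.
    sessions = [
        ["goal", "match", "team"],
        ["match", "team", "score"],
        ["stock", "market", "price"],
        ["market", "price", "fund"],
    ]
    print(assign_stream(sessions, vocab_size=8))  # [0, 0, 1, 1]
```

In this toy run the second topic is created only when the third session arrives, because its words are poorly explained by the first topic. A larger `alpha` would make the sketch open new topics more readily.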

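Author: Wang Haibo (汪海波)

Affiliation: School of Computer and Communication, Huaian College of Information Technology (淮安信息职业技术学院计算机通信学院)

Source: Computer Applications and Software (《计算机应用与软件》), indexed in CSCD and the Peking University Core Journals list, 2014, No. 8, pp. 191-195 (5 pages)

Keywords: topic recognition; mixture model; extended DPMM; data streams; static short text

Classification code: TP311.1 [Automation and Computer Technology: Computer Software and Theory]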
