
Kernel principal component analysis based topic tracking system
Abstract Topic tracking is an important technique in information processing, and extracting robust features from topic samples is a central research problem. To address topic drift in the samples, this paper proposes an algorithm based on kernel principal component analysis (KPCA). The algorithm first builds a weighting matrix from prior knowledge in the development set. It then applies KPCA to each topic sample to compensate for topic drift, which effectively removes the influence of drift and makes the sample features more robust. Finally, the K-nearest neighbor (KNN) and Rocchio methods are used as classifiers to track each topic sample. In topic tracking tests on the Fisher English transcript corpus, the system achieves a 15% to 18% relative reduction in detection cost compared with the baseline system.
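To make the KPCA step concrete, here is a minimal sketch of generic kernel PCA with an RBF kernel, written with NumPy. It is not the authors' implementation: the abstract does not say which kernel is used, how the weighting matrix enters, or how the projection is turned into a drift compensation, so the kernel choice, the `gamma` parameter, and the function names `rbf_kernel`, `kpca_fit`, and `kpca_transform` are all assumptions made for illustration.

```python
import numpy as np


def rbf_kernel(A, B, gamma):
    """RBF kernel between the rows of A and the rows of B (assumed kernel choice)."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def kpca_fit(X, n_components, gamma):
    """Fit kernel PCA on training samples X (one row per sample)."""
    n = X.shape[0]
    K = rbf_kernel(X, X, gamma)
    one_n = np.full((n, n), 1.0 / n)
    # Centre the kernel matrix, i.e. centre the data in feature space.
    Kc = K - one_n @ K - K @ one_n + one_n @ K @ one_n
    eigvals, eigvecs = np.linalg.eigh(Kc)  # eigenvalues in ascending order
    idx = np.argsort(eigvals)[::-1][:n_components]
    lam, V = eigvals[idx], eigvecs[:, idx]
    # Scale so that each principal axis has unit norm in feature space.
    alphas = V / np.sqrt(lam)
    return {"X": X, "K": K, "alphas": alphas, "gamma": gamma, "lam": lam}


def kpca_transform(model, Y):
    """Project samples Y onto the fitted kernel principal components."""
    K = model["K"]
    n = K.shape[0]
    Kt = rbf_kernel(Y, model["X"], model["gamma"])
    one_n = np.full((n, n), 1.0 / n)
    one_t = np.full((Y.shape[0], n), 1.0 / n)
    # Centre the test kernel with the training statistics.
    Ktc = Kt - one_t @ K - Kt @ one_n + one_t @ K @ one_n
    return Ktc @ model["alphas"]


# Toy usage on random vectors standing in for topic sample features.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))
model = kpca_fit(X, n_components=2, gamma=0.5)
Z = kpca_transform(model, X)
print(Z.shape)  # (6, 2)
```

In a tracking system the rows of `X` would be feature vectors of development-set samples, and the projected features `Z` would be passed on to the classifiers.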
Authors Liu Quan (刘权), Guo Wu (郭武)
Source Journal of Tsinghua University (Science and Technology) (《清华大学学报(自然科学版)》), 2013, No. 6, pp. 865-868 (4 pages). Indexed in EI, CAS, CSCD, and the Peking University Core list (北大核心).
Keywords topic tracking; kernel principal component analysis; topic drift; feature extraction
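The classification stage named in the abstract (KNN and Rocchio) can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the paper's configuration: the cosine similarity measure, the Rocchio weights `beta` and `gamma`, the value of `k`, and the helper names `cosine`, `mean_vector`, `rocchio_score`, and `knn_score` are all hypothetical choices.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def rocchio_score(x, on_topic, off_topic, beta=1.0, gamma=0.25):
    """Similarity of x to a Rocchio prototype: beta * mean(on) - gamma * mean(off)."""
    pos = mean_vector(on_topic)
    neg = mean_vector(off_topic) if off_topic else [0.0] * len(pos)
    prototype = [beta * p - gamma * q for p, q in zip(pos, neg)]
    return cosine(x, prototype)


def knn_score(x, labelled, k=3):
    """Fraction of the k most similar labelled samples that are on topic."""
    nearest = sorted(labelled, key=lambda item: cosine(x, item[0]), reverse=True)[:k]
    return sum(1 for _, is_on in nearest if is_on) / len(nearest)


# Toy feature vectors (hypothetical values, not data from the paper).
on = [[1.0, 0.9, 0.0], [0.8, 1.0, 0.1]]
off = [[0.0, 0.1, 1.0], [0.1, 0.0, 0.9]]
labelled = [(v, True) for v in on] + [(v, False) for v in off]
story = [0.9, 0.8, 0.05]

print(rocchio_score(story, on, off))
print(knn_score(story, labelled, k=3))
```

A story is declared on topic when its score exceeds a threshold. The threshold trades missed detections against false alarms, which is what a detection cost measures.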

