Event Tracking Algorithm Based on Negative-Example-Pruning Support Vector Machine
Abstract: When the classes in a training set are unevenly sized, a support vector machine (SVM) tends to make fewer classification errors on the class with more samples and more errors on the class with fewer samples. Based on an analysis of the cause of this bias, this paper proposes an event tracking algorithm built on a negative-example-pruning support vector machine (NEP-SVM). The algorithm first prunes the negative examples, deciding whether to keep or discard each negative example according to distance and class label, and then trains an SVM on the resulting sample set to obtain a classifier, which compensates for the adverse effect of the bias. In addition, posterior probabilities are important for improving event tracking performance, but a conventional SVM does not provide them, so the parameters of a sigmoid function are trained to map the SVM outputs to probabilities. Experimental results show that NEP-SVM is effective.
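Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), indexed in CSCD and the Peking University Core Journals list, 2006, No. 8, pp. 1472-1477 (6 pages)
Funding: Supported by the National Natural Science Foundation of China (60473117) and the National High Technology Research and Development Program of China ("863" Program) (2001AA115123).
Keywords: event tracking; support vector machine; subject extraction; posterior probability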
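
The two ideas in the abstract, pruning negative examples and mapping SVM outputs to probabilities through a sigmoid, can be illustrated with a minimal sketch. The abstract only says that pruning depends on "distance and class label", so the nearest-neighbour rule below is an assumption chosen for illustration, not the paper's actual criterion. The function names (`prune_negatives`, `fit_sigmoid`, `svm_probability`), the label convention (+1 positive, -1 negative), and the gradient-descent settings are likewise hypothetical. SVM training itself is left out: `fit_sigmoid` takes decision scores that any trained SVM could supply.

```python
import math


def prune_negatives(X, y):
    """Drop each negative sample whose nearest neighbour is a positive sample.

    ASSUMPTION: this nearest-neighbour rule is an illustrative stand-in for
    the "distance and class label" criterion mentioned in the abstract.
    Labels are +1 (positive) and -1 (negative).
    """
    kept_X, kept_y = [], []
    for i, (xi, yi) in enumerate(zip(X, y)):
        if yi == 1:
            # Positive samples are always kept.
            kept_X.append(xi)
            kept_y.append(yi)
            continue
        # Index of the closest other sample by Euclidean distance.
        j = min((k for k in range(len(X)) if k != i),
                key=lambda k: math.dist(xi, X[k]))
        if y[j] != 1:
            # Nearest neighbour is also negative, so keep this sample.
            kept_X.append(xi)
            kept_y.append(yi)
    return kept_X, kept_y


def fit_sigmoid(scores, labels, lr=0.1, steps=2000):
    """Fit A and B in P(y=+1 | f) = 1 / (1 + exp(A*f + B)).

    Uses plain gradient descent on the negative log-likelihood.
    ASSUMPTION: the learning rate and step count are arbitrary choices.
    """
    A, B = 0.0, 0.0
    n = len(scores)
    targets = [1.0 if label == 1 else 0.0 for label in labels]
    for _ in range(steps):
        grad_A = grad_B = 0.0
        for f, t in zip(scores, targets):
            p = 1.0 / (1.0 + math.exp(A * f + B))
            # With this parameterisation, dNLL/dA = (t - p) * f
            # and dNLL/dB = (t - p).
            grad_A += (t - p) * f
            grad_B += (t - p)
        A -= lr * grad_A / n
        B -= lr * grad_B / n
    return A, B


def svm_probability(f, A, B):
    """Map an SVM decision score f to an estimated probability of class +1."""
    return 1.0 / (1.0 + math.exp(A * f + B))
```

In use, `prune_negatives` would be applied to the training set first, an SVM would be trained on the pruned set, and `fit_sigmoid` would then be run on that SVM's decision scores, ideally on held-out data so the probability estimates are not fitted to the same samples as the classifier.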