Event Tracking Algorithm Based on Negative-Example-Pruning Support Vector Machine

Cited by: 1

Abstract: When the classes in a training set are unevenly sized, a support vector machine (SVM) is biased: the class with more samples has a smaller classification error, and the class with fewer samples has a larger one. Based on an analysis of the cause of this bias, this paper proposes an event tracking algorithm based on a negative-example-pruning support vector machine (NEP-SVM). The algorithm first prunes the negative examples, keeping or discarding each negative sample according to distance and class label, and then trains an SVM on the resulting sample set to obtain a classifier, which compensates for the adverse effect of the bias. In addition, posterior probabilities are critical to event tracking performance, but a conventional SVM does not provide them, so the parameters of a sigmoid function are trained to map the SVM outputs to probabilities. Experimental results show that NEP-SVM is effective.
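The two steps named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact pruning rule or the sigmoid training procedure, so everything here is an assumption made for illustration: `prune_negatives` drops a negative sample when its nearest neighbour (Euclidean distance) is a positive sample, and `fit_sigmoid` fits P(y=1 | f) = 1 / (1 + exp(a*f + b)) by plain gradient descent on the log-loss, in the style of Platt scaling. The function names, the toy data, and the decision scores are hypothetical; the SVM training step itself is omitted, and any SVM trainer could be applied to the pruned set.

```python
import math


def prune_negatives(samples, labels):
    """Drop negatives whose nearest neighbour is a positive sample.

    Assumed rule: the abstract only states that the decision uses
    distance and class label. Labels are +1 (positive) or -1 (negative).
    """
    kept_samples, kept_labels = [], []
    for i, (x, y) in enumerate(zip(samples, labels)):
        if y != 1:
            # Index of the nearest other sample by Euclidean distance.
            j = min((k for k in range(len(samples)) if k != i),
                    key=lambda k: math.dist(x, samples[k]))
            if labels[j] == 1:
                continue  # negative sits next to a positive: prune it
        kept_samples.append(x)
        kept_labels.append(y)
    return kept_samples, kept_labels


def posterior(score, a, b):
    """Map a decision score f to P(y=1 | f) = 1 / (1 + exp(a*f + b))."""
    return 1.0 / (1.0 + math.exp(a * score + b))


def fit_sigmoid(scores, labels, lr=0.1, steps=2000):
    """Fit sigmoid parameters (a, b) by gradient descent on the log-loss.

    Simplified maximum-likelihood fit; no target smoothing is applied.
    """
    a, b = -1.0, 0.0
    n = len(scores)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for f, y in zip(scores, labels):
            t = 1.0 if y == 1 else 0.0
            p = posterior(f, a, b)
            # Gradient of the negative log-likelihood w.r.t. a and b.
            grad_a += (t - p) * f
            grad_b += (t - p)
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


if __name__ == "__main__":
    # Toy 2-D data: two positives, one negative close to them, three far away.
    X = [(0.0, 0.0), (1.0, 0.0), (0.2, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
    y = [1, 1, -1, -1, -1, -1]
    X_pruned, y_pruned = prune_negatives(X, y)
    print(len(X), "samples before pruning,", len(X_pruned), "after")

    # Made-up decision scores standing in for SVM outputs.
    scores = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
    targets = [-1, -1, 1, -1, 1, 1]
    a, b = fit_sigmoid(scores, targets)
    print("P(y=1 | f=2.0) =", posterior(2.0, a, b))
```

With this parameterisation a negative `a` means that larger decision scores map to larger probabilities. The nearest-neighbour rule is only one plausible reading of "distance and class label"; a threshold on the distance to the nearest positive would be another.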

Authors: Lei Zhen (雷震), Xie Yuxiang (谢毓湘), Wu Lingda (吴玲达)

Affiliation: School of Information Systems and Management, National University of Defense Technology

Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), indexed in CSCD and the Peking University Core Journals list, 2006, No. 8, pp. 1472-1477 (6 pages)

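Funding: Supported by the National Natural Science Foundation of China (60473117) and the National High Technology Research and Development Program of China ("863" Program) (2001AA115123).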

Keywords: event tracking; support vector machine; subject extraction; posterior probability

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]
