
Automatic Extraction on Chinese Proper Names Based on a Modified SVM-KNN Classifier

Cited by: 2
Abstract: Automatic extraction of proper names is a key technique in text mining, information retrieval, and machine translation. This paper studies a method for automatically extracting Chinese proper names that combines two classifiers, the support vector machine (SVM) and K nearest neighbors (KNN). Test samples are classified differently depending on where they lie in the feature space: in the classification phase, the algorithm computes the distance from the test sample to the SVM optimal hyperplane. If the distance is greater than a given threshold, the sample is classified by SVM; otherwise it is classified by KNN. In practical training corpora, negative samples often far outnumber positive ones, and classic KNN is sensitive to such unbalanced training sets, so the paper proposes a normalization-based modification of classic KNN. Experiments show that combining SVM with the modified KNN extracts Chinese proper names better than SVM alone and better than the original SVM-KNN method. The method can also be extended to other classification problems with unbalanced class distributions.
Source: Journal of the China Society for Scientific and Technical Information (《情报学报》), CSSCI, Peking University Core Journals, 2011, No. 6, pp. 610-617 (8 pages)
Funding: National High Technology Research and Development Program of China (863 Program), Grant No. 2008AA04Z107
Keywords: KNN; SVM; extraction of proper names; unbalanced data distribution
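
The decision rule described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: it assumes a linear SVM that has already been trained and is given by hypothetical parameters `w` and `b`, and it assumes that the "normalized" KNN divides each class's neighbor count by that class's size in the training set. The abstract does not specify the exact normalization, the kernel, or the feature representation, so all function names, parameters, and data here are illustrative.

```python
import math
from collections import Counter


def svm_distance(x, w, b):
    """Signed distance from x to the hyperplane w.x + b = 0 (linear SVM assumed)."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm


def normalized_knn(x, train, k):
    """Vote among the k nearest neighbors, weighting each class's count
    by 1 / (class size in the training set). This normalization is an
    assumption; the abstract only says a normalization idea is used."""
    class_sizes = Counter(label for _, label in train)
    neighbors = sorted(train, key=lambda s: math.dist(x, s[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return max(votes, key=lambda c: votes[c] / class_sizes[c])


def svm_knn_predict(x, w, b, train, k=3, threshold=1.0):
    """Use the SVM far from the hyperplane, normalized KNN close to it."""
    d = svm_distance(x, w, b)
    if abs(d) > threshold:
        return 1 if d > 0 else -1
    return normalized_knn(x, train, k)


# Hypothetical unbalanced training set: 2 positive and 6 negative samples.
TRAIN = [
    ((0.5, 0.5), 1), ((0.6, 0.4), 1),
    ((-0.5, -0.5), -1), ((-0.4, -0.6), -1), ((-0.1, 0.0), -1),
    ((0.0, -0.1), -1), ((-0.2, -0.1), -1), ((-0.1, -0.2), -1),
]
W, B = (1.0, 1.0), 0.0  # hypothetical hyperplane x1 + x2 = 0

if __name__ == "__main__":
    for point in [(3.0, 3.0), (-3.0, -3.0), (0.2, 0.2)]:
        print(point, svm_knn_predict(point, W, B, TRAIN))
```

In this toy data, the point `(0.2, 0.2)` lies within the threshold, so it falls through to KNN. Two of its three nearest neighbors are negative, so a plain majority vote would label it negative, while the size-normalized vote favors the rarer positive class. That is the kind of sensitivity to unbalanced training sets the abstract says the modification targets.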

