Chinese Query Classification Based on SVM

Cited by: 2
Abstract: In a question answering system, user queries take the form of natural-language questions, and query classification (QC) plays an important guiding role in generating appropriate answers. Most existing work implements QC with the SVM statistical learning model. This paper analyzes in detail the typical features of Chinese query classification and how they are encoded, and presents a method for parameter optimization and kernel function selection for the LibSVM classifier. It compares the classification accuracy of the bag-of-word feature and of the feature that binds part of speech to each word (bag-of-word/pos) on three classifiers: LibSVM (RBF), LibSVM (Linear), and Liblinear. The experimental results show that the Liblinear classifier performs better when the question training set is large and the feature dimensionality is high. The paper also concludes that the bag-of-word/pos feature contributes somewhat to English query classification. For Chinese query classification, although adding features should in theory improve the accuracy of an SVM classifier, binding the part-of-speech feature can introduce noise, which lowered query classification accuracy in the experiments.
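As an illustration of the two feature encodings the abstract compares, the following is a minimal sketch and not the authors' implementation. It assumes each question has already been word-segmented and POS-tagged. The example questions, tags, class labels, and all function names are hypothetical. Each question is encoded as binary presence features in the sparse `label index:value` text format read by LIBSVM and LIBLINEAR.

```python
from typing import Dict, List, Tuple

# Assumption: a question is a list of (word, pos) pairs produced by some
# external segmenter/tagger. The tags used here are illustrative only.
TaggedQuestion = List[Tuple[str, str]]


def bow_features(question: TaggedQuestion) -> List[str]:
    """Bag-of-word: one feature per word, POS ignored."""
    return [word for word, _ in question]


def bow_pos_features(question: TaggedQuestion) -> List[str]:
    """Bag-of-word/pos: each word is bound to its POS tag."""
    return [f"{word}/{pos}" for word, pos in question]


def build_vocab(feature_lists: List[List[str]]) -> Dict[str, int]:
    """Assign 1-based indices to features in order of first appearance."""
    vocab: Dict[str, int] = {}
    for feats in feature_lists:
        for feat in feats:
            if feat not in vocab:
                vocab[feat] = len(vocab) + 1
    return vocab


def to_libsvm_line(label: int, feats: List[str], vocab: Dict[str, int]) -> str:
    """Encode one sample as 'label idx:1 ...' with ascending indices."""
    indices = sorted({vocab[f] for f in feats if f in vocab})
    return " ".join([str(label)] + [f"{i}:1" for i in indices])


# Hypothetical data: (class label, tagged question).
questions: List[Tuple[int, TaggedQuestion]] = [
    (1, [("谁", "r"), ("发明", "v"), ("了", "u"), ("电话", "n")]),
    (2, [("电话", "n"), ("的", "u"), ("发明", "n"),
         ("在", "p"), ("哪", "r"), ("年", "q")]),
]

vocab_bow = build_vocab([bow_features(q) for _, q in questions])
vocab_pos = build_vocab([bow_pos_features(q) for _, q in questions])

lines_bow = [to_libsvm_line(y, bow_features(q), vocab_bow) for y, q in questions]
lines_pos = [to_libsvm_line(y, bow_pos_features(q), vocab_pos) for y, q in questions]
```

In this toy data the word 发明 appears once as a verb and once as a noun, so the bound encoding splits it into two separate features and the vocabulary grows from 8 to 9 entries. The sketch only shows how binding changes the feature space. It does not reproduce the paper's experiments or explain its accuracy results.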
Source: Journal of the China Society for Scientific and Technical Information (《情报学报》), indexed in CSSCI and the Peking University Core Journals list, 2011, No. 9, pp. 946-950 (5 pages)
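Funding: Open Project Fund of the State Key Laboratory for Novel Software Technology (KFKT2010B02); Anhui Provincial Universities Natural Science Research Project (KJ2007B245); Anhui Provincial Universities Key Natural Science Research Project (KJ2011A48)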
Keywords: question answering system; query classification; SVM; kernel function