Chinese Question Classification Research Based on Incremental Bayes Model
Cited by: 7
Abstract A classifier trained on a fixed training set performs poorly and cannot track users' changing needs. To address this, the paper applies the incremental Bayes idea to question classification. A Genetic Algorithm (GA) selects the optimal feature subset to refine the classifier, which avoids excessive feature redundancy in the training set; during learning, the classifier dynamically expands the training set and updates its parameters. When classifying a question, the interrogative word, the syntactic structure, the question focus words, and the first sememe of each focus word in HowNet are extracted as classification features. To verify the effectiveness of the incremental Bayes method, questions are randomly drawn from the corpus to form incremental sets of different sizes, and the questions in a single test set are classified on the basis of each incremental set. Experimental results show that the incremental Bayes classifier achieves higher classification accuracy than the naive Bayes classifier, reaching 90.2% on coarse classes and 76.3% on fine classes, while also improving runtime efficiency.
Source Computer Engineering (《计算机工程》), CAS, CSCD, 2014, No. 9, pp. 238-242 (5 pages)
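Funding National Natural Science Foundation of China (61003311); Provincial Natural Science Foundation of Anhui Higher Education Institutions (KJ2011A040)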
Keywords question classification; question answering system; incremental Bayes; naive Bayes; modified Bayes; Genetic Algorithm (GA)
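
The incremental idea summarized in the abstract, where the classifier's parameters are updated as the training set grows instead of being refit from scratch, can be illustrated with a minimal sketch. This is not the paper's implementation: the class name `IncrementalNaiveBayes`, the Laplace smoothing constant `alpha`, the feature strings, and the class labels are all hypothetical, and the GA-based feature selection step is omitted.

```python
import math
from collections import Counter, defaultdict


class IncrementalNaiveBayes:
    """Naive Bayes over symbolic features; counts are updated one example at a time."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                          # Laplace smoothing (assumed value)
        self.class_counts = Counter()               # examples seen per class
        self.feature_counts = defaultdict(Counter)  # feature counts per class
        self.vocabulary = set()                     # all features seen so far

    def update(self, features, label):
        """Fold one labelled question into the model without retraining."""
        self.class_counts[label] += 1
        for f in features:
            self.feature_counts[label][f] += 1
            self.vocabulary.add(f)

    def predict(self, features):
        """Return the class with the highest smoothed log-posterior."""
        total = sum(self.class_counts.values())
        v = len(self.vocabulary) + 1                # one extra slot for unseen features
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + self.alpha * v
            score = math.log(n / total)
            for f in features:
                score += math.log((counts[f] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical features standing in for interrogative words and focus-word sememes.
clf = IncrementalNaiveBayes()
clf.update(["who", "person"], "HUMAN")
clf.update(["where", "place"], "LOCATION")
print(clf.predict(["who"]))    # HUMAN

# A new labelled question arrives later; only the counts change.
clf.update(["when", "time"], "TIME")
print(clf.predict(["when"]))   # TIME
```

Because the model stores only counts, adding a question costs time proportional to its number of features, which is one plausible reason an incremental scheme can improve runtime efficiency relative to retraining on the enlarged set.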
