
Domain Question Classification Method Based on Topic Expansion

Times cited: 10
Abstract: Domain question classification plays an important role in question answering (Q&A) systems, yet most existing research on question classification targets open domains, and few studies address specific domains. Domain questions are short, so their text representations suffer from data sparseness. This paper therefore proposes a domain question classification method based on topic expansion. The method consists of two components: feature selection and feature expansion. First, the chi-square (CHI) statistic is used to select feature words from the question text; these words serve as the basis for feature expansion. Next, a Latent Dirichlet Allocation (LDA) topic model is applied to an external knowledge base to obtain its topic distribution. To avoid introducing noisy topics, topic entropy is used to identify high-quality topics. The words covered by these high-quality topics are then added to the question text, and finally a Support Vector Machine (SVM) classifier classifies the expanded questions. Experimental results show that, compared with the traditional TF-IDF-based text classification method, the proposed method achieves better classification performance and can improve the performance of Q&A systems.
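The feature-selection and feature-expansion steps described in the abstract can be sketched in a few functions. This is a minimal, illustrative sketch, not the authors' implementation. It assumes tokenized questions and an LDA model already trained on the external knowledge base, represented here by a hypothetical `topics` dict mapping each topic id to its word distribution p(w|z). The chi-square score uses the standard document-count form χ²(t, c) = N(AD - CB)² / ((A+C)(B+D)(A+B)(C+D)). The topic-entropy criterion is an assumption: a topic is treated as high quality when the Shannon entropy of its word distribution is at most a threshold, which may differ from the paper's exact definition. All helper names, the toy data, and the parameters `k`, `max_entropy`, and `top_n` are hypothetical. LDA training and the final SVM step are omitted; in practice a library such as gensim or scikit-learn would supply them.

```python
import math
from collections import Counter, defaultdict


def chi_square_scores(docs, labels):
    """Chi-square score for every (term, class) pair, using document counts.

    A: docs of class c containing t    B: docs of other classes containing t
    C: docs of class c without t       D: docs of other classes without t
    """
    n = len(docs)
    class_size = Counter(labels)
    df = Counter()                       # term -> number of docs containing it
    df_in_class = defaultdict(Counter)   # term -> class -> A
    for tokens, label in zip(docs, labels):
        for t in set(tokens):
            df[t] += 1
            df_in_class[t][label] += 1
    scores = {}
    for t in df:
        for c, size in class_size.items():
            a = df_in_class[t][c]
            b = df[t] - a
            c_ = size - a
            d = n - a - b - c_
            denom = (a + c_) * (b + d) * (a + b) * (c_ + d)
            scores[(t, c)] = n * (a * d - c_ * b) ** 2 / denom if denom else 0.0
    return scores


def select_features(docs, labels, k):
    """Keep the k terms with the highest max-over-classes chi-square score."""
    best = defaultdict(float)
    for (t, _), s in chi_square_scores(docs, labels).items():
        best[t] = max(best[t], s)
    # Ties are broken alphabetically so the result is deterministic.
    return set(sorted(best, key=lambda t: (-best[t], t))[:k])


def topic_entropy(dist):
    """Shannon entropy (nats) of one topic's word distribution p(w|z)."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)


def high_quality_topics(topics, max_entropy):
    """Assumed criterion: a low-entropy (concentrated) topic is less noisy."""
    return [z for z, dist in topics.items() if topic_entropy(dist) <= max_entropy]


def expand_question(tokens, features, topics, good, top_n=3):
    """Append the top words of each high-quality topic that covers a selected feature word."""
    anchors = [t for t in tokens if t in features]
    expanded = list(tokens)
    for z in good:
        dist = topics[z]
        if any(w in dist for w in anchors):
            for w in sorted(dist, key=lambda w: (-dist[w], w))[:top_n]:
                if w not in expanded:
                    expanded.append(w)
    return expanded


# Toy data (hypothetical), standing in for labeled questions and a trained LDA model.
docs = [["price", "ticket", "train"], ["price", "hotel", "room"],
        ["weather", "rain", "tomorrow"], ["weather", "sunny"]]
labels = ["travel", "travel", "weather", "weather"]
topics = {
    "z_fare": {"price": 0.4, "fare": 0.3, "discount": 0.3},
    "z_noise": {"price": 0.1, **{f"w{i}": 0.1 for i in range(9)}},
}

features = select_features(docs, labels, k=2)
good = high_quality_topics(topics, max_entropy=1.5)
print(features, good)
print(expand_question(["price", "of", "train"], features, topics, good))
```

With this toy data, `price` and `weather` are selected as features, and the question `["price", "of", "train"]` is expanded with `discount` and `fare` from the concentrated topic `z_fare`. The near-uniform topic `z_noise` (entropy ln 10 ≈ 2.30) is filtered out even though it also contains `price`, which is the kind of noisy expansion the entropy filter is meant to prevent. The expanded token lists would then be vectorized and passed to an SVM classifier.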
Authors: Zhang Qing (张青), Lü Zhao (吕钊)
Source: Computer Engineering (《计算机工程》), 2016, No. 9, pp. 202-207, 213 (7 pages). Indexed in CAS, CSCD, and the Peking University core journal list.
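Funding: Scientific Research Program of the Science and Technology Commission of Shanghai Municipality (1451110700, 14511106803); Special Development Fund of the Shanghai Zhangjiang National Independent Innovation Demonstration Zone (201411-JA-B108-002)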
Keywords: domain question classification; data sparseness; feature selection; topic model; high-quality topic; feature expansion