
Research on Feature Preextraction Method in Text Classification (文本分类中特征预抽取方法研究)

Cited by: 5
Abstract: In text classification, feature extraction is an important task, and the quality of the extracted feature terms directly affects classification results. Building on a study of the feature-word pre-extraction methods commonly used in text classification, this paper proposes a feature pre-extraction method based on part-of-speech selection and combines it with the IG (information gain) method for feature extraction. Classification experiments show that this part-of-speech-based pre-extraction method reduces both the feature dimensionality and the training time without lowering classification accuracy.
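Source: Information Science (《情报科学》), CSSCI, PKU Core, 2011, No. 1, pp. 86-88, 92 (4 pages)
Funding: Natural Science Research Program of the Hebei Provincial Department of Education (2007405); Zhangjiakou Science and Technology Research and Development Program (0921045B); Natural Science Youth Fund of Hebei North University (Q2010008)
Keywords: text classification; feature; extraction method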
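
The two-stage idea described in the abstract (a part-of-speech filter first, then IG scoring over the surviving terms) can be illustrated with a minimal sketch. This is not the authors' implementation. The record does not say which parts of speech the paper keeps, so the `KEPT_POS` set (nouns and verbs, tagged `n` and `v`), the function names, and the toy corpus below are all assumptions chosen for illustration. Documents are assumed to be already segmented and tagged as `(word, pos)` pairs.

```python
import math
from collections import Counter

# Assumption: keep nouns ("n") and verbs ("v"); the paper's actual choice is not given here.
KEPT_POS = {"n", "v"}


def pos_prefilter(tagged_doc, kept_pos=KEPT_POS):
    """Pre-extraction step: keep only words whose POS tag is in kept_pos."""
    return {word for word, pos in tagged_doc if pos in kept_pos}


def entropy(counts):
    """Shannon entropy (base 2) of a list of class counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(term, docs, labels):
    """IG(t) = H(C) - [P(t) H(C|t) + P(not t) H(C|not t)], using document presence."""
    with_t = Counter(l for d, l in zip(docs, labels) if term in d)
    without_t = Counter(l for d, l in zip(docs, labels) if term not in d)
    n = len(docs)
    n_with = sum(with_t.values())
    n_without = n - n_with
    h_class = entropy(list(Counter(labels).values()))
    h_cond = (n_with / n) * entropy(list(with_t.values())) \
        + (n_without / n) * entropy(list(without_t.values()))
    return h_class - h_cond


def select_features(tagged_docs, labels, k):
    """POS pre-filter, then return the k terms with the highest information gain."""
    docs = [pos_prefilter(d) for d in tagged_docs]
    vocab = set().union(*docs)
    # Ties are broken alphabetically so the result is deterministic.
    ranked = sorted(vocab, key=lambda t: (-information_gain(t, docs, labels), t))
    return ranked[:k]


# Hypothetical toy corpus: four tagged documents in two classes.
TAGGED_DOCS = [
    [("football", "n"), ("match", "n"), ("the", "d"), ("win", "v")],
    [("football", "n"), ("team", "n"), ("very", "d")],
    [("stock", "n"), ("market", "n"), ("the", "d"), ("rise", "v")],
    [("stock", "n"), ("price", "n"), ("very", "d")],
]
LABELS = ["sports", "sports", "finance", "finance"]

top_terms = select_features(TAGGED_DOCS, LABELS, k=2)
```

In this sketch the filter runs before any scoring, so words with discarded tags (here `the` and `very`) never enter the vocabulary. IG is then computed over fewer candidate terms, which is the kind of saving in dimensionality and training time that the abstract reports.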
