Feature Selection Method Based on Domain-specific Term Extraction

Cited by: 4
Abstract: Traditional document representation methods for text classification are generally based on full-text (bag-of-words) analysis. Because they ignore domain-specific semantic features, they are poorly suited to domain-specific text classification tasks. This paper proposes a feature selection method based on domain-specific term extraction through corpus comparison, and combines it with an SVM classifier to build a text classification system for specific domains that can be applied to other domains easily. In the evaluation of the Text REtrieval Conference (TREC) 2005 Genomics Track Categorization Task, the system achieved the highest score among the runs submitted by 19 groups.
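Authors: 孙麟 (Sun Lin), 牛军钰 (Niu Junyu)
Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), indexed in CSCD and the Peking University Core Journals list, 2007, No. 5, pp. 895-899 (5 pages)
Funding: Supported by the National Natural Science Foundation of China (Grant No. 60305006)
Keywords: text classification; document representation; feature selection; domain-specific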
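
The abstract does not give the scoring formula used for corpus comparison, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a smoothed log ratio of relative term frequencies between a domain corpus and a general corpus as the comparison score. The names `extract_domain_terms`, `to_feature_vector`, `top_k`, and `min_count` are hypothetical and chosen for illustration.

```python
import math
import re
from collections import Counter

# Assumption: a simple lowercase alphanumeric tokenizer stands in for
# whatever preprocessing the paper actually uses.
TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Split text into lowercase alphanumeric tokens."""
    return TOKEN_RE.findall(text.lower())


def term_counts(docs):
    """Count term occurrences over a list of document strings."""
    counts = Counter()
    for doc in docs:
        counts.update(tokenize(doc))
    return counts


def extract_domain_terms(domain_docs, general_docs, top_k=1000, min_count=2):
    """Rank terms by how much more frequent they are in the domain corpus.

    Assumed score: log of the ratio of add-one-smoothed relative
    frequencies (domain vs. general). This is one common corpus
    comparison measure, not necessarily the one used in the paper.
    """
    domain = term_counts(domain_docs)
    general = term_counts(general_docs)
    domain_total = sum(domain.values())
    general_total = sum(general.values())
    vocab_size = len(set(domain) | set(general))

    scores = {}
    for term, count in domain.items():
        if count < min_count:
            continue  # skip terms too rare to score reliably
        p_domain = (count + 1) / (domain_total + vocab_size)
        p_general = (general[term] + 1) / (general_total + vocab_size)
        scores[term] = math.log(p_domain / p_general)

    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:top_k]


def to_feature_vector(doc, terms):
    """Represent a document by its counts of the selected domain terms."""
    counts = Counter(tokenize(doc))
    return [counts[t] for t in terms]
```

In this sketch, the selected terms replace the full bag-of-words vocabulary as the feature set, and the resulting vectors could then be passed to any SVM implementation for training and classification.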
