Text Classification Using Sentential Frequent Itemsets

Abstract: Text classification techniques mostly rely on single-term analysis of the document data set, while many concepts, especially the more specific ones, are usually conveyed by sets of terms. To achieve a more accurate text classifier, more informative features, including words that frequently co-occur in the same sentence and their weights, are particularly important in such scenarios. In this paper, we propose a novel approach to text classification using sentential frequent itemsets, a concept that comes from association rule mining. The approach views a sentence rather than a document as a transaction, and uses a variable precision rough set based method to evaluate each sentential frequent itemset's contribution to the classification. Experiments over the Reuters and newsgroup corpora are carried out, which validate the practicability of the proposed system.
Source: Journal of Computer Science & Technology (计算机科学技术学报, English edition), indexed in SCIE, EI, CSCD. 2007, No. 2, pp. 334-336, F0003 (4 pages in total).
Keywords: text classification, sentential frequent itemsets, variable precision rough set model
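
The abstract's central idea, treating each sentence rather than each document as a transaction, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the authors' implementation: the function names `sentence_transactions` and `sentential_frequent_itemsets`, the regex-based sentence splitting, the `min_support` and `max_size` parameters, and the brute-force enumeration used in place of a real frequent-itemset miner such as Apriori. The variable precision rough set weighting step is not shown, since the abstract gives no details of it.

```python
import re
from collections import Counter
from itertools import combinations


def sentence_transactions(text):
    """Split text into sentences; each sentence becomes a set of lowercase words.

    Hypothetical helper: splitting on '.', '!' and '?' is a crude stand-in
    for a real sentence segmenter.
    """
    transactions = []
    for sentence in re.split(r"[.!?]+", text):
        words = set(re.findall(r"[a-z]+", sentence.lower()))
        if words:
            transactions.append(words)
    return transactions


def sentential_frequent_itemsets(text, min_support=2, max_size=2):
    """Return word sets that co-occur in at least `min_support` sentences.

    Itemsets are reported as sorted tuples mapped to their sentence counts.
    Brute-force enumeration (an assumption, fine for tiny inputs only).
    """
    counts = Counter()
    for words in sentence_transactions(text):
        ordered = sorted(words)
        for size in range(1, max_size + 1):
            # Each combination is counted once per sentence containing it.
            counts.update(combinations(ordered, size))
    return {itemset: n for itemset, n in counts.items() if n >= min_support}


sample = "Rough set theory helps. Rough set models weigh terms. Terms can co-occur."
result = sentential_frequent_itemsets(sample)
```

With a document-level transaction, every pair of words in `sample` would count as co-occurring; at the sentence level only pairs that actually share a sentence are counted, which is the distinction the abstract draws.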
