
Feature Selection Method for Inequality Maximum Entropy
Abstract: The inequality maximum entropy model, with its flexible modeling capability, alleviates overfitting and the associated data sparseness in text classification more successfully than other probabilistic models, but the feature selection algorithm it uses does not fully exploit the model's advantages. To address this problem, the paper proposes an improved sequential forward selection algorithm that raises the recognition rate in text classification. Experimental results show that the algorithm selects representative text features more accurately and yields a clear improvement in the classification performance of the inequality maximum entropy model.
Source: Computer Engineering (《计算机工程》, CAS / CSCD / PKU core journal), 2009, No. 18, pp. 182-184 (3 pages).
Keywords: inequality maximum entropy; feature selection; text classification
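
The paper itself is not reproduced in this record, but the abstract names sequential forward selection as the feature selection strategy. The sketch below illustrates that greedy idea only: starting from an empty set, it repeatedly adds whichever remaining feature most improves cross-validated accuracy. It is a minimal sketch under assumptions, not the paper's algorithm: scikit-learn's LogisticRegression (a standard maximum entropy classifier) stands in for the inequality maximum entropy model, mean cross-validation accuracy stands in for the paper's selection criterion, and the function name, parameters, and toy data are purely illustrative.

```python
# Hedged sketch of sequential forward selection (SFS) for feature selection.
# Assumptions: LogisticRegression approximates a maximum entropy classifier;
# the stopping rule (no improving candidate, or max_features reached) is illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score


def sequential_forward_selection(X, y, max_features, cv=3):
    """Greedily add the column of X that most improves cross-validated accuracy."""
    n_features = X.shape[1]
    selected = []        # indices of chosen features, in selection order
    best_score = 0.0
    while len(selected) < max_features:
        best_candidate, best_candidate_score = None, best_score
        for j in range(n_features):
            if j in selected:
                continue
            cols = selected + [j]
            clf = LogisticRegression(max_iter=1000)
            score = cross_val_score(clf, X[:, cols], y, cv=cv).mean()
            if score > best_candidate_score:
                best_candidate, best_candidate_score = j, score
        if best_candidate is None:   # no remaining feature improves the score
            break
        selected.append(best_candidate)
        best_score = best_candidate_score
    return selected, best_score


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy data: 200 documents, 20 candidate features; only features 0 and 1 are informative.
    X = rng.normal(size=(200, 20))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)
    feats, acc = sequential_forward_selection(X, y, max_features=5)
    print("selected features:", feats, "cv accuracy: %.3f" % acc)
```

In practice the same loop can wrap any classifier and scoring function; the paper's contribution lies in how the candidate features are scored for the inequality maximum entropy model, which is not reproduced here.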

References (12)

1. Berger A L, Della Pietra S A, Della Pietra V J. A Maximum Entropy Approach to Natural Language Processing[J]. Computational Linguistics, 1996, 22(1): 39-71.
2. Ratnaparkhi A. Maximum Entropy Models for Natural Language Ambiguity Resolution[D]. Pennsylvania, USA: University of Pennsylvania, 1998.
3. Ratnaparkhi A. A Maximum Entropy Model for Part-of-speech Tagging[C]//Proc. of the Conference on Empirical Methods in Natural Language Processing. Pennsylvania, USA: [s. n.], 1996: 133-142.
4. 李荣陆, 王建会, 陈晓云, 陶晓鹏, 胡运发. 使用最大熵模型进行中文文本分类[J]. 计算机研究与发展, 2005, 42(1): 94-101.
5. Nigam K, Lafferty J, McCallum A. Using Maximum Entropy for Text Classification[C]//Proc. of Workshop on Machine Learning for Information Filtering. Stockholm, Sweden: [s. n.], 1999: 61-67.
6. Kazama J, Tsujii J. Maximum Entropy Models with Inequality Constraints: A Case Study on Text Categorization[J]. Machine Learning, 2005, 60(3): 159-194.
7. Chen S F, Rosenfeld R. A Gaussian Prior for Smoothing Maximum Entropy Models[R]. CMU, Tech. Rep.: CMU-CS-99-108, 1999.
8. Benson S J, Moré J J. A Limited Memory Variable Metric Method for Bound Constraint Minimization[R]. Argonne National Laboratory, Tech. Rep.: ANL/MCS-909-0901, 2001.
9. 贾宁, 张全. 基于最大熵模型的中文姓名识别[J]. 计算机工程, 2007, 33(9): 31-33.
10. 秦进, 陈笑蓉, 汪维家, 陆汝占. 文本分类中的特征抽取[J]. 计算机应用, 2003, 23(2): 45-46.
