Using a Logistic Regression Model for Chinese Text Categorization (使用Logistic回归模型进行中文文本分类)
Cited by: 10
Abstract: This paper applies a Logistic regression model to Chinese text categorization. Experiments compare and analyze the performance of the Logistic-regression-based classifier under different Chinese text features, different numbers of features, and different document collections. The classifier is also compared with a linear SVM text classifier. The results show that its classification performance is comparable to that of the linear SVM, which indicates that the method is effective for text categorization.
Source: Computer Engineering and Applications (《计算机工程与应用》), indexed in CSCD and the Peking University Core Journals list, 2009, No. 14, pp. 152-154 (3 pages)
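Funding: National Natural Science Foundation of China (No. 60772073); Natural Science Foundation of Hebei Province (No. F2006001020); Research Fund of the Hebei Provincial Department of Education (No. 2005347); Research Fund of Hebei University (No. Y2004045)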
Keywords: Logistic regression model; support vector machines; text categorization; features
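
The abstract describes a Logistic-regression-based classifier trained on Chinese text features with a configurable number of features. The following is a minimal sketch of that general idea, not the authors' implementation. Everything specific in it is an assumption made for illustration: character bigrams as the feature type, binary feature values, frequency-based feature selection, batch gradient descent with L2 regularization, and the four toy training sentences. The linear SVM comparison from the paper is not covered.

```python
import math
from collections import Counter

def char_bigrams(text):
    """Overlapping character bigrams (an assumed feature type for Chinese text)."""
    return [text[i:i + 2] for i in range(len(text) - 1)]

def build_vocab(docs, max_features):
    """Keep the max_features most frequent bigrams in the training documents."""
    counts = Counter(bg for doc in docs for bg in char_bigrams(doc))
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {bg: idx for idx, (bg, _) in enumerate(ranked[:max_features])}

def vectorize(doc, vocab):
    """Sparse binary vector: the set of vocabulary indices present in the document."""
    return {vocab[bg] for bg in char_bigrams(doc) if bg in vocab}

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic(X, y, n_features, epochs=200, lr=0.5, l2=0.01):
    """Batch gradient descent on the mean logistic loss plus (l2 / 2) * ||w||^2."""
    w = [0.0] * n_features
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [l2 * wj for wj in w]  # gradient of the regularization term
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(b + sum(w[j] for j in xi))
            err = (p - yi) / n          # gradient of the loss w.r.t. the score
            for j in xi:
                grad_w[j] += err
            grad_b += err
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict_proba(doc, vocab, w, b):
    """Estimated probability that the document belongs to class 1."""
    return sigmoid(b + sum(w[j] for j in vectorize(doc, vocab)))

# Hypothetical toy corpus: label 1 = sports, label 0 = finance.
train_docs = ["球队赢得比赛冠军", "球员在比赛中进球",
              "股票市场今日上涨", "银行下调贷款利率"]
labels = [1, 1, 0, 0]

vocab = build_vocab(train_docs, max_features=50)
X = [vectorize(doc, vocab) for doc in train_docs]
w, b = train_logistic(X, labels, n_features=len(vocab))

p_sports = predict_proba("比赛冠军球队", vocab, w, b)
p_finance = predict_proba("股票市场利率", vocab, w, b)
print(p_sports > 0.5, p_finance < 0.5)
```

Lowering `max_features` is one simple way to explore how the number of features affects the classifier, which is the kind of comparison the abstract mentions.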
