Reliable trust recommendation model for feature weighting in text categorization (cited by: 6)
Abstract: From a trusted-computing perspective, this paper proposes a reliable trust recommendation algorithm for feature weighting in text categorization. It analyzes the characteristics of features within documents, studies the trust relationship between features and document categories based on the Beta distribution function, builds a feature weight computation model, and implements a simple and efficient linear text classifier. The comparative experiments use the 20newsgroup corpus and the Fudan Chinese corpus. Compared with the TFIDF algorithm, the results show that the proposed algorithm performs significantly better and adapts well to imbalanced corpora.
Source: Application Research of Computers (《计算机应用研究》), CSCD, Peking University Core Journal, 2010, No. 2, pp. 472-474 (3 pages)
Funding: National Natural Science Foundation of China (60703071); Key Provincial Natural Science Research Project of Anhui Higher Education Institutions (KJ2009A63)
Keywords: text categorization (TC); feature weighting; trust computing; probability certainty density; natural language processing
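
The abstract does not give the weighting formulas, so the following is only an illustrative sketch of the general idea, not the paper's model. It assumes the standard Beta expectation (r + 1) / (r + s + 2) as the trust of a feature in a category, where r counts documents of that category containing the feature and s counts documents of other categories containing it. The function names `beta_trust`, `train_weights`, and `classify`, and the evidence model itself, are hypothetical.

```python
from collections import Counter, defaultdict


def beta_trust(r, s):
    """Expected value of Beta(r + 1, s + 1), i.e. (r + 1) / (r + s + 2)."""
    return (r + 1.0) / (r + s + 2.0)


def train_weights(docs, labels):
    """Return {category: {term: trust}} from tokenized docs and their labels.

    Assumed evidence model (not taken from the paper): a document of
    category c that contains term t is positive evidence r; a document of
    any other category that contains t is negative evidence s.
    """
    df_total = Counter()             # number of documents containing each term
    df_class = defaultdict(Counter)  # the same count, split by category
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            df_total[term] += 1
            df_class[label][term] += 1
    weights = {}
    for label in set(labels):
        in_class = df_class[label]
        weights[label] = {
            term: beta_trust(in_class[term], df_total[term] - in_class[term])
            for term in df_total
        }
    return weights


def classify(tokens, weights):
    """Linear scoring: sum of term frequency times trust, highest score wins."""
    tf = Counter(tokens)
    scores = {
        # Unseen terms get the prior trust 0.5 (no evidence either way).
        label: sum(count * w.get(term, 0.5) for term, count in tf.items())
        for label, w in weights.items()
    }
    return max(scores, key=scores.get), scores
```

In this sketch a term seen only in one category approaches a trust of 1 there as evidence accumulates, while a term with no evidence stays at the prior 0.5, so scarce evidence is weighted cautiously rather than trusted outright.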

References (8)

  • [1] YANG Yi-ming, LIU X. A re-examination of text categorization methods[C]//Proc of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. 1999:42-49.
  • [2] JOACHIMS T. A probabilistic analysis of the Rocchio algorithm with TFIDF for text categorization[C]//Proc of the 14th International Conference on Machine Learning. 1997:143-151.
  • [3] JOSANG A, KNAPSKOG S J. A metric for trusted systems[C]//Proc of the 21st National Security Conference. 1998:16-29.
  • [4] JOSANG A, ISMAIL R. The beta reputation system[C]//Proc of the 15th Bled Conference on Electronic Commerce. Bled, Slovenia: [s.n.], 2002.
  • [5] LIN Hong-fei, YANG Zhi-hao, ZHAO Jing. Information recommendation mechanism based on content and collaborative patterns[J]. Journal of Chinese Information Processing, 2005,19(1):48-55. (cited by: 14)
  • [6] YANG Yi-ming, PEDERSEN J O. A comparative study on feature selection in text categorization[C]//Proc of the 14th International Conference on Machine Learning. 1997:412-420.
  • [7] ZHU Jing-bo, WANG Hui-zhen, ZHANG Xi-juan. Confusion class discrimination techniques for text classification[J]. Journal of Software, 2008,19(3):630-639. (cited by: 9)
  • [8] YANG Yi-ming. An evaluation of statistical approaches to text categorization[J]. Information Retrieval, 1999,1(1-2):76-82.


