A Term Weight Calculation Method Based on the Mathematical Expectation of Terms (cited by: 1)
Abstract: The formal representation of text has long been a fundamental problem in text mining. In the vector space model, TFIDF (Term Frequency, Inverse Document Frequency) is a classical and comparatively effective method for computing term weights in text representation. An analysis of the traditional TFIDF method shows that it accounts neither for how the documents containing a term are distributed across categories nor for the differing number of documents in each category. To address these shortcomings, the paper introduces the mathematical expectation of the term as an additional factor in the weight, giving the TFIDF-E method. Experimental results show that text represented with TFIDF-E weights achieves better classification performance than text represented with traditional TFIDF, which supports the effectiveness and feasibility of the TFIDF-E method.
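The limitation of classical TFIDF named in the abstract can be illustrated with a small sketch. It is not the paper's TFIDF-E method, whose formula the abstract does not give; it only computes a common textbook variant of TFIDF (length-normalised term frequency times log(N/df)) on a hypothetical four-document corpus. The corpus, the labels, and the helper names `tfidf` and `class_distribution` are assumptions made for illustration.

```python
import math
from collections import Counter

def tfidf(docs):
    """Classical TFIDF (one common variant): (count / doc length) * log(N / df)."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return weights

def class_distribution(term, docs, labels):
    """Count, per category, the documents that contain the term."""
    return dict(Counter(label for doc, label in zip(docs, labels) if term in doc))

# Hypothetical toy corpus: two "sports" and two "finance" documents.
docs = [
    ["goal", "team", "plan"],
    ["goal", "coach", "match"],
    ["stock", "bank", "plan"],
    ["stock", "market", "loan"],
]
labels = ["sports", "sports", "finance", "finance"]

weights = tfidf(docs)

# "goal" and "plan" both occur in two documents, so they share the same IDF
# and receive the same weight in the first document ...
print(weights[0]["goal"], weights[0]["plan"])

# ... although "goal" is confined to one category and "plan" is spread over both.
print(class_distribution("goal", docs, labels))
print(class_distribution("plan", docs, labels))
```

In this toy corpus, "goal" is clearly more useful for telling the categories apart than "plan", yet classical TFIDF cannot distinguish them because it only sees the overall document frequency. A category-aware factor, such as the one the abstract proposes, is meant to separate such cases.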
Source: Computer Applications and Software (《计算机应用与软件》), CSCD, 2011, No. 4, pp. 177-179 (3 pages)
Funding: Key Natural Science Project of the Anhui Provincial Department of Education (KJ2007A051)
Keywords: text categorisation; term weight; differentiation; mathematical expectation