A Rapid Method for Text Tendency Classification (in English) Cited by: 2
Abstract: This paper proposes a rapid method for text tendency classification. A class space model describes each word's tendency toward the categories, and classification is carried out on the statistical features of words. To handle the complexity of tendency classification, the method jointly considers three statistical features of a word (term frequency, document frequency, and distribution across categories) and introduces a new two-pass feature selection procedure. The first pass uses a combined feature selection method to remove low-frequency words and noise words that are distributed uniformly across the categories. The second pass removes words whose category tendency is not pronounced. Experiments show that the method achieves high classification performance and runs fast, which gives it practical value in information retrieval, information filtering, and content security management.
Source: Journal of University of Electronic Science and Technology of China (《电子科技大学学报》), 2007, No. 6, pp. 1232-1236 (5 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.
Funding: National 863 Program of China (2005AA147030)
Keywords: category weight; class space model; text tendency categorization; twice feature selection
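
The two-pass selection described in the abstract can be illustrated with a minimal Python sketch. The abstract gives no formulas, so everything concrete here is an assumption made for illustration: the class weight is taken as the share of a word's occurrences that fall in each class, "uniform distribution" is measured by normalized entropy, classification sums the class weights of the retained words, and the function names and thresholds (`min_df`, `max_entropy`, `min_tendency`) are hypothetical. It is not the authors' implementation.

```python
import math
from collections import Counter, defaultdict


def train(docs, labels, min_df=2, max_entropy=0.9, min_tendency=0.7):
    """Build a word -> {class: weight} table using two selection passes.

    The weight formula and all thresholds are illustrative assumptions.
    """
    classes = sorted(set(labels))
    tf = defaultdict(Counter)  # tf[word][cls]: occurrences of word in class cls
    df = Counter()             # df[word]: number of documents containing word
    for tokens, cls in zip(docs, labels):
        for word in tokens:
            tf[word][cls] += 1
        df.update(set(tokens))

    norm = math.log(len(classes)) or 1.0  # guard against a single-class corpus
    model = {}
    for word, counts in tf.items():
        total = sum(counts.values())
        weights = {c: counts[c] / total for c in classes}
        # Pass 1: drop low-frequency words and words spread evenly over classes.
        entropy = -sum(p * math.log(p) for p in weights.values() if p > 0)
        if df[word] < min_df or entropy / norm > max_entropy:
            continue
        # Pass 2: drop words without a clear tendency toward one class.
        if max(weights.values()) < min_tendency:
            continue
        model[word] = weights
    return model, classes


def classify(model, classes, tokens):
    """Sum the class weights of known words and return the best-scoring class."""
    scores = {c: 0.0 for c in classes}
    for word in tokens:
        for c, w in model.get(word, {}).items():
            scores[c] += w
    return max(classes, key=lambda c: scores[c])


# Toy corpus, invented for this example only.
docs = [
    ["good", "great", "movie"],
    ["good", "fun", "movie"],
    ["bad", "boring", "movie"],
    ["bad", "awful", "movie"],
]
labels = ["pos", "pos", "neg", "neg"]
model, classes = train(docs, labels)
print(sorted(model))                                     # ['bad', 'good']
print(classify(model, classes, ["good", "movie", "x"]))  # pos
```

In this toy run, "movie" is removed in the first pass because it is spread evenly over both classes, and the words that occur in only one document are removed as low-frequency. Because classification is a lookup and a sum per token, its cost is linear in document length, which is consistent with the speed claim in the abstract, although the abstract does not state the actual scoring rule.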