
A Study of a Multi-Level Text Classification Method

Study of Multi-level Text Classification Based on Combination of Rule and KNN Algorithm
Abstract: To address the shortcomings of existing rule-based and statistics-based text classification methods, this paper proposes a novel text classification method that combines rules with K-nearest-neighbor (KNN) classification. First, the traditional vector space model (VSM) used to describe text features is extended, and the extended model is given in detail. Then, based on the extended model, a rule representation is proposed in which each rule is assigned a strength coefficient; using this coefficient, the recognized texts can be ranked by level. Finally, a threshold is set, and texts whose level falls below the threshold are filtered out. The method can effectively exclude texts that KNN classification has misidentified, which improves classification accuracy to some extent. Experimental results on a small data set show that the method is effective and feasible.
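The pipeline described in the abstract (KNN labeling, rule-based ranking by a strength coefficient, then threshold filtering) can be illustrated with a minimal sketch. The abstract gives neither the extended VSM nor the rule syntax, so everything concrete here is an assumption made for illustration: documents are plain term-weight dictionaries, a rule is a hypothetical triple of (label, required terms, strength), a document's level is the strongest matching rule for its KNN label, and the data, the function names, and the 0.5 threshold are invented.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def knn_predict(doc, training, k=3):
    """Label doc by majority vote among its k most similar training docs."""
    nearest = sorted(training, key=lambda item: cosine(doc, item[0]),
                     reverse=True)[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical rule format (not from the paper):
# (label, terms that must all appear in the document, strength coefficient)
RULES = [
    ("sports", {"match", "team"}, 0.9),
    ("sports", {"score"}, 0.4),
    ("finance", {"stock", "market"}, 0.8),
]

def rule_level(doc, label, rules):
    """Highest strength among rules for `label` whose terms all occur in doc."""
    strengths = [s for lab, terms, s in rules
                 if lab == label and terms.issubset(doc)]
    return max(strengths, default=0.0)

def classify_and_filter(docs, training, rules, threshold=0.5, k=3):
    """KNN-label each doc, keep those at or above threshold, rank by level."""
    results = []
    for name, vec in docs.items():
        label = knn_predict(vec, training, k)
        level = rule_level(vec, label, rules)
        if level >= threshold:  # weakly supported KNN decisions are dropped
            results.append((name, label, level))
    return sorted(results, key=lambda r: r[2], reverse=True)

# Invented toy data, for illustration only.
training = [
    ({"match": 1.0, "team": 1.0, "goal": 1.0}, "sports"),
    ({"team": 1.0, "score": 1.0}, "sports"),
    ({"stock": 1.0, "market": 1.0}, "finance"),
    ({"market": 1.0, "bank": 1.0}, "finance"),
]
docs = {
    "d1": {"match": 1.0, "team": 1.0},
    "d2": {"score": 1.0, "team": 1.0},
    "d3": {"stock": 1.0, "market": 1.0, "bank": 1.0},
}

ranked = classify_and_filter(docs, training, RULES, threshold=0.5, k=3)
print(ranked)
```

In this toy run, KNN labels `d2` as sports, but only the weak rule (strength 0.4) supports that label, so the document falls below the threshold and is filtered out. That is the kind of filtering the abstract credits with removing KNN misidentifications, at the cost of discarding some documents that KNN may have labeled correctly.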
Authors: Xiao Hong (肖红), Liu Shuhua (刘淑华)
Source: Journal of Yangtze University (Natural Science Edition), Sci & Eng (《长江大学学报(自科版)(上旬)》), CAS-indexed, 2008, No. 2, pp. 92-95 (4 pages)
Funding: Natural Science Foundation of Heilongjiang Province (grant 11521013)
Keywords: text classification; KNN algorithm; vector space model
