Fuzzy kNN Text Classifier Based on Gini Index (article in English)
Abstract: With the growth of the Web, large numbers of documents have become available online, and automatic text categorization has become a key technique for handling massive data. Among the many text categorization algorithms, kNN has been shown to be one of the best. For the kNN classifier and most other classifiers, however, text preprocessing is a bottleneck, and its quality directly affects categorization performance. This paper presents a new text preprocessing algorithm based on the Gini index and adopts fuzzy set theory to improve the decision rule of the kNN algorithm. The combination of the two gives the fuzzy kNN classifier better categorization performance than the classical kNN algorithm. Experimental results show that the improvement is effective and feasible.
Source: Journal of Guangxi Normal University: Natural Science Edition (indexed in CAS and the Peking University Core Journals list), 2006, No. 4, pp. 87-90 (4 pages)
Funding: National Natural Science Foundation of China (60503017); Beijing Jiaotong University Science Foundation (2004RC008)
Keywords: text categorization; kNN; fuzzy kNN; text preprocessing; Gini index
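The abstract names two ingredients, Gini-index-based preprocessing and a fuzzy kNN decision rule, without giving their formulas. The Python sketch below is one plausible reading, not the paper's method. It rests on three assumptions: a term's Gini score is taken as the sum over classes of P(class | term)^2 (a common purity measure), documents are compared by cosine similarity on raw term counts, and each training document belongs fully to its own class. All function names are hypothetical.

```python
import math
from collections import defaultdict


def gini_scores(docs, labels):
    """Assumed purity score per term: sum over classes of P(class | term)^2.

    A score of 1.0 means the term appears in documents of a single class.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for tokens, label in zip(docs, labels):
        for term in set(tokens):  # document frequency, not raw frequency
            counts[term][label] += 1
    scores = {}
    for term, per_class in counts.items():
        total = sum(per_class.values())
        scores[term] = sum((n / total) ** 2 for n in per_class.values())
    return scores


def vectorize(tokens, vocabulary):
    """Term-count vector restricted to the selected vocabulary."""
    vec = defaultdict(float)
    for term in tokens:
        if term in vocabulary:
            vec[term] += 1.0
    return dict(vec)


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def fuzzy_knn(query, train_vecs, train_labels, k=3):
    """Return class memberships in [0, 1] instead of a hard majority vote.

    Each of the k nearest neighbours contributes its similarity to its own
    class; the contributions are normalised so the memberships sum to 1.
    """
    scored = [(cosine(query, v), y) for v, y in zip(train_vecs, train_labels)]
    nearest = sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
    total = sum(sim for sim, _ in nearest)
    memberships = defaultdict(float)
    for sim, label in nearest:
        memberships[label] += sim
    if total > 0.0:
        return {label: m / total for label, m in memberships.items()}
    return dict(memberships)


# Toy corpus (made up for illustration only).
docs = [
    ["stock", "market", "the"],
    ["market", "trade", "the"],
    ["match", "goal", "the"],
    ["goal", "team", "the"],
]
labels = ["finance", "finance", "sport", "sport"]

scores = gini_scores(docs, labels)
vocabulary = {t for t, s in scores.items() if s >= 0.9}  # drops "the"
train_vecs = [vectorize(d, vocabulary) for d in docs]
query = vectorize(["goal", "market", "goal", "the"], vocabulary)
memberships = fuzzy_knn(query, train_vecs, labels, k=3)
predicted = max(memberships, key=memberships.get)
```

In this toy run the term "the" occurs equally in both classes, so it scores 0.5 and is filtered out, while class-specific terms score 1.0. The classifier returns graded memberships rather than a single label, which is what makes the decision rule fuzzy; taking the class with the largest membership gives the final prediction.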
