
An Automatic Text Classification Algorithm That Imitates Humans (cited by: 5)

An Automatic Algorithm of Text Categorization Imitating Human's
Abstract: This paper presents a text classification algorithm that imitates how humans classify documents. First, on the assumption that a document's title reflects its content, the algorithm increases the weight of theme (title) terms when the feature vector is built. Second, when computing the center of a document cluster, it applies a weight parameter vector designed to simulate human skimming and skipping behavior, and it increases the weight of features that appear in more positive examples than negative ones. Experiments show that the algorithm greatly improves the performance of a text classification system.
Source: Computer Science (《计算机科学》), indexed in CSCD and the Peking University Core Journals list, 2003, No. 3, pp. 44-45, 53 (3 pages)
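Keywords: automatic text classification algorithm; text information processing; document classification; natural language processing; Internet; text categorization; corpus; cluster center; machine learning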
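The three ideas named in the abstract (title weighting, a weighted cluster center, and a boost for features seen in more positive than negative examples) can be illustrated with a minimal Python sketch. The abstract gives no formulas, so everything concrete here is an assumption made for illustration: the function names, the `title_boost` and `boost` values, the use of raw term frequencies, and the cosine similarity are not taken from the paper.

```python
from collections import Counter
import math


def feature_vector(title, body, title_boost=3.0):
    """Term-frequency vector where each title term counts title_boost extra.

    title_boost is a hypothetical parameter; the paper's value is unknown.
    """
    vec = Counter()
    for term in body.lower().split():
        vec[term] += 1.0
    for term in title.lower().split():
        vec[term] += title_boost
    return dict(vec)


def weighted_center(vectors, doc_weights):
    """Weighted mean of document vectors.

    A low weight stands in for a document that a human reader would
    skim or skip (an assumed reading of the abstract's weight vector).
    """
    total = sum(doc_weights)
    center = {}
    for vec, weight in zip(vectors, doc_weights):
        for term, value in vec.items():
            center[term] = center.get(term, 0.0) + weight * value / total
    return center


def boost_positive_features(center, positives, negatives, boost=1.5):
    """Scale up terms found in more positive than negative example vectors."""
    pos_df = Counter(term for vec in positives for term in vec)
    neg_df = Counter(term for vec in negatives for term in vec)
    return {
        term: value * boost if pos_df[term] > neg_df[term] else value
        for term, value in center.items()
    }


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * b.get(term, 0.0) for term, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

Under these assumptions, a new document would be assigned to the category whose boosted center has the highest `cosine` similarity to the document's `feature_vector`.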
