Research on Chinese Document Classification Based on Graph Model (基于图模型的中文文档分类研究)

Cited by: 3
Abstract: In information processing, most existing text classification algorithms are based on the vector space model. The vector space model, however, cannot effectively express the structural information of a document, and so cannot fully express its semantic information. To express document semantics more effectively, this paper first proposes a new document representation, the graph model, in which a weighted labeled graph expresses a document's feature terms and their positional association information. On this basis, the paper then proposes a new document similarity measure and applies it to the classification of Chinese text. Experimental results show that this graph-based document representation is effective and feasible.
Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), CSCD, PKU Core (北大核心), 2006, No. 4, pp. 754-757 (4 pages)
Funding: Supported by the Fujian Natural Science Foundation (A0410010), the Fujian Province Science and Technology Three Projects program (K03012), the Fujian Provincial Department of Education (JA04155), and the Fuzhou University Science and Technology Development Fund (2003-XQ-23).
Keywords: text classification; graph model; similarity measure; vector space model
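
The abstract describes the graph model only in outline: a weighted labeled graph over feature terms and their positional associations, plus a similarity measure used for classification. The following is a minimal sketch of that general idea, not the paper's actual definitions. Everything in it is an assumption made for illustration: the co-occurrence `window`, the weighted-Jaccard `overlap`, the mixing weight `alpha`, the nearest-neighbor `classify` rule, and all function names. It also assumes the Chinese text has already been segmented into a list of terms.

```python
from collections import Counter


def build_graph(tokens, window=2):
    """Build a weighted labeled graph for one document (illustrative only).

    Nodes are terms weighted by frequency. A directed edge (a, b) counts how
    often term b appears within `window` positions after term a.
    """
    nodes = Counter(tokens)
    edges = Counter()
    for i, a in enumerate(tokens):
        for b in tokens[i + 1 : i + 1 + window]:
            if a != b:
                edges[(a, b)] += 1
    return nodes, edges


def overlap(c1, c2):
    """Weighted Jaccard overlap of two Counters, a value in [0, 1]."""
    keys = set(c1) | set(c2)
    num = sum(min(c1[k], c2[k]) for k in keys)
    den = sum(max(c1[k], c2[k]) for k in keys)
    return num / den if den else 0.0


def graph_similarity(g1, g2, alpha=0.5):
    """Mix node overlap and edge overlap; `alpha` is a hypothetical weight."""
    (n1, e1), (n2, e2) = g1, g2
    return alpha * overlap(n1, n2) + (1 - alpha) * overlap(e1, e2)


def classify(doc_graph, labeled_graphs):
    """Return the label of the most similar training graph (1-nearest neighbor)."""
    best = max(labeled_graphs, key=lambda item: graph_similarity(doc_graph, item[0]))
    return best[1]
```

Two documents with the same terms in a different order have identical term-frequency vectors, so a bag-of-words comparison cannot tell them apart. In this sketch their node overlap is 1 but their edge overlap drops, which is the kind of structural information the abstract says the vector space model misses.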
