
A Classification Method Based on Centroid and Ontology
Cited by: 3
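Abstract: The traditional TFIDF model is poorly suited to computing feature weights for documents in the root set, so a new weighting method, the TFIDF-2 model, is proposed. Three heuristic rules are also given for obtaining the centroid vector of the root-set documents. Classifying text by computing the similarity between a document and the centroid is only a preliminary application of the centroid; as part of this process, a new method for computing document-centroid similarity is proposed. A series of comparative experiments indicates that this classification method is more accurate and more efficient than traditional classification algorithms. Finally, combining an ontology with the centroid is shown to be effective for extracting relevant documents from an unlabeled data set.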
Source: Journal of Computer Research and Development (《计算机研究与发展》; indexed in EI, CSCD, and the Peking University Core Journals list), 2007, Issue z2, pp. 6-11 (6 pages)
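Funding: National Natural Science Foundation of China (60373099); Key Laboratory of Symbolic Computation and Knowledge Engineering of the Ministry of Education (93K-17)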
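The centroid-based classification step mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's method: the abstract does not define the TFIDF-2 weighting, the three heuristic rules, or the new similarity measure, so the sketch assumes standard smoothed TF-IDF weights, a plain mean of unit-length document vectors as the class centroid, and cosine similarity. All function names (`tfidf_vectors`, `centroid`, `train`, `classify`) are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tfidf_vectors(docs):
    """Standard smoothed TF-IDF (an assumption; NOT the paper's TFIDF-2)."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}
    vectors = [{t: c * idf[t] for t, c in Counter(doc).items()} for doc in docs]
    return vectors, idf


def normalize(vec):
    """Scale a sparse vector to unit length; an all-zero vector becomes {}."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def centroid(vectors):
    """Mean of unit-length vectors, renormalized (assumed centroid rule)."""
    total = defaultdict(float)
    for vec in vectors:
        for t, w in normalize(vec).items():
            total[t] += w
    return normalize({t: w / len(vectors) for t, w in total.items()})


def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def train(labeled_docs):
    """labeled_docs: list of (token_list, label). Returns (centroids, idf)."""
    vectors, idf = tfidf_vectors([tokens for tokens, _ in labeled_docs])
    by_class = defaultdict(list)
    for vec, (_, label) in zip(vectors, labeled_docs):
        by_class[label].append(vec)
    return {label: centroid(vs) for label, vs in by_class.items()}, idf


def classify(tokens, centroids, idf):
    """Assign the label whose centroid is most similar to the document."""
    tf = Counter(t for t in tokens if t in idf)  # ignore unseen terms
    vec = normalize({t: c * idf[t] for t, c in tf.items()})
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))
```

Training reduces each class to a single vector, so classifying a document costs one similarity computation per class, which is the usual reason centroid classifiers are considered efficient. Swapping in a different weighting scheme or similarity measure only requires replacing `tfidf_vectors` or `cosine`.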

