
A Webpage Classification Algorithm Based on Link Information (cited by: 1)
Abstract: Traditional text classification algorithms are easily misled by the false and erroneous information that fills many webpages. To improve the accuracy of webpage classification, this paper proposes a webpage classification algorithm based on link information. The algorithm is a modification of the K-nearest-neighbor (KNN) method that classifies a webpage using the link information between the page and its parent pages. The parent-link information of the page to be classified is represented as a vector in a vector space, the K pages in the training set whose link-information vectors are most similar to it are retrieved, and the category of the page is computed from them. Experiments comparing the method with traditional text classification algorithms verify its effectiveness.
Source: Microelectronics & Computer (《微电子学与计算机》), CSCD, Peking University Core Journal list, 2012, No. 6, pp. 108-112 (5 pages)
Funding: National Natural Science Foundation of China (60373003); Henan University of Technology university fund (2006BSO009)
Keywords: webpage classification; category; K-nearest neighbor; link information classification
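
The abstract outlines three steps: represent a page by a vector built from its parent-link information, find the K most similar training pages, and compute a category from them. The following is a minimal sketch of that idea, not the authors' implementation. The abstract does not say how the link vector is built, which similarity measure is used, or how the K neighbors are combined, so the term-count vector over parent-link text, the cosine similarity, and the similarity-weighted vote are all assumptions, and the function names `link_vector`, `cosine`, and `knn_classify` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def link_vector(parent_link_texts):
    """Assumed representation: term counts over the text of links from parent pages."""
    counts = Counter()
    for text in parent_link_texts:
        counts.update(text.lower().split())
    return dict(counts)


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts (assumed measure)."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_classify(query_vec, training_set, k=3):
    """Pick the category with the largest similarity-weighted vote among the k nearest pages.

    training_set is a list of (link_vector, category) pairs.
    Returns None when the training set is empty.
    """
    scored = sorted(
        ((cosine(query_vec, vec), label) for vec, label in training_set),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = defaultdict(float)
    for similarity, label in scored[:k]:
        votes[label] += similarity
    return max(votes, key=votes.get) if votes else None


# Toy data, invented purely for illustration.
training = [
    (link_vector(["football scores", "league football news"]), "sports"),
    (link_vector(["basketball league standings"]), "sports"),
    (link_vector(["stock market news", "bank interest rates"]), "finance"),
    (link_vector(["stock prices today"]), "finance"),
]
query = link_vector(["latest football league table"])
print(knn_classify(query, training, k=3))
```

Weighting each neighbor's vote by its similarity is one common KNN variant. A plain majority vote over the K neighbors would also fit the description in the abstract.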



