Study and Improvement on Linkage Similarity-Based Web Mining Algorithm

Cited by: 5
Abstract: Building on a classification of Web mining, this paper studies and analyzes HITS (Hyperlink-Induced Topic Search), a Web structure mining algorithm based on link analysis. When HITS builds its expanded set, it considers only the pages that link to or from the root-set pages and ignores the similarity of those in-linked and out-linked pages. To address this shortcoming, an improved algorithm, DS-HITS (Document Similarity Hyperlink-Induced Topic Search), is proposed. DS-HITS introduces several weights that reflect page similarity into expanded-set processing, which markedly improves the hub and authority values of the pages obtained. Finally, the search results of DS-HITS and HITS are compared using the initial data of the WebLa open-source project.
Source: Computer Applications and Software (CSCD-indexed), 2011, No. 1, pp. 272-273, 301 (3 pages in total)
Keywords: Web mining; HITS (Hyperlink-Induced Topic Search) algorithm; DS-HITS (Document Similarity Hyperlink-Induced Topic Search) algorithm
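
The idea summarized in the abstract can be illustrated with a minimal sketch. The record does not give the DS-HITS weights, so the code below is not the paper's method: it runs the standard HITS hub/authority iteration and, as an assumed stand-in for the similarity weights, admits a linked page into the expanded set only when its cosine similarity to the root page it touches reaches a threshold. The function names, the `threshold` parameter, and the toy pages are all hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine similarity of two texts using raw term-frequency vectors."""
    tf_a = Counter(text_a.lower().split())
    tf_b = Counter(text_b.lower().split())
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(v * v for v in tf_a.values()))
    norm_b = math.sqrt(sum(v * v for v in tf_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def expand_root_set(root, links, texts, threshold=0.0):
    """Add pages that link to or from the root set.

    threshold=0.0 admits every neighbour (plain HITS expansion). A positive
    threshold keeps only neighbours similar to the root page they touch;
    this filter is an assumption, not the weighting defined in the paper.
    """
    expanded = set(root)
    for src, dst in links:
        similar = cosine_similarity(texts[src], texts[dst]) >= threshold
        if src in root and similar:
            expanded.add(dst)
        if dst in root and similar:
            expanded.add(src)
    return expanded


def normalise(scores):
    """Scale scores to unit L2 norm; leave an all-zero vector unchanged."""
    norm = math.sqrt(sum(v * v for v in scores.values()))
    return {p: v / norm for p, v in scores.items()} if norm else dict(scores)


def hits(pages, links, iterations=50):
    """Standard HITS: alternate authority and hub updates, then normalise."""
    edges = [(s, d) for s, d in links if s in pages and d in pages]
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        new_auth = {p: 0.0 for p in pages}
        for s, d in edges:
            new_auth[d] += hub[s]   # authority: sum of hubs pointing in
        auth = normalise(new_auth)
        new_hub = {p: 0.0 for p in pages}
        for s, d in edges:
            new_hub[s] += auth[d]   # hub: sum of authorities pointed to
        hub = normalise(new_hub)
    return hub, auth


# Hypothetical toy data: page "b" is linked from the root but off-topic.
texts = {
    "r": "web mining link analysis",
    "a": "web mining algorithms",
    "b": "cooking recipes pasta",
    "c": "link analysis hits",
}
links = [("r", "a"), ("r", "b"), ("c", "r"), ("a", "c")]
root = {"r"}

plain_set = expand_root_set(root, links, texts)
filtered_set = expand_root_set(root, links, texts, threshold=0.3)
hub, auth = hits(filtered_set, links)
```

With the toy data, the plain expansion admits the off-topic page `b`, while the filtered expansion leaves it out before hub and authority scores are computed. A hard threshold is only one possible design; a weighted variant could instead scale each link's contribution by the similarity score.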

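References (10)

  • 1. Jon M. Kleinberg. Authoritative sources in a hyperlinked environment [J]. Journal of the ACM, 1999, 46(5): 604-632.
  • 2. Etzioni O. The World Wide Web: Quagmire or gold mine? [J]. Communications of the ACM, 1996, 39(11): 65-68.
  • 3. Han Jiawei, Meng Xiaofeng, Wang Jing, Li Sheng'en. Research on Web mining [J]. Journal of Computer Research and Development, 2001, 38(4): 405-414. Cited by: 356
  • 4. Gordon S. Linoff, Michael J. A. Berry. Mining the Web: Transforming Customer Data into Customer Value [M]. Translated by Shen Junyi, Song Qinbao, Yan Cairong, et al. Beijing: Publishing House of Electronics Industry, 2004.
  • 5. WebLa: Web Linkage Analysis, 2005. http://Webla.sourceforge.net/.
  • 6. Li Huahu. Research on the application of semantics-based Web data mining to online reading websites [D]. Donghua University, 2008.
  • 7. Zhu Wei, Wang Chao, Li Jun, Pan Jingui. Research on Web hyperlink analysis algorithms [J]. Computer Science, 2003, 30(9): 89-93. Cited by: 20
  • 8. Kleinberg J. Authoritative sources in a hyperlinked environment [C]// Proc. 9th ACM-SIAM Symposium on Discrete Algorithms, 1998. Extended version in Journal of the ACM 46 (1999). Also appears as IBM Research Report RJ 10076, 1997.
  • 9. Zhan Xuegang, Lin Hongfei, Yao Tianshun. The Infolite Chinese retrieval system [J]. Mini-Micro Systems, 2000, 21(9): 989-992. Cited by: 9
  • 10. Li Fan, Lin Aiwu, Chen Guoshe. Design and implementation of a VSM-based text classification system [J]. Journal of Huazhong University of Science and Technology (Natural Science Edition), 2005, 33(3): 53-55. Cited by: 19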
