
Research on an Efficient Algorithm for Mining the Relatedness of Web Pages with Dynamic Scripts

An Algorithm Study of Information Mining on Web Pages with Dynamic Scripts
Abstract: Web pages with dynamic scripts interact with users mainly through scripts and carry too little hyperlink information. Traditional public search engines, which rely on hyperlink analysis algorithms alone, therefore struggle to mine the relatedness of such pages efficiently and rank them poorly, because hyperlinks on the Web cannot reach the pages' internal content. Information mining for these pages is still in its early stages. This paper proposes an approach, built on traditional search engines, for searching the related information of such documents, and presents its framework. The page produced by each load of a dynamic-script web page is treated as a state, and the state serves as the basic unit of information mining and ranking. On this basis, the paper describes an analysis and ranking algorithm for dynamic-script web pages based on state-interrelated similarity matching. It also gives the steps of a parallel implementation of the algorithm and demonstrates the algorithm's efficiency through experimental results.
Author: TAN Tao (谭涛) (School of Computer, China West Normal University, Nanchong 637002, China)
Source: Computer Knowledge and Technology (《电脑知识与技术》), 2012, Issue 5, pp. 3002-3005 (4 pages)
Keywords: dynamic scripts; state-interrelated similarity; state transition; ranking; related information searching

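References (7)

  • 1 Xia Bing, Gao Jun, Wang Tengjiao, Yang Dongqing. An efficient method for acquiring valid pages from websites with dynamic scripts [J]. Journal of Software (软件学报), 2009(20): 176-183.
  • 2 Wang Xiaoyu, Zhou Aoying. A survey of link structure analysis of the World Wide Web and its applications [J]. Journal of Software (软件学报), 2003, 14(10): 1768-1780.
  • 3 Duda C, Frey G, Kossmann D, Matter R, Zhou C. AJAX Crawl: Making AJAX Applications Searchable [C]. Shanghai: IEEE Computer Society, 25th International Conference on Data Engineering, 2009: 78-89.
  • 4 Duda C, Frey G, Kossmann D, et al. AJAX Search: crawling, indexing and searching web 2.0 applications [C]. Proc. VLDB Endow., 2008: 1440-1443.
  • 5 Lin Z, King I, Lyu M R. PageSim: A novel link-based measure of web pages similarity [C]. Hong Kong: IEEE Computer Society, 15th International World Wide Web Conference, 2006: 1019-1020.
  • 6 Sadi M S, Rahman M M H, Horiguchi S. A new algorithm to measure relevance among web pages [C]. Prague: WIT Press, 7th International Conference on Data Mining and Information Engineering, 2006: 243-251.
  • 7 Fang Qiming, Yang Guangwen, Wu Yongwei, Zheng Weimin. P2P-based Web search technology [J]. Journal of Software (软件学报), 2008, 19(10): 2706-2719.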

