An Algorithm Study of Information Mining on Web Pages with Dynamic Scripts (高效的动态脚本网页关联性挖掘算法研究)

Abstract: Dynamic-script web pages interact with users mainly through scripts and therefore carry little hyperlink information. Because hyperlinks on the Web cannot reach the content generated inside such pages, traditional public search engines, which rely on link-analysis algorithms, have difficulty mining the relevance among these pages efficiently. Information mining for such pages is still at an early stage. This paper proposes a scheme, built on traditional search engines, for searching the related information of such documents, and presents its framework. Each page produced by one load of a dynamic-script page is treated as a state, and the state serves as the basic unit of mining and ranking. On this basis, the paper describes an analysis algorithm for dynamic-script pages based on state-interrelated similarity matching. It also gives the steps of a parallel implementation of the algorithm and reports experimental results that demonstrate its efficiency.
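
The state-based idea in the abstract can be illustrated with a minimal sketch. The abstract does not define the paper's state-interrelated similarity measure, so the code below swaps in plain Jaccard overlap of term sets as a stand-in. The `State` class, the `jaccard` and `rank_related` functions, the URLs, and the sample terms are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    """One page produced by a single load of a dynamic-script page (hypothetical model)."""
    url: str          # address of the host page
    event: str        # script event that produced this state; "" for the initial load
    terms: frozenset  # terms extracted from the rendered content


def jaccard(a: frozenset, b: frozenset) -> float:
    """Stand-in similarity |a & b| / |a | b|; this is NOT the paper's measure."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_related(query: State, candidates: list[State]) -> list[tuple[State, float]]:
    """Rank candidate states by similarity to the query state, highest first."""
    scored = [(s, jaccard(query.terms, s.terms)) for s in candidates if s != query]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Illustrative data: two states of one page and one state of an unrelated page.
s0 = State("http://example.org/app", "", frozenset({"news", "sports", "home"}))
s1 = State("http://example.org/app", "click:sports",
           frozenset({"sports", "football", "news"}))
s2 = State("http://example.org/shop", "", frozenset({"cart", "price"}))

ranking = rank_related(s0, [s1, s2])
for state, score in ranking:
    print(state.url, state.event or "<initial>", round(score, 2))
```

Because each state is scored against the query independently, the scoring loop could in principle be split across workers, which is one plausible reading of the parallel implementation the abstract mentions; the paper's actual parallel procedure is not described here.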

Author: TAN Tao (谭涛), School of Computer, China West Normal University, Nanchong 637002, China

Affiliation: School of Computer, China West Normal University (西华师范大学计算机学院)

Source: Computer Knowledge and Technology (《电脑知识与技术》), 2012, No. 5, pp. 3002-3005 (4 pages)

Keywords: dynamic scripts; state-interrelated similarity; state transition; ranking; related information searching

Classification: TP271 [Automation and Computer Technology: Detection Technology and Automatic Equipment]
