Abstract
Dynamic script web pages interact with users mainly through scripts and carry little hyperlink information. Ordinary hyperlinks cannot reach their internal content. As a result, traditional public search engines, which rely on hyperlink-analysis algorithms, cannot efficiently mine the relationships among such pages, and information mining for them is still at an early stage. This paper proposes an approach, built on traditional search engines, for searching the related information of such documents, and presents its framework. Each page produced when a dynamic script page is loaded is treated as a state, and the state is the basic unit of information mining and ranking. On this basis, the paper describes an analysis and ranking algorithm for dynamic script pages based on state-interrelated similarity matching. Finally, it gives the steps of a parallel implementation of the algorithm and demonstrates the algorithm's efficiency through experiments.
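The abstract does not give the matching formula. The Python below is therefore only a minimal sketch of the state-based idea, and it rests on several assumptions. Each state's text is treated as a bag of words. A state's relevance to a query is cosine similarity. A state's final score blends its own relevance with that of the best state reachable from it through one script-triggered transition, weighted by `alpha`. The names `State`, `rank_states`, and `alpha` are hypothetical and do not come from the paper.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class State:
    """One page rendering produced by a single load of a dynamic script page.

    Hypothetical model: `successors` lists the states reachable from this one
    through a single script-triggered state transition.
    """
    state_id: str
    text: str
    successors: list[str] = field(default_factory=list)


def term_vector(text: str) -> Counter:
    # Bag-of-words term frequencies (assumption: lowercase, whitespace tokenization).
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two term-frequency vectors; 0.0 if either is empty.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_states(states: dict[str, State], query: str,
                alpha: float = 0.5) -> list[tuple[str, float]]:
    """Rank states by their own relevance blended with that of related states.

    Hypothetical scoring: (1 - alpha) * own similarity
    + alpha * best similarity among direct successor states.
    """
    q = term_vector(query)
    own = {sid: cosine(q, term_vector(s.text)) for sid, s in states.items()}
    scores = {}
    for sid, s in states.items():
        related = max((own[n] for n in s.successors if n in own), default=0.0)
        scores[sid] = (1 - alpha) * own[sid] + alpha * related
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Usage: three states of one hypothetical product page.
states = {
    "s0": State("s0", "product list page", ["s1"]),
    "s1": State("s1", "product detail price review"),
    "s2": State("s2", "contact form"),
}
for sid, score in rank_states(states, "product price"):
    print(f"{sid}: {score:.3f}")
# s0: 0.558
# s1: 0.354
# s2: 0.000
```

In this toy run, s0 outranks s1 even though s1 matches the query better. The reason is that a script transition leads from s0 directly to s1. If s0 is the only state a crawler can reach by URL, this lets script-only content surface through it. That is one plausible reading of "interrelated" relevance, and the paper's own measure may differ.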
Author
TAN Tao (谭涛), School of Computer, China West Normal University, Nanchong 637002, China
Source
Computer Knowledge and Technology (《电脑知识与技术》), 2012, No. 5, pp. 3002-3005 (4 pages)
Keywords
dynamic scripts
state-interrelated similarity
state transition
ranking
related information searching