Abstract
The Internet holds a great deal of valuable information, but traditional search engines work well only for static pages that are linked from other pages and cannot index Deep Web data. Deep Web content typically sits behind forms and is produced dynamically in response to a submitted form query, so how to discover and identify Deep Web query interfaces has become a question of interest. Many Deep Web sources are structured, providing structured query interfaces and structured results. Based on an analysis of the intrinsic relationship between a form's presentation and its function, features of query forms are extracted and an abstract model of forms is proposed; forms that are not Deep Web query interfaces are filtered out according to this model. Deep Web query interfaces are then identified by analyzing the structure of the returned result pages. Experimental results demonstrate the effectiveness of the approach.
Source
《计算机技术与发展》
2009, No. 7, pp. 117-119, 123 (4 pages in total)
Computer Technology and Development
Funding
Natural Science Foundation of Anhui Province (Project No. 070412051)