Abstract
A question answering (Q&A) system is an advanced form of information retrieval system that returns precise answers directly to users. To improve the crawling efficiency of a web spider when it extracts information from Q&A systems, a crawling strategy tailored to such systems is designed, based on the distinctive layout structure of Q&A pages and combined with regular expressions. Experimental results show that this strategy improves the spider's crawling efficiency, saves network bandwidth and local storage space, and effectively improves the accuracy and efficiency of answer extraction.
Source
Journal of Suzhou University (《宿州学院学报》)
2012, No. 5, pp. 32-35 (4 pages)
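Funding
Open Project of the Intelligent Information Processing Laboratory, Suzhou University: "Research on Semantic Similarity between User Questions and Question-Answer Pairs in Question Answering Systems" (2012YKF36)
General Project of Natural Science Research in Anhui Provincial Universities: "Research on Ontology-Based Semantic Resource Sharing and Retrieval in P2P Environments" (KJ2011B173)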
Keywords
regular expression
web spider
question answering system
Document Object Model (DOM) tree