Abstract
This paper presents a new technique for extracting and reconstructing information from Web pages, based on the Extended Tag Graph (ETG) [1]. It introduces the concepts of ETG operations and ETG reconstruction, and proposes the Tag Structured Query Language (TagSQL) as the user interface. TagSQL's syntax resembles standard SQL, so a user can describe flexible extraction and reconstruction operations on Web page content as easily as writing an SQL query.
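The abstract names ETG operations, ETG reconstruction and TagSQL but does not define them, so the following is only a hypothetical illustration of the general idea, not the authors' ETG or TagSQL. The Python sketch below uses the standard-library `html.parser` to turn a page into a plain tag tree. It then runs a filter over the tree, in the spirit of SQL's SELECT ... WHERE, and rebuilds the matched content into a new HTML fragment. `TagNode`, `TagTreeBuilder`, `select` and `reconstruct` are invented names. A real ETG is a graph with extensions that this simple tree does not model.

```python
from dataclasses import dataclass, field
from html import escape
from html.parser import HTMLParser


@dataclass
class TagNode:
    """Hypothetical tree node: a stand-in for an ETG vertex, not the paper's definition."""
    tag: str
    attrs: dict
    children: list = field(default_factory=list)
    text: str = ""


class TagTreeBuilder(HTMLParser):
    """Builds a simple tag tree from HTML using the standard-library parser."""
    VOID = {"br", "img", "hr", "meta", "link", "input"}  # elements with no end tag

    def __init__(self):
        super().__init__()
        self.root = TagNode("#root", {})
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = TagNode(tag, dict(attrs))
        self.stack[-1].children.append(node)
        if tag not in self.VOID:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag; stray end tags are ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        self.stack[-1].text += data


def walk(node):
    """Yield a node and all of its descendants in document order."""
    yield node
    for child in node.children:
        yield from walk(child)


def select(root, tag, where=lambda n: True):
    """Hypothetical SELECT-like query: nodes with the given tag that satisfy `where`."""
    return [n for n in walk(root) if n.tag == tag and where(n)]


def reconstruct(nodes, wrapper="ul", item="li"):
    """Hypothetical reconstruction: re-emit extracted text inside a new tag structure."""
    items = "".join(f"<{item}>{escape(n.text.strip())}</{item}>" for n in nodes)
    return f"<{wrapper}>{items}</{wrapper}>"


if __name__ == "__main__":
    page = """<html><body>
<div class="news"><a href="/a">Flood warning</a></div>
<div class="ad"><a href="/b">Buy now</a></div>
<div class="news"><a href="/c">Bridge reopens</a></div>
</body></html>"""
    builder = TagTreeBuilder()
    builder.feed(page)
    builder.close()
    news = select(builder.root, "div", where=lambda n: n.attrs.get("class") == "news")
    links = [a for d in news for a in select(d, "a")]
    print(reconstruct(links))
    # <ul><li>Flood warning</li><li>Bridge reopens</li></ul>
```

Here the `where` predicate plays the role of an SQL WHERE clause. The nested `select` call follows parent-child edges, loosely like navigating between related records. How TagSQL actually expresses these steps is not given in this abstract.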
Source
Computer Science (《计算机科学》)
CSCD
Peking University Core Journal (北大核心)
2004, Issue 5, pp. 56-60, 64 (6 pages in total)
Funding
Chongqing Key Science and Technology Research Program (2001.6715)
Chongqing University Key Teacher Support Program (2003A33)