Abstract
Schema extraction is an important problem in semistructured data research. Building on the concepts of the homo-object set (a set of objects of the same kind) and the label path, this paper presents a new approach to extracting schemas from the OEM model. The approach works in two steps: first, it searches the semistructured data represented in the OEM model for all homo-object sets; second, it constructs a schema table from them. The approach extracts schemas not only from hierarchically structured data but also from OEM data that contain cycles, which overcomes a limitation of some other approaches that cannot extract schemas from cyclic data.
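The two-step idea (find homo-object sets, then build a schema table) can be illustrated with a minimal sketch. The abstract does not give the paper's exact definition of a homo-object set or the layout of its schema table, so this code assumes a simplified rule: objects reached by the same label from the same set form one set, and each schema-table row records (source set, label, target set). The names `extract_schema`, the `S0`/`S1` set ids, and the dict-based OEM encoding are all hypothetical. Cycles are handled by reusing the id of a set that has already been seen instead of expanding it again.

```python
from collections import defaultdict, deque


def extract_schema(oem, root):
    """Hypothetical sketch: group OEM objects into sets and build a schema table.

    oem maps an object id to either a list of (label, child id) pairs
    (complex object) or an atomic value such as a str or int.
    Returns (sets, table): sets maps a set id to a frozenset of object ids,
    table is a list of (source set id, label, target set id) rows.
    """
    ids = {}      # frozenset of object ids -> set id
    sets = {}     # set id -> frozenset of object ids
    table = []
    queue = deque()

    def intern(objs):
        # Assign an id only to sets not seen before; a set reached again
        # through a cycle reuses its id and is not re-expanded.
        if objs not in ids:
            sid = f"S{len(ids)}"
            ids[objs] = sid
            sets[sid] = objs
            queue.append(objs)
        return ids[objs]

    intern(frozenset([root]))
    while queue:
        objs = queue.popleft()
        # Collect the children of every object in the set, grouped by label.
        by_label = defaultdict(set)
        for oid in objs:
            value = oem[oid]
            if isinstance(value, list):  # complex object
                for label, child in value:
                    by_label[label].add(child)
        for label in sorted(by_label):
            target = intern(frozenset(by_label[label]))
            table.append((ids[objs], label, target))
    return sets, table


if __name__ == "__main__":
    # Hypothetical OEM data with a cycle: two papers that cite each other.
    db = {
        "&1": [("paper", "&2"), ("paper", "&3")],
        "&2": [("title", "&4"), ("cites", "&3")],
        "&3": [("title", "&5"), ("cites", "&2")],
        "&4": "Schema extraction",
        "&5": "OEM",
    }
    sets, table = extract_schema(db, "&1")
    for row in table:
        print(row)
```

On this sample the loop terminates despite the `cites` cycle and yields the rows `("S0", "paper", "S1")`, `("S1", "cites", "S1")` and `("S1", "title", "S2")`; the self-referencing row for `S1` is how the cycle shows up in the table. Since the number of distinct object sets is finite and each is expanded once, the sketch always terminates.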
Source
Computer Engineering and Applications (《计算机工程与应用》)
CSCD
Peking University Core Journal (北大核心)
2006, No. 27, pp. 162-165 (4 pages)