Abstract
To address the "item disorder" problem in Web information extraction, this paper proposes a Web information extraction method based on two-dimensional correlative-chain conditional random fields (2D-CRFs). A Web document is parsed into a part-of-speech sequence, each information item to be extracted is mapped to a state, and the information items are mapped to sequence parameters of the 2D-CRFs. The 2D-CRFs model is then constructed using an induction algorithm. Experimental results show that the method achieves better extraction performance.
Source
Value Engineering (《价值工程》)
2010, Issue 34, p. 186 (1 page)
Keywords
conditional random fields
Web information extraction
induction algorithm