Extraction of People Information from the Web Based on Semantics and Layout

Original title: 基于语义和版式的网上人物信息提取
Abstract: Drawing on ideas from ontology, this paper proposes an algorithm that combines rules and statistics to extract information about people from web pages, automating the extraction of semi-structured personal information. A dictionary of 4,624 valid field names was built by programmatic statistics and is used to check whether an extracted field name is valid. The corresponding field value is extracted only when the field name is valid, which substantially improves extraction precision. Experimental results show that the algorithm extracts information efficiently from semi-structured web pages about people, with an average precision of 97.6% and an average recall of 86.1%.
Source: 《微计算机信息》 (Control & Automation), 2010, No. 12, pp. 145-147 (3 pages)
Keywords: Web information extraction; extraction rules; semi-structured web pages; XML; page layout analysis
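
The dictionary check described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the six-entry dictionary, the colon-separated "name: value" line layout, and the names `VALID_FIELD_NAMES`, `PAIR_PATTERN`, and `extract_person_fields` are all assumptions made for illustration. The sketch only shows the gating idea, namely that a value is kept only when its field name appears in the dictionary.

```python
import re

# Hypothetical miniature dictionary (name, gender, date of birth, native place,
# position, education). The paper's dictionary holds 4,624 field names.
VALID_FIELD_NAMES = {"姓名", "性别", "出生日期", "籍贯", "职务", "学历"}

# Assumed layout cue: a short candidate field name, then an ASCII or
# full-width colon, then the field value.
PAIR_PATTERN = re.compile(r"^\s*([^::]{1,10}?)\s*[::]\s*(.+?)\s*$")


def extract_person_fields(lines, valid_names=VALID_FIELD_NAMES):
    """Return {field name: field value} for lines whose name is in the dictionary."""
    record = {}
    for line in lines:
        match = PAIR_PATTERN.match(line)
        if match is None:
            continue  # no "name: value" shape on this line
        name, value = match.group(1), match.group(2)
        # Extract the value only when the candidate field name is valid.
        if name in valid_names:
            record[name] = value
    return record


if __name__ == "__main__":
    page_lines = [
        "姓名:张三",            # valid field name, kept
        " 性别 : 男 ",           # valid field name with stray spacing, kept
        "更新时间:2010-05-01",  # looks like a pair, but the name is not in the dictionary
        "欢迎访问本站",          # plain text, no pair
    ]
    print(extract_person_fields(page_lines))
```

Rejecting pairs whose name is not in the dictionary trades some recall for precision, which is consistent with the reported gap between the two figures, although the abstract does not state that this is the cause.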