Web-based extraction of periodical metadata information

Cited by: 7
Abstract After analyzing existing approaches to Web information extraction, the authors apply a new extraction method to electronic journal information. The method first locates the target information blocks precisely by comparing similarity, combining relative paths in the document structure with the content features of nodes. For each information item to be extracted, regular expressions are then used to construct a feature pattern for the text of that item, and the items are extracted on that basis. Test results show that this Web-based method for extracting electronic journal metadata achieves higher recall and precision than general information extraction methods, with fairly satisfactory results.
Source Journal of Huazhong University of Science and Technology (Natural Science Edition), 2007, No. 12, pp. 13-15 (3 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list (北大核心).
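Funding China Next Generation Internet project (CNGI-04-15-7A); Hubei Province Science and Technology Basic Conditions Platform special fund; Wuhan Science and Technology Key Project (20061002032)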
Keywords information extraction; wrapper; pattern matching; electronic journals; periodical metadata
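
The two stages summarized in the abstract (locating the target block by path and content similarity, then extracting items with regular expressions) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the record gives no formulas, so the weighted scoring, the function names (`path_similarity`, `content_score`, `locate_block`, `extract_fields`), the cue words, and the English-style source-line pattern are all assumptions made for illustration.

```python
import re
from difflib import SequenceMatcher

def path_similarity(path_a, path_b):
    """Similarity (0.0 to 1.0) of two relative paths, compared tag by tag."""
    return SequenceMatcher(None, path_a.split("/"), path_b.split("/")).ratio()

def content_score(text, cue_words):
    """Fraction of the cue words that occur in the node text."""
    if not cue_words:
        return 0.0
    return sum(word in text for word in cue_words) / len(cue_words)

def locate_block(nodes, sample_path, cue_words, path_weight=0.5):
    """Return the (path, text) node with the highest combined score.

    The linear weighting is an assumption; the abstract only says that
    path and content features are combined in a similarity comparison.
    """
    def score(node):
        path, text = node
        return (path_weight * path_similarity(path, sample_path)
                + (1 - path_weight) * content_score(text, cue_words))
    return max(nodes, key=score)

# Hypothetical feature pattern for a source line such as
# "2007, No. 12, pp. 13-15".
SOURCE_PATTERN = re.compile(
    r"(?P<year>\d{4}), No\. (?P<issue>\d+), "
    r"pp\. (?P<first_page>\d+)-(?P<last_page>\d+)"
)

def extract_fields(text):
    """Return the matched items as a dict, or None if the pattern fails."""
    match = SOURCE_PATTERN.search(text)
    return match.groupdict() if match else None
```

Given a list of `(relative_path, text)` pairs for a page, `locate_block` picks the candidate closest to a labeled sample, and `extract_fields` is then applied to that node's text. Separating the two steps means a change in page layout affects only the location stage, while the item patterns stay the same.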