
Semantic Annotation of Species Description Text in Chinese Literature by Naive Bayes Classifier

Original title: 基于贝叶斯分类的中文物种描述文本的语义标注研究. Cited by: 3
Abstract: This study randomly sampled 1,000 documents from Flora of China (《中国植物志》) as its data set and used a Naive Bayes algorithm based on leading words to annotate Chinese species description text semantically and automatically. The experimental data show that leading words effectively improve the annotation efficiency of Naive Bayes. With leading words, the average F value increases by 0.048 to 0.107. The effect is strongest when Fr is 2, where the overall annotation performance reaches an average F value of 0.902. Annotation performance for individual elements is also good: when Fr is 1, 2, or 3, the F value of most elements lies between 0.730 and 0.964.
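Source: Journal of the China Society for Scientific and Technical Information (《情报学报》), CSSCI, Peking University Core Journal, 2012, No. 8, pp. 805-812 (8 pages)
Funding: This paper is an interim result of the Ministry of Education Humanities and Social Sciences Youth Project "Research on Web-Based Chinese Academic Information Extraction Based on Deep Semantic Annotation" (10YJC870004).
Keywords: Naive Bayes; leading words; species description text; semantic annotation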
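
The abstract names the technique but does not spell it out, so the following is only a minimal sketch of what Naive Bayes classification over leading words could look like. It assumes that "leading words" are the first few tokens of a clause and that each clause receives one semantic label. The class `LeadingWordNB`, the parameter `n_lead`, the labels, and the toy clauses are all hypothetical and not taken from the paper; the sketch does not model the paper's Fr parameter, which the abstract leaves undefined. Real input would be Chinese text that has already been segmented into words.

```python
import math
from collections import Counter, defaultdict


def leading_words(tokens, n_lead=2):
    """Return the first n_lead tokens of a clause (assumed definition)."""
    return tokens[:n_lead]


class LeadingWordNB:
    """Multinomial Naive Bayes over leading-word features (add-one smoothing)."""

    def __init__(self, n_lead=2):
        self.n_lead = n_lead
        self.label_counts = Counter()            # clauses seen per label
        self.word_counts = defaultdict(Counter)  # label -> leading-word counts
        self.vocab = set()                       # all leading words seen

    def fit(self, clauses, labels):
        for tokens, label in zip(clauses, labels):
            self.label_counts[label] += 1
            for word in leading_words(tokens, self.n_lead):
                self.word_counts[label][word] += 1
                self.vocab.add(word)
        return self

    def predict(self, tokens):
        total = sum(self.label_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            # log prior plus the sum of smoothed log likelihoods
            score = math.log(count / total)
            denom = sum(self.word_counts[label].values()) + vocab_size
            for word in leading_words(tokens, self.n_lead):
                score += math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy, pre-tokenized clauses invented for illustration only.
train = [
    (["leaves", "opposite", "ovate"], "leaf"),
    (["leaves", "alternate", "lanceolate"], "leaf"),
    (["flowers", "white", "solitary"], "flower"),
    (["flowers", "yellow", "axillary"], "flower"),
    (["fruit", "a", "capsule"], "fruit"),
    (["fruit", "globose", "red"], "fruit"),
]
model = LeadingWordNB(n_lead=2).fit(
    [tokens for tokens, _ in train], [label for _, label in train]
)
print(model.predict(["leaves", "white", "hairy"]))
```

Restricting the features to the first tokens of each clause keeps the vocabulary small, which is one plausible reason such features could help a Naive Bayes annotator; the abstract itself reports only the measured gain in F value.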

