The Semantic Annotation of English Biodiversity Documents Based on Machine Learning
Cited by: 2
Abstract: To address the poor portability of existing semantic annotation systems, this study designs MARTT, a semantic annotation system based on the leading words algorithm. MARTT uses supervised machine learning to extract domain rules from text so that it can adapt to different collections of descriptions. To test the algorithm's efficiency, the study compares the annotation performance of the leading words algorithm with that of Naive Bayes (NB) and support vector machine (SVM) classifiers under ten-fold cross-validation, using data from Flora of China (FOC) and Flora of North America (FNA) as samples. The results show that the leading words algorithm outperforms the other two algorithms in precision, recall, and computational cost. Moreover, it performs comparably well on both the FNA and FOC descriptions, which confirms the good portability of MARTT.
Source: Documentation, Information & Knowledge (《图书情报知识》), CSSCI, PKU Core Journal, 2011, No. 2, pp. 73-77 (5 pages)
Keywords: Semantic annotation; MARTT; Machine learning; Biodiversity
