The Semantic Annotation of English Biodiversity Documents Based on Machine Learning (cited by: 2)

Abstract: Existing semantic annotation systems port poorly across document collections. To address this, the study designs MARTT, a semantic annotation system built on a leading-words algorithm. MARTT uses supervised machine learning to extract domain rules from text so that it can adapt to different description collections. To evaluate the algorithm, the study compares it with Naive Bayes (NB) and support vector machine (SVM) classifiers under ten-fold cross-validation, using descriptions from Flora of China (FOC) and Flora of North America (FNA) as samples. The leading-words algorithm outperforms both general-purpose learners in precision, recall, and computational cost. It also performs comparably well on both the FNA and FOC descriptions, which supports MARTT's portability.

Authors: Cui Hong; Duan Yufeng; Li Fang

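Affiliations: School of Information Resources and Library Science, University of Arizona, USA; Department of Information Science, Business School, East China Normal University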

Source: Documentation, Information & Knowledge (《图书情报知识》; indexed in CSSCI and the Peking University Core Journals list), 2011, No. 2, pp. 73-77 (5 pages)

Keywords: semantic annotation; MARTT; machine learning; biodiversity

Classification code: G354 [Cultural Sciences: Information Science]



