Abstract
To address the poor portability of existing semantic annotation systems, this study designs MARTT, a semantic annotation system based on the leading-words algorithm. MARTT uses supervised machine learning to extract domain rules from text so that it can adapt to different description collections. To evaluate the algorithm, the study uses data from Flora of China (FOC) and Flora of North America (FNA) as samples and compares its annotation performance with that of Naive Bayes (NB) and support vector machines (SVM) under ten-fold cross-validation. The results show that the leading-words algorithm outperforms the other two algorithms in precision, recall, and computational cost. Moreover, it achieves satisfactory and comparable results on both the FNA and FOC descriptions, which confirms the good portability of MARTT.
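The evaluation protocol named in the abstract, ten-fold cross-validation scored by precision and recall, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: `train_majority` is a hypothetical stand-in learner (the leading-words algorithm, NB, or SVM would be plugged in at `train_fn`), the round-robin fold assignment is an arbitrary choice, and the sentences and labels are toy data.

```python
from collections import Counter


def k_fold_indices(n_items, k=10):
    """Yield (train_idx, test_idx) pairs; fold f tests every item i with i % k == f."""
    for fold in range(k):
        test = [i for i in range(n_items) if i % k == fold]
        train = [i for i in range(n_items) if i % k != fold]
        yield train, test


def precision_recall(gold, predicted, label):
    """Precision and recall for one label over paired gold/predicted lists."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def cross_validate(sentences, labels, train_fn, k=10):
    """Train on k-1 folds, predict the held-out fold, and pool the results."""
    gold, predicted = [], []
    for train_idx, test_idx in k_fold_indices(len(sentences), k):
        model = train_fn([sentences[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        for i in test_idx:
            gold.append(labels[i])
            predicted.append(model(sentences[i]))
    return gold, predicted


def train_majority(train_sentences, train_labels):
    """Hypothetical placeholder learner: always predicts the most frequent label."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return lambda sentence: majority


# Toy data (assumed for illustration only): 20 sentences with two semantic tags.
sentences = ["sentence %d" % i for i in range(20)]
labels = ["leaf"] * 15 + ["flower"] * 5
gold, predicted = cross_validate(sentences, labels, train_majority, k=10)
leaf_precision, leaf_recall = precision_recall(gold, predicted, "leaf")
```

Pooling predictions across folds before scoring gives one precision and recall figure per tag; averaging per-fold scores is an equally common alternative, and the abstract does not say which was used.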
Source
《图书情报知识》
CSSCI
PKU Core Journal (北大核心)
2011, No. 2, pp. 73-77 (5 pages)
Documentation, Information & Knowledge