Abstract
This study randomly sampled 1,000 documents from Flora of China (《中国植物志》) as its data set and used a Naive Bayes algorithm based on leading words to automatically annotate the semantics of Chinese species description texts. The experimental data indicate that leading words effectively improve the annotation performance of Naive Bayes. With leading words, the average F value increases by 0.048 to 0.107. The effect is strongest when Fr is 2, where the overall average F value reaches 0.902. Annotation performance for individual elements is also satisfactory: with Fr set to 1, 2, and 3 respectively, the F value of most elements lies between 0.730 and 0.964.
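The idea of adding leading words to a Naive Bayes annotator can be illustrated with a small sketch. It is not the authors' implementation. The abstract does not define a leading word or the parameter Fr, so the sketch assumes that the leading word is simply the first token of a clause and leaves Fr out. The names `features` and `NaiveBayes`, the `LEAD=` prefix, and the toy English training clauses are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def features(clause_tokens, use_leading_word=True):
    """Bag-of-words features, plus an optional marked leading-word feature.

    Assumption: the leading word is the first token of the clause.
    """
    feats = list(clause_tokens)
    if use_leading_word and clause_tokens:
        feats.append("LEAD=" + clause_tokens[0])
    return feats


class NaiveBayes:
    """Multinomial Naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.label_counts = Counter()            # label -> number of clauses
        self.feat_counts = defaultdict(Counter)  # label -> feature counts
        self.vocab = set()

    def fit(self, samples, labels):
        for feats, label in zip(samples, labels):
            self.label_counts[label] += 1
            self.feat_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, feats):
        total = sum(self.label_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n in self.label_counts.items():
            # log prior plus the sum of smoothed log likelihoods
            score = math.log(n / total)
            denom = sum(self.feat_counts[label].values()) + self.alpha * vocab_size
            for f in feats:
                score += math.log((self.feat_counts[label][f] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy clauses; each label is the element the clause describes.
train = [
    (["leaf", "ovate", "margin", "serrate"], "leaf"),
    (["leaf", "blade", "lanceolate"], "leaf"),
    (["flower", "white", "petals", "5"], "flower"),
    (["flower", "solitary", "axillary"], "flower"),
    (["fruit", "capsule", "ovoid"], "fruit"),
]

model = NaiveBayes().fit([features(t) for t, _ in train], [y for _, y in train])
print(model.predict(features(["flower", "ovate"])))
```

Encoding the leading word as a separate `LEAD=` feature lets the classifier weight a word differently when it opens a clause than when it appears elsewhere in it. Passing `use_leading_word=False` to `features` gives the plain bag-of-words baseline for comparison.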
Source
《情报学报》
CSSCI
PKU Core Journals (北大核心)
2012, No. 8, pp. 805-812 (8 pages)
Journal of the China Society for Scientific and Technical Information
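Funding
This paper is an interim result of the Ministry of Education Humanities and Social Sciences Youth Project "Research on Web-Based Chinese Academic Information Extraction Based on Deep Semantic Annotation" (10YJC870004).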
Keywords
Naive Bayes
leading words
species description text
semantic annotation