Abstract
To improve the accuracy and efficiency of index compilation, and to address the inability of keyword-based extraction algorithms to find terms that are both relevant to a book's subject and worth indexing, a comprehensive evaluation method is proposed for extracting back-of-the-book index terms. The domain importance of each candidate index term is computed with an approach modeled on the PageRank algorithm, using the term's categories and reference relationships in a knowledge base. The book's internal information is analyzed comprehensively, and the in-book importance of each candidate is computed from statistical, positional, and other features. A comprehensive evaluation model then scores how suitable each candidate is as a back-of-the-book index term. Experimental results show that the proposed method achieves significantly higher accuracy, recall, and F-measure than the unimproved algorithm.
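The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives neither the graph construction, the in-book features, nor the form of the evaluation model. The sketch assumes a plain power-iteration PageRank over a hypothetical reference graph between candidate terms, hypothetical in-book importance values already scaled to [0, 1], and a simple weighted sum (weight `alpha`, an assumed parameter) as the combined score.

```python
from typing import Dict, List


def pagerank(links: Dict[str, List[str]], damping: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    """Power-iteration PageRank over a directed graph of candidate terms."""
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by terms with no outgoing references is spread evenly.
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n
                    for v in nodes}
        for source, targets in links.items():
            if targets:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


def combined_score(domain: float, internal: float, alpha: float = 0.5) -> float:
    """Assumed evaluation model: weighted sum of the two importance values."""
    return alpha * domain + (1.0 - alpha) * internal


def rank_candidates(links: Dict[str, List[str]],
                    internal: Dict[str, float],
                    alpha: float = 0.5) -> List[str]:
    """Order candidate terms by combined score, best first."""
    domain = pagerank(links)
    top = max(domain.values())
    scores = {
        # Scale domain importance to [0, 1] so both inputs are comparable.
        term: combined_score(domain[term] / top, internal.get(term, 0.0), alpha)
        for term in domain
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical reference graph: each term lists the terms it refers to.
example_links = {
    "index": ["PageRank", "keyword"],
    "keyword": ["index"],
    "PageRank": ["index"],
    "preface": ["index"],
}
# Hypothetical in-book importance (e.g. from frequency and position features).
example_internal = {"index": 0.9, "keyword": 0.6, "PageRank": 0.7, "preface": 0.1}

if __name__ == "__main__":
    print(rank_candidates(example_links, example_internal))
```

A weighted sum is only one possible way to combine the two signals; the choice of `alpha` would need to be tuned against a reference index.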
Authors
田梦
李宁
吕淑琪
田英爱
许洁
TIAN Meng; LI Ning; LYU Shu-qi; TIAN Ying-ai; XU Jie (Computer School, Beijing Information Science and Technology University, Beijing 100101, China; China Electronics Standardization Institute, Beijing 100007, China)
Source
《计算机工程与设计》
Peking University Core Journal (北大核心)
2019, No. 1, pp. 261-267 (7 pages)
Computer Engineering and Design
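Funding
National Natural Science Foundation of China (61672105)
National High Technology Research and Development Program of China (863 Program) (2015AA015403)
National Science and Technology Major Project "Core Electronic Devices, High-end Generic Chips and Basic Software" (2012ZX01045-006)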
Keywords
back-of-the-book index
candidate index term extraction
back-of-the-book index term extraction
PageRank
feature evaluation