Abstract
Most documents still have no keywords assigned, yet manually assigning high-quality keywords is expensive, time-consuming, subjective, and error-prone. Automatic keyword indexing is therefore worth studying, and evaluating its results effectively has become a pressing problem. Evaluating the performance of a keyword indexing system is not easy, however. Traditional evaluation methods rely on exact matching between the indexing output and the test data, so their results do not fully reflect actual indexing quality, and they are costly to carry out. This paper proposes a general evaluation model for automatic indexing that makes effective use of external knowledge resources. The model evaluates indexing results in two settings: reference-based evaluation and reference-free evaluation. Experimental results show that the general evaluation model improves the reliability of indexing evaluation and reduces its cost.
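The contrast between exact matching and similarity-based matching in the reference-based setting can be illustrated with a minimal sketch. It is not the model proposed in the paper: the similarity table `TOY_SIMILARITY`, the function names, and the best-match averaging scheme are all assumptions made for illustration, with the table standing in for an external knowledge resource.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

# Hypothetical similarity scores standing in for an external knowledge
# resource; these values are invented for illustration only.
TOY_SIMILARITY: Dict[FrozenSet[str], float] = {
    frozenset({"automatic indexing", "keyword extraction"}): 0.8,
    frozenset({"evaluation model", "evaluation method"}): 0.9,
}


def toy_similarity(a: str, b: str) -> float:
    """Return 1.0 for identical terms, a table score if known, else 0.0."""
    if a == b:
        return 1.0
    return TOY_SIMILARITY.get(frozenset({a, b}), 0.0)


def exact_match_scores(predicted: List[str], reference: List[str]) -> Tuple[float, float]:
    """Precision and recall when only identical keywords count as hits."""
    pred, ref = set(predicted), set(reference)
    hits = len(pred & ref)
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(ref) if ref else 0.0
    return precision, recall


def soft_match_scores(
    predicted: List[str],
    reference: List[str],
    sim: Callable[[str, str], float],
) -> Tuple[float, float]:
    """Precision and recall where each keyword earns its best similarity score."""
    pred, ref = sorted(set(predicted)), sorted(set(reference))
    if not pred or not ref:
        return 0.0, 0.0
    # Credit each predicted keyword with its closest reference keyword.
    precision = sum(max(sim(p, r) for r in ref) for p in pred) / len(pred)
    # Credit each reference keyword with its closest predicted keyword.
    recall = sum(max(sim(r, p) for p in pred) for r in ref) / len(ref)
    return precision, recall


predicted = ["keyword extraction", "evaluation method", "semantic similarity"]
reference = ["automatic indexing", "evaluation model", "semantic similarity"]

print("exact:", exact_match_scores(predicted, reference))
print("soft: ", soft_match_scores(predicted, reference, toy_similarity))
```

In this toy example only one of the three predicted keywords matches a reference keyword exactly, so exact matching gives a low score even though the other two are near-synonyms. The soft scores are much higher because each near-synonym receives partial credit. The reference-free setting mentioned in the abstract is not sketched here, since the abstract does not describe how it works.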
Source
Journal of the China Society for Scientific and Technical Information (《情报学报》)
CSSCI
Peking University Core Journal (北大核心)
2009, No. 1, pp. 40-47 (8 pages)
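Funding
This research was supported by a Key Project of the National Key Technology R&D Program during the 11th Five-Year Plan (2006BAH03B02), the Young Scholar Research Support Fund of Nanjing University of Science and Technology (JGQN0701), the Research Start-up Fund of Nanjing University of Science and Technology (AB41123), and the 2006 Jiangsu Province Graduate Education Innovation Project.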
Keywords
automatic indexing
evaluation model
semantic similarity
similarity computation