Abstract
Chinese word segmentation is one of the determinants of result quality in Chinese search engines. Accurate and effective segmentation is vital to improving the relevance of search results and user satisfaction. This paper first reviews the theoretical foundations on which the evaluation of Chinese word segmentation is built, and then develops a complete methodology for evaluating the quality of Chinese word segmentation in web search engines. The methodology covers the sampling of evaluation data, the selection of evaluators, the definition of evaluation criteria, and the design of the evaluation procedure. Applied to the evaluation of a real search engine, the methodology proved effective. The author further discusses the evaluation results and offers several suggestions for improving evaluation performance, including how to account for evaluator background and how to decide which evaluation items to keep or reject.
Source
《情报科学》
CSSCI
Peking University Core Journals (北大核心)
2007, Issue 1, pp. 108-112 (5 pages)
Information Science
Keywords
Chinese word segmentation
web search engine
information retrieval
evaluation methodology