Abstract
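Named entity mining from user query logs aims to extract named entities of a specified category from those logs. Previous work proposed a seed-entity-based mining method that ranks candidate entities by the similarity between the template distribution of the entity category and that of each candidate. However, that method ignores the ambiguity of named entities, the polysemy of query templates, and the information carried by unlabeled entities, so it cannot rank candidate entities effectively. This paper adopts a semi-supervised topic model that uses the relationships among query templates to learn the template distribution of the entity category, thereby improving the ranking of candidate entities. Experimental results demonstrate the effectiveness of the proposed method.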
Named entity mining from query logs aims to extract a list of named entities of a specified type from the query log. Previous work proposed a seed-based method that ranks candidate entities by the similarity between the template distribution of the specified class and that of each entity. However, it does not take into account the ambiguity of named entities, the polysemy of templates, or the unlabeled data. In this paper, we propose a semi-supervised topic model that leverages the relationship between templates (i.e., the co-occurrence between templates) to learn the template distribution of the specified class and thereby improve the entity ranking. Experimental results show the effectiveness of the proposed method.
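As a rough illustration of the seed-based ranking idea that the abstract attributes to previous work (not the semi-supervised topic model proposed in the paper), the following minimal sketch builds a template distribution from seed entities and ranks candidates by similarity to it. The toy queries, the `#` slot notation, the function names, and the use of cosine similarity are all assumptions made for this example; the abstract does not state which similarity measure was used.

```python
from collections import Counter
from math import sqrt


def template_counts(queries, entity):
    """Count templates for `entity`: each query containing the entity as a
    token, with that token replaced by a '#' slot (assumed notation)."""
    counts = Counter()
    for query in queries:
        tokens = query.split()
        if entity in tokens and len(tokens) > 1:
            template = " ".join("#" if t == entity else t for t in tokens)
            counts[template] += 1
    return counts


def normalize(counts):
    """Turn raw template counts into a probability distribution."""
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()} if total else {}


def cosine(p, q):
    """Cosine similarity between two sparse distributions (assumed measure)."""
    dot = sum(v * q.get(t, 0.0) for t, v in p.items())
    norm = sqrt(sum(v * v for v in p.values())) * sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0


def rank_candidates(queries, seeds, candidates):
    """Rank candidates by similarity of their template distribution to the
    class distribution aggregated over the seed entities."""
    class_counts = Counter()
    for seed in seeds:
        class_counts.update(template_counts(queries, seed))
    class_dist = normalize(class_counts)
    scores = {
        c: cosine(class_dist, normalize(template_counts(queries, c)))
        for c in candidates
    }
    return sorted(candidates, key=lambda c: scores[c], reverse=True)


# Hypothetical toy query log; real logs are far larger and noisier.
queries = [
    "inception trailer", "inception cast",
    "avatar trailer", "avatar cast",
    "titanic trailer",
    "python tutorial", "python download",
]
ranked = rank_candidates(queries, seeds=["inception", "avatar"],
                         candidates=["titanic", "python"])
print(ranked)
```

In this toy log, "titanic" shares the template "# trailer" with the seed movies while "python" shares none, so it ranks first. A baseline of this kind treats every template as unambiguous, which is the limitation the abstract points to.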
Source
《中文信息学报》
CSCD
Peking University Core Journals (北大核心)
2012, No. 5, pp. 26-32 (7 pages)
Journal of Chinese Information Processing
Funding
National Natural Science Foundation of China (60903139, 60873243, 60933005)
National 863 Program of China, key projects (2010AA012502, 2010AA012503)