
Named Entity Mining from Query Log through Semi-supervised Topic Modeling (cited by: 6)
Abstract: Named entity mining from query logs aims to mine a list of named entities of a specified class from the query log. Previous work proposed a seed-based method that ranks candidate entities by the similarity between the template distribution of the specified class and that of each candidate entity. However, this method does not take into account the ambiguity of named entities, the polysemy of query templates, or the information carried by unlabeled entities, so it cannot rank candidates effectively. In this paper, we propose a semi-supervised topic model that leverages the relationships between templates (i.e., the co-occurrence between templates) to learn the template distribution of the specified class, thereby improving entity ranking. Experimental results show the effectiveness of the proposed method.
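The abstract contrasts two ranking schemes. The Python sketch below runs both on hypothetical toy counts (entities, templates, and seeds are invented for illustration). `rank_by_similarity` is a seed-based baseline that pools the seeds' template counts into a class distribution and ranks candidates by cosine similarity; cosine is an assumed stand-in, and the cited prior work may use a different measure. `semi_supervised_plsa` is a two-topic PLSA over entity-template counts in which the class topic is initialized from the seeds and receives `mu` pseudo-counts from the seed distribution in every M-step, loosely in the spirit of Lu and Zhai's semi-supervised topic modeling. All function names and parameters are hypothetical; this illustrates the general technique and is not the model proposed in this paper, whose exact formulation is not given here.

```python
import math
from collections import defaultdict

# Hypothetical toy data (not from the paper): entity -> {template: count}.
# A template is a query with the entity replaced by "#": "avatar trailer" -> "# trailer".
COUNTS = {
    "avatar":            {"# trailer": 4, "# cast": 2},
    "titanic":           {"# trailer": 2, "# cast": 3, "# soundtrack": 1},
    "harry potter":      {"# trailer": 5, "# cast": 3, "# book": 4},
    "lord of the rings": {"# book": 3, "# trailer": 1},
    "iphone":            {"# price": 5, "# review": 3},
    "galaxy s3":         {"# price": 4, "# review": 2},
}
SEEDS = ["avatar", "titanic"]  # hypothetical seed entities for a "Movie" class


def normalize(d):
    """Scale non-negative weights so they sum to 1."""
    total = sum(d.values())
    return {k: v / total for k, v in d.items()} if total else dict(d)


def class_distribution(counts, seeds):
    """Template distribution of the target class, pooled over the seed entities."""
    pooled = defaultdict(float)
    for s in seeds:
        for t, c in counts[s].items():
            pooled[t] += c
    return normalize(dict(pooled))


def cosine(p, q):
    """Cosine similarity between two sparse distributions (assumed measure)."""
    dot = sum(v * q.get(t, 0.0) for t, v in p.items())
    norm = math.sqrt(sum(v * v for v in p.values()) * sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0


def rank_by_similarity(counts, seeds):
    """Seed-based baseline: rank non-seed entities by similarity to the class."""
    cls = class_distribution(counts, seeds)
    scores = {e: cosine(normalize(c), cls) for e, c in counts.items() if e not in seeds}
    return sorted(scores, key=scores.get, reverse=True), scores


def semi_supervised_plsa(counts, seeds, mu=10.0, iters=100, eps=1e-12):
    """Two-topic PLSA: topic 0 is the class topic (seed prior), topic 1 is background."""
    templates = sorted({t for c in counts.values() for t in c})
    prior = class_distribution(counts, seeds)
    p_t_z = [
        normalize({t: prior.get(t, 0.0) + 0.01 for t in templates}),  # class topic, seeded
        {t: 1.0 / len(templates) for t in templates},                   # background, uniform
    ]
    p_z_e = {e: [0.5, 0.5] for e in counts}
    for _ in range(iters):
        mass = [defaultdict(float), defaultdict(float)]
        for e, c in counts.items():
            ez = [0.0, 0.0]
            for t, n in c.items():
                # E-step: P(z | e, t) is proportional to P(z | e) * P(t | z).
                w = [p_z_e[e][z] * p_t_z[z][t] + eps for z in (0, 1)]
                s = w[0] + w[1]
                for z in (0, 1):
                    mass[z][t] += n * w[z] / s
                    ez[z] += n * w[z] / s
            # M-step for P(z | e): an entity may split across topics (ambiguity).
            p_z_e[e] = [ez[0] / (ez[0] + ez[1]), ez[1] / (ez[0] + ez[1])]
        # M-step for P(t | z): the seed prior adds mu pseudo-counts to topic 0 only.
        for t, pv in prior.items():
            mass[0][t] += mu * pv
        p_t_z = [normalize({t: m.get(t, 0.0) for t in templates}) for m in mass]
    scores = {e: p_z_e[e][0] for e in counts if e not in seeds}
    return sorted(scores, key=scores.get, reverse=True), scores
```

On this toy data both functions rank "harry potter" first. In the PLSA variant, the "# book" template is absorbed mainly by the background topic, so "harry potter" keeps only part of its weight on the class topic; this is one way a topic model can represent an ambiguous entity, which a single similarity score cannot. Initializing the class topic from the seeds, rather than randomly, is a design choice here that keeps EM from settling into a solution where the background topic captures the movie templates.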
Source: Journal of Chinese Information Processing (《中文信息学报》; indexed in CSCD and the Peking University core journal list), 2012, Issue 5, pp. 26-32 (7 pages)
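Funding: National Natural Science Foundation of China (Grant Nos. 60903139, 60873243, 60933005); Key Projects of the National High Technology Research and Development Program of China (863 Program) (Grant Nos. 2010AA012502, 2010AA012503)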
Keywords: query log; named entity mining; semi-supervised topic model

References (11)

  • [1] Marius Pasca. Weakly-supervised discovery of named entities using Web search queries[C]// Proceedings of the 16th ACM Conference on Information and Knowledge Management, 2007: 683-690.
  • [2] Zhai Haijun, Guo Jiafeng, Wang Xiaolei, Xu Hongbo. Named entity mining based on user query logs[J]. Journal of Chinese Information Processing, 2010, 24(1): 71-76.
  • [3] Jiafeng Guo, Gu Xu, Xueqi Cheng, et al. Named entity recognition in query[C]// Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2009: 267-274.
  • [4] Gu Xu, Shuang-Hong Yang, Hang Li. Named entity mining from click-through data using weakly supervised latent Dirichlet allocation[C]// Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2009: 1365-1374.
  • [5] Junwu Du, Zhimin Zhang, Jun Yan, et al. Using search session context for named entity recognition in query[C]// Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2010: 765-766.
  • [6] Thomas Hofmann. Probabilistic latent semantic indexing[C]// Proceedings of the 22nd International ACM SIGIR Conference on Research and Development in Information Retrieval, 1999: 50-57.
  • [7] David M. Blei, Andrew Y. Ng, Michael I. Jordan. Latent Dirichlet allocation[J]. Journal of Machine Learning Research, 2003, 3: 993-1022.
  • [8] David M. Blei, Jon D. McAuliffe. Supervised topic models[C]// Proceedings of the 21st Annual Conference on Neural Information Processing Systems, 2007.
  • [9] Yue Lu, ChengXiang Zhai. Opinion integration through semi-supervised topic modeling[C]// Proceedings of the 17th International Conference on World Wide Web, 2008: 121-130.
  • [10] ChengXiang Zhai, Atulya Velivelli, Bei Yu. A cross-collection mixture model for comparative text mining[C]// Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2004: 743-748.
