
Mining Named Entities from Query Logs (基于用户查询日志的命名实体挖掘)

Cited by: 8
Abstract: Mining the rich set of named entities found in large-scale query logs is an important research topic in data mining. Previous work proposed a seed-based extraction framework that mines named entities by leveraging the distributional similarity between entities. However, that approach works well only when each seed entity belongs to a single semantic class, whereas in practice named entities often belong to multiple classes. This paper introduces a weakly supervised topic model that uses a small amount of human supervision to resolve the class ambiguity of named entities and make mining more effective. Experimental results show that the proposed approach significantly outperforms the previous method in entity mining performance.
Source: Journal of Chinese Information Processing (《中文信息学报》), CSCD, Peking University Core Journal, 2010, No. 1, pp. 71-76, 116 (7 pages in total)
Keywords: computer application; Chinese information processing; named entity; query log; topic model

References (7)

  • 1. Borthwick Andrew, Sterling J., Agichtein E., Grishman R. NYU: Description of the MENE Named Entity System as used in MUC-7 [C]// Proc. Seventh Message Understanding Conference. 1998.
  • 2. Cucchiarelli Alessandro, Velardi P. Unsupervised Named Entity Recognition Using Syntactic and Semantic Contextual Evidence [J]. Computational Linguistics, 2001, 27(1): 123-131.
  • 3. Evans Richard. A Framework for Named Entity Recognition in the Open Domain [C]// Proc. Recent Advances in Natural Language Processing. 2003.
  • 4. Pasca M. Weakly-supervised discovery of named entities using web search queries [C]// Proceedings of the Sixteenth ACM Conference on Conference on Information and Knowledge Management, 2007.
  • 5. D. M. Blei and J. D. Lafferty. Correlated topic models [C]// Proceedings of the 23rd International Conference on Machine Learning, 2006: 113-120.
  • 6. T. Hofmann. Probabilistic latent semantic indexing [C]// SIGIR '99: Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval, 1999: 50-57.
  • 7. D. M. Blei, A. Y. Ng and M. I. Jordan. Latent dirichlet allocation [J]. Journal of Machine Learning Research, 2003, 3(1): 993-1022.

Co-cited Documents (106)

  • 1. Zou Gang, Liu Yang, Liu Qun, Meng Yao, Yu Hao, Nishino Fumihito, Kang Shiyong. Chinese New Word Detection for the Internet [J]. Journal of Chinese Information Processing, 2004, 18(6): 1-9. Cited by: 59
  • 2. Yu Huijia, Liu Yiqun, Zhang Min, Ru Liyun, Ma Shaoping. Search Engine User Behavior Analysis Based on Large-Scale Log Analysis [J]. Journal of Chinese Information Processing, 2007, 21(1): 109-114. Cited by: 117
  • 3. LISA F RAU. Extracting company names from text [C]// Proceedings of the 7th Conference on Artificial Intelligence Applications. Washington: IEEE Computer Society, 1991: 29-32.
  • 4. HAI LEONG CHIEU, HWEE TOU NG. Named entity recognition: a maximum entropy approach using global information [C]// Proceedings of the 19th International Conference on Computational Linguistics. Stroudsburg, PA: Association for Computational Linguistics, 2002: 1-7.
  • 5. KOICHI TAKEUCHI, NIGEL COLLIER. Use of support vector machines in extended named entity recognition [C]// Proceedings of the 6th Conference on Natural Language Learning. Stroudsburg, PA: Association for Computational Linguistics, 2002: 1-7.
  • 6. HOIFUNG POON, PEDRO DOMINGOS. Joint inference in information extraction [C]// Proceedings of the 22nd National Conference on Artificial Intelligence. [S.l.]: AAAI Press, 2007: 913-918.
  • 7. COLLINS MICHAEL, SINGER YORAM. Unsupervised models for named entity classification [C]// Proceedings of the Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora. [S.l.]: [s.n.], 1999: 100-110.
  • 8. WHITELAW CASEY, KEHLENBECK ALEX, PETROVIC NEMANJA, et al. Web-scale named entity recognition [C]// Proceedings of the 17th ACM Conference on Information and Knowledge Management. New York: ACM Press, 2008: 123-132.
  • 9. ETZIONI OREN, CAFARELLA MICHAEL, DOWNEY DOUG, et al. Unsupervised named-entity extraction from the web: an experimental study [J]. Artificial Intelligence, 2005, 165(1): 91-134.
  • 10. ENRIQUE ALFONSECA, SURESH MANANDHAR. An unsupervised method for general named entity recognition and automated concept discovery [C]// Proceedings of the 1st International Conference on General WordNet. [S.l.]: [s.n.], 2002: 1-9.

Citing Documents (8)

Second-Level Citing Documents (11)
