基于用户查询日志的命名实体挖掘 (cited by: 8)

Mining Named Entities from Query Logs

Abstract: Mining the named entities that abound in large-scale query logs is an important research topic in data mining. Previous work proposed a seed-based framework that mines named entities from query logs by exploiting the distributional similarity between entities. That approach, however, works well only when the seed entities belong to a single semantic class, whereas in practice a named entity often belongs to several classes. This paper introduces a weakly supervised topic model that uses a small amount of human supervision to resolve the class ambiguity of named entities and improve mining effectiveness. Experimental results show that the proposed approach significantly outperforms the previous method.
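To make the seed-based idea in the abstract concrete, here is a minimal sketch of ranking candidate entities by the distributional similarity of their query contexts to those of a few seed entities. It illustrates only the general baseline idea, not the paper's weakly supervised topic model or the exact prior method. The toy query log, the function names (`context_templates`, `cosine`, `rank_candidates`), the `#` slot marker, and the restriction to single-token entities are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def context_templates(queries, entity):
    """Count query templates obtained by replacing `entity` with a '#' slot.

    Assumption: entities are single whitespace-delimited tokens.
    """
    counts = Counter()
    for query in queries:
        tokens = query.split()
        if entity in tokens:
            template = " ".join("#" if t == entity else t for t in tokens)
            counts[template] += 1
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_candidates(queries, seeds, candidates):
    """Rank candidates by similarity to the pooled context profile of the seeds."""
    seed_profile = Counter()
    for seed in seeds:
        seed_profile.update(context_templates(queries, seed))
    scored = [(c, cosine(context_templates(queries, c), seed_profile)) for c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy query log, not data from the paper.
QUERIES = [
    "titanic trailer", "titanic cast", "avatar trailer", "avatar cast",
    "inception trailer", "inception cast", "paris hotels", "paris weather",
]

ranking = rank_candidates(QUERIES, seeds=["titanic", "avatar"],
                          candidates=["inception", "paris"])
print(ranking)
```

In this toy log, "inception" shares the seeds' contexts and ranks above "paris". Because all seed contexts are pooled into one profile, a seed that belongs to several classes would mix the contexts of those classes, which is the ambiguity the abstract says the topic model is meant to resolve.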

Authors: Zhai Haijun (翟海军), Guo Jiafeng (郭嘉丰), Wang Xiaolei (王小磊), Xu Hongbo (许洪波)

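Affiliations: School of Computer Science, University of Science and Technology of China; Institute of Computing Technology, Chinese Academy of Sciences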

Source: Journal of Chinese Information Processing (《中文信息学报》), indexed in CSCD and the Peking University Core Journals list; 2010, No. 1, pp. 71-76, 116 (7 pages in total)

Keywords: computer application; Chinese information processing; named entity; query log; topic model

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]



