
Acquisition and Analysis of a Crowdsourced Word Association Network (cited by: 6)

Constructing Word Association Network by Crowdsourcing
Abstract: Dictionaries are an important class of resource in Chinese natural language processing, supporting lexical, syntactic, and semantic analysis of Chinese, including tasks such as word segmentation, POS tagging, and parsing. This paper uses crowdsourcing to build a Chinese semantic relevance dictionary. Because the dictionary is obtained indirectly, by collecting the words that contributors associate with given trigger words, it is also called a word association network. Compared with traditional dictionaries, the word association network has the following characteristics: (1) it is cheap to acquire; (2) it is Internet-oriented and easy to extend; (3) its word relations are established from the perspective of human cognition and therefore match human intuition. The paper describes in detail how the word association network is acquired and analyzes the data collected so far. It also compares the network with HowNet (《知网》), Tongyici Cilin (《同义词词林》), and word n-grams from Weibo text to illustrate these characteristics.
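Source: Journal of Chinese Information Processing (《中文信息学报》), CSCD, PKU Core, 2013, No. 3, pp. 100-106 (7 pages)
Funding: Key Program of the National Natural Science Foundation of China (61133012); National 863 Program Major Project (2011AA01A207); National 863 Program Advanced Technology Research Project (2012AA011102)
Keywords: crowdsourcing; semantic relevance dictionary; word association network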
