Extraction of Multiword Expressions in Questions of Question Answering Communities

Abstract: Multiword expressions (MWEs) in questions posted to interactive question answering communities are directly related to how those questions are interpreted. We therefore propose extracting MWEs from such questions and, based on the characteristics of MWEs in these questions, design an extraction method suited to question answering communities. The method first obtains candidate MWEs from the questions using mutual information and a stop-word list. It then classifies the candidates into four types: correct strings, incomplete strings, redundant strings, and erroneous strings. Finally, it corrects the candidates by exploiting the query optimization performed by search engines together with the Internet retrieval results for each candidate, which yields the final MWEs. Experiments on questions from the Sina iAsk question library achieve a precision of 84%, a recall of 52%, and an F-measure of 0.64, which verifies the effectiveness of the method.
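The candidate-extraction step and the reported F-measure can be illustrated with a short Python sketch. It assumes the questions have already been word-segmented, scores only adjacent word pairs with pointwise mutual information (PMI), and discards pairs that contain a stop word. The function names and the `min_count` and `threshold` parameters are hypothetical; the paper's exact mutual-information formulation, stop-word list, and thresholds may differ, and the four-way classification and search-engine correction are not modeled here. `f_measure` checks that 84% precision and 52% recall give an F-measure of about 0.64.

```python
import math
from collections import Counter


def pmi_candidates(questions, stop_words, min_count=2, threshold=1.0):
    """Return candidate MWEs built from adjacent word pairs with high PMI.

    `questions` is a list of already-segmented questions (lists of words).
    `min_count` and `threshold` are illustrative defaults (assumptions),
    not values taken from the paper.
    """
    unigrams = Counter()
    bigrams = Counter()
    for words in questions:
        unigrams.update(words)
        # Only count pairs inside a question, never across question boundaries.
        bigrams.update(zip(words, words[1:]))

    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    scored = {}
    for (x, y), c_xy in bigrams.items():
        # Stop-word filter: drop pairs whose first or last word is a stop word.
        if x in stop_words or y in stop_words or c_xy < min_count:
            continue
        p_xy = c_xy / n_bi
        p_x = unigrams[x] / n_uni
        p_y = unigrams[y] / n_uni
        pmi = math.log2(p_xy / (p_x * p_y))
        if pmi >= threshold:
            key = x + y  # Chinese MWEs are written without spaces.
            scored[key] = max(pmi, scored.get(key, pmi))
    # Highest-scoring candidates first.
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)


def f_measure(precision, recall):
    """Balanced F-measure: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

On a toy corpus of four segmented questions, `[["新浪", "爱问", "怎么", "注册"], ["新浪", "爱问", "的", "积分"], ["怎么", "注册", "邮箱"], ["积分", "怎么", "用"]]` with stop words `{"的", "怎么"}`, only `新浪爱问` survives, with PMI = log2(9.8), about 3.29; the frequent pair `怎么 注册` is removed by the stop-word filter.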
Source: Journal of Jilin University (Science Edition) (《吉林大学学报(理学版)》), 2014, No. 6, pp. 1230-1238 (9 pages). Indexed in CAS, CSCD, and the Peking University core journal list.
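Funding: National Natural Science Foundation of China (grant Nos. 61171159 and 61271304); Key Project of the Science and Technology Development Plan of the Beijing Municipal Education Commission, jointly a Class B Key Project of the Beijing Natural Science Foundation (grant No. KZ201311232037).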
Keywords: multiword expressions; question interpretation; mutual information; search engine