Domain Term Extraction Using URL-Key
Abstract This paper proposes, for the first time, a method for domain term recognition based on URL-Keys. Using URLs as an intermediary, the method infers the domain relevance of unknown candidate terms from the known domain relevance of URL-Keys. First, starting from manually categorized domain URLs already available on the Internet, a variance-based identification method uses the frequency with which each URL-Key occurs across domains to build a domain URL-Key vocabulary. Then, pseudo-relevance feedback is used to collect the set of URLs returned when each candidate term is issued as a search query, and a URL-Key feature vector is built for the candidate from that result set. Finally, an SVM classifies the candidates to extract domain terms. Experiments on Chinese term extraction in four domains gave good results in each. The method handles low-frequency terms effectively and offers a new approach to identifying them.
Authors LÜ Shuning (吕书宁); DONG Zhian (董志安). Affiliations: School of Software Engineering, Beijing University of Technology, Beijing 100124; Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science and Technology University, Beijing 100101
Source Acta Scientiarum Naturalium Universitatis Pekinensis (《北京大学学报(自然科学版)》); indexed in EI, CAS, CSCD, and the PKU Core Journals list; 2018, No. 2, pp. 262-270 (9 pages)
Funding Supported by the National Natural Science Foundation of China (61671070) and a Key Project of the State Language Commission (ZDI135-53)
Keywords URL; URL-Key; domain term; low-frequency term; SVM
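
The first two steps described in the abstract (variance-based selection of domain URL-Keys, then a URL-Key feature vector for a candidate term) might look roughly like the sketch below. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not define how a URL is split into URL-Keys, so the tokenizer `url_keys`, its stop list, the relative-frequency normalization, the threshold `min_variance`, and all example URLs are hypothetical. The search step of the pseudo-relevance feedback is replaced by a hand-written list of result URLs, and the final SVM step is omitted; in practice the resulting vector would be passed to an SVM classifier.

```python
import re
from collections import Counter
from statistics import pvariance

# Hypothetical stop list of tokens that carry no domain information.
STOP = {"www", "com", "cn", "net", "org", "html", "htm", "index"}

def url_keys(url):
    """Split a URL into lowercase alphabetic tokens (an assumed URL-Key definition)."""
    url = re.sub(r"^[a-z]+://", "", url.lower())
    tokens = re.split(r"[^a-z]+", url)
    return [t for t in tokens if len(t) > 1 and t not in STOP]

def domain_key_vocab(labeled_urls, min_variance):
    """Map each selected URL-Key to the domain where it is relatively most frequent.

    labeled_urls: dict of domain name -> list of manually categorized URLs.
    A key is selected when the variance of its relative frequency across
    domains is at least min_variance (an assumed selection rule).
    """
    rel = {}
    for domain, urls in labeled_urls.items():
        counts = Counter(k for u in urls for k in url_keys(u))
        total = sum(counts.values()) or 1
        rel[domain] = {k: c / total for k, c in counts.items()}

    vocab = {}
    all_keys = {k for freqs in rel.values() for k in freqs}
    for key in all_keys:
        freqs = [rel[d].get(key, 0.0) for d in labeled_urls]
        if pvariance(freqs) >= min_variance:
            vocab[key] = max(labeled_urls, key=lambda d: rel[d].get(key, 0.0))
    return vocab

def feature_vector(result_urls, vocab_keys):
    """Count how often each vocabulary key occurs in a candidate's result URLs."""
    counts = Counter(k for u in result_urls for k in url_keys(u))
    return [counts[k] for k in vocab_keys]

# Toy, made-up data: two domains with two categorized URLs each.
labeled = {
    "sports": ["http://sports.example.com/nba/news",
               "http://www.example.org/sports/football"],
    "finance": ["http://finance.example.com/stock/news",
                "http://www.example.net/finance/fund"],
}
vocab = domain_key_vocab(labeled, min_variance=0.004)
keys = sorted(vocab)

# Stand-in for the URLs a search engine would return for one candidate term.
candidate_results = ["http://sports.example.com/nba/score",
                     "http://www.example.com/sports/nba"]
vec = feature_vector(candidate_results, keys)
```

On this toy data, keys that are equally frequent in both domains (such as `example` and `news`) have zero variance and are excluded, while keys concentrated in one domain are kept. A low-frequency candidate term can still receive an informative vector, because the evidence comes from the URLs of its search results and not from how often the term itself appears in a corpus.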