
Automatic identification of Uyghur domain terms in Web text (cited by: 1)
Abstract: Uyghur domain terms are difficult to acquire, and expanding a domain term list by hand is laborious and inefficient. Based on the principle of word co-occurrence, this paper uses Uyghur conjunctions and Mutual Information (MI) to quickly expand an initial set of Uyghur domain terms. With these terms serving as feature templates, a Conditional Random Field (CRF) model is then used to automatically identify Uyghur domain terms in Web text, and long Uyghur domain terms are identified on that basis. Experiments on the collected Web texts show a precision of 97.59% and a recall of 93.38% for short Uyghur domain terms, and a precision of 55.72% for long Uyghur domain terms.
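The abstract does not give the exact MI formulation the authors used. As a rough illustration of how co-occurrence can score candidate word pairs, the sketch below computes standard pointwise mutual information at the sentence level. The function name `pmi_scores` and the placeholder tokens are hypothetical, and the sentence-level co-occurrence window is an assumption rather than a detail from the paper.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_scores(sentences):
    """Return pointwise mutual information for word pairs.

    Assumption (not from the paper): two words co-occur when they
    appear in the same sentence, and probabilities are estimated as
    the fraction of sentences containing the word or the pair.
    """
    n = len(sentences)
    word_counts = Counter()
    pair_counts = Counter()
    for tokens in sentences:
        vocab = sorted(set(tokens))  # count each word once per sentence
        word_counts.update(vocab)
        pair_counts.update(combinations(vocab, 2))  # keys are (a, b) with a < b

    scores = {}
    for (x, y), c_xy in pair_counts.items():
        p_xy = c_xy / n
        p_x = word_counts[x] / n
        p_y = word_counts[y] / n
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return scores


# Hypothetical placeholder tokens standing in for segmented Uyghur words.
toy_corpus = [["a", "b"], ["a", "b"], ["c", "d"], ["a", "c"]]
toy_scores = pmi_scores(toy_corpus)
for pair, score in sorted(toy_scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 3))
```

Pairs with high scores co-occur more often than chance would predict, which is the usual reason MI is used to propose new term candidates from a seed list. A real pipeline would also need a frequency cutoff, since PMI tends to overrate rare pairs.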
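Source: Journal of Computer Applications (《计算机应用》), indexed in CSCD and the Peking University Core Journals list, 2012, Issue 2, pp. 407-410 (4 pages)
Funding: National Natural Science Foundation of China (60963017); National Social Science Foundation of China (10BTQ045, 11XTQ007)
Keywords: Uyghur; Mutual Information (MI); Conditional Random Field (CRF); Term Frequency/Inverse Document Frequency (TF/IDF)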

