
Research on a Uyghur Stemming Method Based on a Hybrid Approach (cited by: 6)

Novel approach for Uyghur stemmer using mixed method
Abstract: To handle the morphological variation of Uyghur, this paper proposes a hybrid method for morphological reduction that combines rules with a dictionary. A stemmer is implemented using left-to-right analysis and the Lovins algorithm. After summarizing the rules by which morphemes are joined, the method extracts stems with rules and verifies each extraction result against a dictionary. Five tests on different news texts give an average accuracy of 77.4%.
Source: Application Research of Computers (《计算机应用研究》), CSCD, Peking University Core Journal, 2015, No. 1, pp. 112-114, 120 (4 pages)
Keywords: Uyghur; morphological changes; stem; affixes; rule method; dictionary method; mixed method; Lovins algorithm
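The abstract describes stripping affixes by rule and then checking the result against a dictionary. The following is a minimal sketch of that general idea, not the authors' implementation: the suffix list, the stem dictionary, the Latin-script transliterations, and the names `SUFFIXES`, `STEM_DICT`, `strip_longest_suffix`, and `stem` are all hypothetical. It repeatedly removes the longest matching suffix (a simplification in the spirit of Lovins-style longest-match removal, without Lovins's context conditions or recoding rules) and stops as soon as the dictionary confirms the remainder.

```python
# Hypothetical suffix inventory in Latin transliteration (not from the paper).
SUFFIXES = ["lar", "ler", "ning", "ni", "da", "de", "ta", "te", "din", "tin", "gha", "ge"]

# Hypothetical stem dictionary, used only to verify the rule output.
STEM_DICT = {"kitab", "mektep", "bala"}


def strip_longest_suffix(word, min_stem_len=2):
    """Remove the longest listed suffix that leaves at least min_stem_len characters.

    Returns the shortened word, or None if no suffix applies.
    """
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return None


def stem(word):
    """Return (candidate_stem, verified).

    verified is True when the candidate was found in STEM_DICT, and False when
    the rules ran out before the dictionary confirmed any candidate.
    """
    candidate = word
    while True:
        if candidate in STEM_DICT:
            return candidate, True
        shorter = strip_longest_suffix(candidate)
        if shorter is None:
            # Rules exhausted: return the rule-only result, flagged as unverified.
            return candidate, False
        candidate = shorter


if __name__ == "__main__":
    for w in ["kitablarni", "mektepte", "xyzlar"]:
        print(w, stem(w))
```

With these toy lists, `stem("kitablarni")` strips `ni` and then `lar` and returns `("kitab", True)`, while an unknown form such as `xyzlar` comes back as `("xyz", False)`. Returning the unverified flag rather than discarding the result leaves the choice of fallback to the caller; a real system would also need to handle Uyghur vowel harmony and sound changes at morpheme boundaries, which this sketch ignores.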


