Abstract
This paper aims to extract common-used Kazakh words automatically. It computes the lexical general usage of Kazakh words with an improved formula for word field general usage. This formula builds on the traditional measure of word field usage and is designed to have a greater influence on how common-used Kazakh words are ranked. Common-used words have three properties: field generality, regional generality, and time generality. Building on these properties, the paper uses statistical methods to examine how generally Kazakh words are used. Starting from Kazakh word-frequency statistics, it computes the lexical general usage of each word. Experimental results show that, when extracting common-used Kazakh words, the improved field-general-usage formula influences the ranking positions of words more strongly than the traditional field-usage formula.
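The abstract does not state either formula. Purely as a point of reference, the sketch below assumes that the "traditional field usage" is a dispersion-weighted frequency of the Juilland type, U = D × F. Here F is a word's total frequency over n equal-sized field sub-corpora, and D = 1 − V / √(n − 1), where V is the coefficient of variation of the per-field frequencies. The paper's improved field-general-usage formula is not reproduced here. The function names and the word counts are hypothetical.

```python
import math
from statistics import mean, pstdev


def juilland_d(freqs):
    """Juilland-style dispersion D for one word (assumed measure, not the paper's).

    freqs: the word's frequencies in n equal-sized field sub-corpora.
    D is 1 for a perfectly even spread and 0 when all occurrences fall in one field.
    """
    n = len(freqs)
    if n < 2:
        raise ValueError("need at least two field sub-corpora")
    m = mean(freqs)
    if m == 0:
        return 0.0
    v = pstdev(freqs) / m  # coefficient of variation (population standard deviation)
    return 1 - v / math.sqrt(n - 1)


def juilland_usage(freqs):
    """Usage U = D * F, where F is the total frequency across all fields."""
    return juilland_d(freqs) * sum(freqs)


def rank_words(table):
    """Sort words by descending usage (hypothetical helper)."""
    return sorted(table, key=lambda w: juilland_usage(table[w]), reverse=True)


# Hypothetical per-field counts for three Kazakh words in four fields.
table = {
    "және": [120, 110, 130, 115],  # spread evenly across fields
    "файл": [400, 5, 3, 2],        # concentrated in one field
    "бала": [60, 70, 65, 55],
}

by_frequency = sorted(table, key=lambda w: sum(table[w]), reverse=True)
by_usage = rank_words(table)
print(by_frequency)  # ['және', 'файл', 'бала']
print(by_usage)      # ['және', 'бала', 'файл']
```

With these illustrative counts, "файл" has a higher raw frequency than "бала", but almost all of its occurrences fall in one field. Its low D therefore drops it to the bottom of the usage ranking. A weighting that rewards spread across fields changes word ranking positions in this way, and the paper's experiments measure this kind of effect for its own formulas.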
Source
Computer Engineering and Applications (《计算机工程与应用》)
CSCD
2012, No. 28, pp. 168-173 (6 pages)
Funding
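National Natural Science Foundation of China (No. 60763005)
Ministry of Education of China
State Language Commission Research Project on Standards Construction and Informatization of Minority Languages and Scripts (No. MZ115-92)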
Keywords
common-used words
Kazakh
lexical general usage
field general usage
time general usage