
Proofreading Strategies for Chinese Phrase Corpora Based on Character and Word Repetition Patterns and Typo Rates
(original title: 基于字词重复模式及错字率的中文词组语料校对策略)
Abstract: Statistical analysis shows that, in Chinese phrase corpora, phrases containing repeated characters or words have a relatively high typo rate. The repetition patterns were classified and the error rate of each pattern was counted, which identified the patterns with high typo rates. For example, the error probability is higher when the repeated character or word appears at the tail of the phrase, or when the repetition is consecutive. Based on the error-rate data for these repetition patterns, the paper recommends two optimized strategies for manually proofreading large-scale phrase corpora.
Source: Teaching and Science Technology (《教学与科技》), 2014, No. 4, pp. 38-42 (5 pages)
Keywords: Chinese phrase corpus; proofreading strategies; word repetition patterns; typo rate
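
The abstract names two high-risk repetition patterns: repetition at the tail of a phrase and consecutive repetition. The following is a minimal sketch of how such patterns might be detected at the character level and used to order phrases for manual review. The function names, the three boolean features, and the additive score are assumptions made for illustration; the abstract does not give the paper's actual pattern taxonomy, its error-rate figures, or the details of its two recommended strategies.

```python
from collections import Counter


def repetition_features(phrase: str) -> dict:
    """Describe character repetition in one phrase (illustrative categories only)."""
    counts = Counter(phrase)
    # Any character that occurs more than once anywhere in the phrase.
    has_repeat = any(n > 1 for n in counts.values())
    # Consecutive repetition: the same character twice in a row, e.g. "高高兴兴".
    consecutive = any(a == b for a, b in zip(phrase, phrase[1:]))
    # Tail repetition: the final character also occurs earlier, e.g. "得过且过".
    at_tail = bool(phrase) and phrase[-1] in phrase[:-1]
    return {"has_repeat": has_repeat, "consecutive": consecutive, "at_tail": at_tail}


def review_score(phrase: str) -> int:
    """Hypothetical priority: one point per repetition feature present."""
    return sum(repetition_features(phrase).values())


def review_order(phrases: list[str]) -> list[str]:
    """Put phrases with more repetition features first; ties keep input order."""
    return sorted(phrases, key=review_score, reverse=True)


if __name__ == "__main__":
    sample = ["想方设法", "一心一意", "得过且过", "高高兴兴"]
    for p in review_order(sample):
        print(p, repetition_features(p))
```

In practice the equal weights would presumably be replaced by measured per-pattern error rates, so that proofreaders see the patterns with the highest observed typo rate first.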