On the Problems of Parsing Software Exposed from Auxiliary Tagging (original Chinese title: 从助词标注看汉语分词软件的问题)
Abstract: Taking the auxiliary-word tagging output of the Chinese word segmentation and tagging software CorpusWordParser as the object of study, this paper identifies six types of tagging error: verbs tagged as auxiliary words, nouns tagged as auxiliary words, classifiers tagged as auxiliary words or prepositions, pronouns tagged as auxiliary words, auxiliary words tagged as verbs, and auxiliary words tagged as adjectives. A comparison with ICTCLAS, a segmentation and tagging tool of the same kind, shows that ICTCLAS segments text and tags parts of speech more accurately than CorpusWordParser, and that the two tools share some tagging errors. Suggestions for improvement are then proposed on the basis of these shared error types. Chinese word segmentation and tagging techniques and methods still have room for improvement; manual checking remains indispensable in research, and strengthening it can raise the accuracy of part-of-speech tagging.
Authors: GUO Kang-ping (郭康平), FENG Li (冯莉) (College of Arts, Heilongjiang University; College of Applied Foreign Languages, Heilongjiang University, Harbin, Heilongjiang 150080)
Source: Journal of Mudanjiang University (《牡丹江大学学报》), 2023, No. 2, pp. 37-44 (8 pages)
Funding: National Social Science Fund project "A Corpus-Based Study of Current Legislative Language in China" (project no. 21AYY012).
Keywords: CorpusWordParser; auxiliary words; tagging; word segmentation