Abstract
A domain corpus is an indispensable foundation for natural language processing of domain documents, and a prerequisite for deeper analysis of the content and intent of specialized texts. Starting from the research background, this paper first clarifies why natural language processing of domain documents is necessary. Based on an analysis of the characteristics of domain corpora, it then examines the design strategy and principles of domain corpus construction, and investigates the part-of-speech tagging information used in the corpus. Finally, a human-aided processing system for domain corpora is developed, providing theoretical guidance and technical support for domain corpus construction.
Source
《中文信息学报》
CSCD
Peking University Core Journal
2008, Issue 4, pp. 24-30 (7 pages)
Journal of Chinese Information Processing
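Funding
National Key Technology R&D Program of China (2006BAH03B03)
National Basic Research Program of China (973 Program) (2007CB512601)
Ministry of Education Humanities and Social Sciences Project (06JC870001)
Shandong Province Traditional Chinese Medicine Science and Technology Special Project (2003-14)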
Keywords
computer application
Chinese information processing
natural language processing
corpus
ancient traditional Chinese medicine documents
knowledge engineering