Abstract
Corpus research based on web retrieval invariably begins with the development of a corpus software system. Such a system is the foundation for research in corpus linguistics, machine translation, language teaching, and lexicography, and its quality determines both the scale to which a corpus can be built and the quality of the resulting research. The key stages in building a large-scale corpus software system are: document extraction; metadata creation; part-of-speech, syntactic, and error annotation; and indexing, retrieval, and statistical analysis. For each of these stages, we collected a large number of corpus development software packages from abroad and tested them programmatically. We analyze and review them in terms of theoretical approach, execution efficiency, accuracy, robustness, practicality, and support for Chinese, in the hope of providing a useful reference for the construction of large-scale corpus software systems in China.
Source
《情报科学》
CSSCI
PKU Core (Peking University core journal list)
2014, No. 11, pp. 147-151 (5 pages)
Information Science