Abstract
As the Internet becomes increasingly popular in China and the volume of Chinese-language information on the WWW keeps growing, techniques for automatically collecting online Chinese corpora matter more than ever, and developing and improving them is of great significance for raising the level of Chinese information processing. This paper discusses and analyzes the implementation principles and key techniques of collecting Chinese-language corpora online, introduces the principles of Web-based communication and of automatically acquiring online information, and discusses word segmentation in Chinese information processing and its development. It proposes an implementation scheme for collecting a corpus from the online edition of the People's Daily (《人民日报》).
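To make the word segmentation (分词) step mentioned in the abstract concrete, here is a minimal sketch of forward maximum matching, one classic dictionary-based approach. The abstract does not say which segmentation method the paper uses, so the algorithm choice, the function name `forward_max_match`, the `max_len` parameter, and the toy lexicon are all assumptions made for illustration only.

```python
def forward_max_match(text, lexicon, max_len=4):
    """Greedy left-to-right segmentation.

    At each position, take the longest substring (up to max_len
    characters) that appears in the lexicon; fall back to a single
    character when nothing matches.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical toy lexicon, not taken from the paper.
lexicon = {"人民", "日报", "人民日报", "语料库", "中文", "信息"}

print(forward_max_match("人民日报语料库", lexicon))
```

With this toy lexicon the longer entry "人民日报" is preferred over "人民" followed by "日报". A real system would need a much larger dictionary and some strategy for resolving ambiguous splits.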
Source
Journal of Changshu College (《常熟高专学报》)
2000, No. 2, pp. 81-85 (5 pages)