Web-Based Corpus Construction (cited by: 2)

A Preliminary Research on the Construction of Web Corpus

Abstract: As the internet becomes increasingly popular in China and the volume of Chinese-language information on the WWW keeps growing, automatic techniques for collecting online Chinese corpora matter more than ever, and developing and improving them is of great significance for raising the level of Chinese information processing. This paper discusses and analyzes the implementation principles and key technologies of collecting an online Chinese corpus, introduces the principles of Web-based network communication and of automatically retrieving information online, and discusses word segmentation in Chinese information processing and its development. It then proposes an implementation scheme for collecting a corpus from the online edition of People's Daily.

Authors: 俞倩兰 (Yu Qianlan), 王国新 (Wang Guoxin), 邹永林 (Zou Yonglin)

Affiliation: 常熟高等专科学校 (Changshu College)

Source: Journal of Changshu College (《常熟高专学报》), 2000, No. 2, pp. 81-85 (5 pages)

Keywords: Web; corpus; word segmentation; Chinese information processing; collection techniques

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]
