Abstract
This paper discusses how to extract useful information from the many kinds of corpus material on the World Wide Web, such as text, sound, graphics and images. Extraction is driven by each user's actual requirements, so that the existing corpus material can be separated effectively. The paper focuses on three topics: the clustering characteristic of Web information, the design of the corpus, and how the corpus actually works.
Source
Computer Engineering (《计算机工程》)
CAS
CSCD
Peking University Core Journal
2003, Issue 6, pp. 34-35, 152 (3 pages in total)