Abstract
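In corpus-based language research, computer software with corpus analysis functions (i.e., corpus analysis tools) is as indispensable as the corpus itself, and in most cases the functions of a corpus analysis tool constrain what the corpus can be used for. For a corpus to be used to its full potential, corpus analysis tools should offer complete query, statistics, and comparison functions. This article surveys the core functions that fourth-generation corpus analysis tools should have, analyzes on that basis how well corpus analysis tools in China and abroad currently support these core functions, and finally offers suggestions addressing the problems in the core functions of current Chinese corpus analysis tools.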
In the practice of corpus-based language research, computer software with corpus analysis functions (i.e., corpus analysis tools) is as essential as corpora. In most cases, the corpus analysis tool determines how a corpus can be used. To maximize the value of corpora, therefore, corpus analysis tools need complete query, statistics, and comparison functions. In this article, the core functions of corpus analysis tools are categorized as concordance, collocation, frequency statistics, and comparison. Concordance refers to the results returned when querying the corpus for texts that match certain query conditions, and includes three types: basic query, multi-conditional query, and corpus-specific query. Collocation refers to the results returned when querying for combinations of words that occur frequently in the corpus and have syntactic or semantic relationships. Frequency statistics show the frequency of words or other linguistic units in the corpus. The comparison function compares language usage, including comparing different linguistic phenomena in the same corpus and comparing the same linguistic phenomenon in different corpora (or sub-corpora). We select six representative fourth-generation corpus analysis tools, three from abroad and three from China, and review the four core functions in these tools. Both domestic and foreign corpus analysis tools have implemented diverse concordance functions, and Chinese corpus analysis tools also provide some concordance functions adapted to the characteristics of the Chinese language. In addition, the three foreign corpus analysis tools provide various collocation, frequency statistics, and comparison functions, while the three Chinese corpus analysis tools offer only simple frequency statistics and comparison functions, and none of them supports collocation yet. Finally, based on the comparison of domestic and foreign corpus analysis tools, this article makes suggestions for these tools. It is suggested that different domestic corpus analysis tools provide distinctive Chinese corpus analysis functions so that they complement each other. Domestic corpus analysis tools should: 1) further improve or add functions such as collocation, frequency statistics, and comparison; 2) sustain investment to accelerate software iteration; 3) strengthen research on Chinese corpus analysis methods; and 4) promote their transformation into practical tools. It is recommended that both domestic and foreign corpus analysis tools enhance corpus analysis functions based on deep learning and big data technologies, as well as functions for multimedia and multimodal corpora. It is expected that this article will help readers gain a comprehensive understanding of the current status of the core functions of domestic and foreign fourth-generation corpus analysis tools and help further improve these tools, especially Chinese corpus analysis tools.
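The four core functions defined in the abstract can be illustrated with a minimal Python sketch. This is not the implementation of any tool reviewed in the article. The function names (`tokenize`, `concordance`, `collocates`, `frequency`, `compare`), the whitespace tokenizer, and the fixed context window are all assumptions made for illustration. In particular, the window-based co-occurrence count only approximates collocation, which in the abstract's definition also involves syntactic or semantic relationships.

```python
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercases and splits on whitespace.
    # Real tools (especially for Chinese) need proper word segmentation.
    return text.lower().split()


def concordance(tokens, keyword, width=2):
    """Concordance: return keyword-in-context hits as (left, keyword, right)."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            hits.append((" ".join(left), tok, " ".join(right)))
    return hits


def collocates(tokens, keyword, window=2):
    """Collocation (simplified): count words within `window` tokens of keyword."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == keyword:
            counts.update(tokens[max(0, i - window):i])
            counts.update(tokens[i + 1:i + 1 + window])
    return counts


def frequency(tokens):
    """Frequency statistics: raw count of each token."""
    return Counter(tokens)


def compare(tokens_a, tokens_b, word):
    """Comparison: frequency of `word` per 1,000 tokens in two corpora."""
    def per_thousand(tokens):
        return 1000 * tokens.count(word) / len(tokens) if tokens else 0.0
    return per_thousand(tokens_a), per_thousand(tokens_b)
```

Normalizing to a rate per 1,000 tokens in `compare` is one common way to make corpora of different sizes comparable; the choice of scale here is arbitrary.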
Authors
张永伟
吴冰欣
ZHANG Yongwei;WU Bingxin
Source
《当代语言学》
CSSCI
Peking University Core Journal
2023, No. 4, pp. 611-624 (14 pages)
Contemporary Linguistics
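Funding
Supported by the National Social Science Fund of China, General Project "Research on the Development of a Large-Scale Chinese Corpus Analysis Tool Integrating Syntactic Information" (22BYY086).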