Abstract
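Building on the strengths of earlier corpus analysis software, this study developed PowerConc, a corpus analysis tool with independent intellectual property rights. PowerConc restructures, extends, and optimizes traditional functions such as concordancing, wordlist generation, and keyword calculation. The whole program is built on N-grams based on regular expressions; the combination of the two is the R-gram proposed in this paper. The R-gram concept greatly increases the flexibility of searching and matching. We also designed Smart Input, a simple input syntax compatible with regular expressions, which lowers the barrier for users and improves usability. PowerConc was developed on object-oriented principles: its core functions are encapsulated in separate classes and decoupled from the interface, which gives it good extensibility and maintainability. The development of PowerConc should effectively promote research in corpus linguistics.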
This paper describes PowerConc, a corpus analysis tool developed by the authors. Its implementation redesigns, extends, and optimizes key corpus-tool functions such as concordancing, wordlist generation, and keyword analysis. The whole design rests on combining regular expressions with N-grams, a combination this study terms R-gram. R-gram allows flexible concordancing, exhaustive listing of linguistic units, and extraction of key terms of varying length, and it may enable analyses of linguistic structures that contain unspecified words or categories. To make the tool easier to operate, Smart Input, a simplified input syntax compatible with regular expressions, has been introduced to ease query entry and improve the hits returned. PowerConc is object-oriented software: its key functions are packaged in separate classes as a dynamic link library (*.dll) file, independent of the user interface, which eases maintenance and extension. It is hoped that PowerConc will contribute to corpus-based research.
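To make the R-gram idea concrete, the sketch below treats an R-gram as an N-gram in which each token slot is described by a regular expression. This is only an illustration under stated assumptions: the abstract does not specify PowerConc's actual matching semantics or its Smart Input syntax, so the function names `tokenize` and `rgram_search`, the naive tokenizer, and the one-pattern-per-token convention are all hypothetical and are not taken from the software.

```python
import re
from typing import Iterator, List, Sequence, Tuple


def tokenize(text: str) -> List[str]:
    """Split text into lowercase word tokens (a deliberately naive tokenizer)."""
    return re.findall(r"[A-Za-z']+", text.lower())


def rgram_search(
    tokens: Sequence[str], slot_patterns: Sequence[str]
) -> Iterator[Tuple[int, Tuple[str, ...]]]:
    """Yield (start_index, ngram) for every window of len(slot_patterns)
    consecutive tokens in which each token fully matches the regular
    expression given for its slot.

    Assumption: one regex per token slot; this is an illustrative reading
    of "regex-based N-gram", not PowerConc's documented behaviour.
    """
    compiled = [re.compile(p) for p in slot_patterns]
    n = len(compiled)
    for start in range(len(tokens) - n + 1):
        window = tokens[start:start + n]
        if all(rx.fullmatch(tok) for rx, tok in zip(compiled, window)):
            yield start, tuple(window)


text = "She is looking at the map while he was looking for the keys."
tokens = tokenize(text)

# A 3-gram with an open middle slot: "looking", any word, "the".
open_slot_hits = list(rgram_search(tokens, [r"looking", r"\w+", r"the"]))

# A 2-gram over categories: a form of "be" followed by any -ing word.
progressive_hits = list(rgram_search(tokens, [r"is|was", r"\w+ing"]))

print(open_slot_hits)
print(progressive_hits)
```

A fixed-length window with a wildcard slot is the simplest way to show how a pattern can leave a word unspecified while still returning ordinary N-grams that can be counted or listed; a real tool would likely need richer tokenization and variable-length slots.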
Source
《外语电化教学》
CSSCI
PKU Core Journals (北大核心)
2013, No. 1, pp. 57-62 (6 pages)
Technology Enhanced Foreign Language Education
Funding
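Supported by the National Social Science Fund of China project "A Bilingual-Corpus-Based Study of the English Translation of Complex Chinese Verb Constructions" (Grant No. 12CYY060)
and by the Ministry of Education's Program for New Century Excellent Talents in University (Grant No. NCET-12-0790)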