Abstract
With the advent of the era of digital transformation, corpora have increasingly become a fundamental resource for text mining across many fields: they supply rich language material and contribute to research and applications in each field. Standards are an important basic strategic resource that supports and leads the high-quality development of core industries, so building national standards corpora oriented toward core industries is of great strategic significance. Among the many industries, integrated circuits (IC) are regarded as the "heart" of the Information Age. This study constructs a single-modal, multi-granularity corpus of national standards for integrated circuits (ICNSC) and conducts a preliminary analysis of it, providing a basic resource for building a science and technology think tank for the integrated circuit industry.
Authors
FANG Si-yi (方思怡)
XIA Lei (夏磊)
Shanghai Institute of Quality and Standardization
Source
Standard Science (《标准科学》)
2022, No. 11, pp. 38-43, 60 (7 pages in total)
Keywords
standards corpus
integrated circuit
national standard
corpus analysis