Abstract
[Purpose/Significance] Building a specialized word list for learners of Chinese as a non-native language is of great significance both for their disciplinary study and for the construction and development of International Chinese Language Education as a discipline. [Methods/Processes] To address the current scarcity of specialized word lists for foreign learners of Chinese and the lack of variety in the methods used to build them, this paper first crawls novels, news, and forum posts from websites to construct a reference corpus. It then selects specialized textbooks according to the Ministry of Education's catalogue of specialized course offerings to construct a specialized textbook corpus. The TF-IDF-TF algorithm is used to select specialized subject words, a word co-occurrence matrix is built, and the subject words are grouped with agglomerative clustering. On this basis, the paper calculates the semantic relatedness of the subject words within each word cluster, selects the word with the highest semantic co-occurrence degree as the cluster's central word, and arranges the word list according to semantic relatedness. Finally, taking economics majors as an example, it constructs a specialized subject word list for international students. [Results/Conclusions] The results show that the economics subject word list constructed in this paper extracts specialized vocabulary well and effectively groups semantically closely related subject words into the same word cluster. Learners can quickly and effectively retrieve relevant word clusters for adaptive disciplinary learning, and the method also provides a basis for constructing subject word lists for other specialties.
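The abstract names four steps (term selection against a reference corpus, a word co-occurrence matrix, agglomerative clustering, and choosing each cluster's central word by semantic co-occurrence) but gives no formulas. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: the term score is a plain TF-IDF-style contrast (frequency in the specialized corpus times an IDF computed over reference-corpus documents), not the paper's TF-IDF-TF formula; co-occurrence is counted per document; clusters are merged by average linkage on cosine similarity of co-occurrence rows until the best similarity falls below a hypothetical `threshold`; and "semantic co-occurrence degree" is approximated by a word's total co-occurrence with the other members of its cluster. All function names and the toy corpora are hypothetical, and the input is assumed to be already word-segmented.

```python
import math
from collections import Counter
from itertools import combinations

# Toy, already word-segmented corpora (hypothetical data, for illustration only).
SPEC_DOCS = [
    ["inflation", "money", "interest", "the"], ["inflation", "money", "the"],
    ["inflation", "interest"], ["demand", "elasticity", "the"],
    ["demand", "elasticity", "market"], ["demand", "market", "the"],
]
REF_DOCS = [["the", "market", "news"], ["the", "story", "money"], ["the", "forum", "post"]]


def term_scores(spec_docs, ref_docs):
    """Assumed TF-IDF-style contrast score, NOT the paper's TF-IDF-TF formula:
    frequency in the specialized corpus times an IDF over reference documents."""
    tf = Counter(w for doc in spec_docs for w in doc)
    df_ref = Counter(w for doc in ref_docs for w in set(doc))
    n_ref = len(ref_docs)
    return {w: c * math.log((n_ref + 1) / (df_ref[w] + 1)) for w, c in tf.items()}


def select_terms(spec_docs, ref_docs, top_n):
    """Keep the top_n positively scored words as candidate subject words."""
    scores = term_scores(spec_docs, ref_docs)
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return [w for w in ranked[:top_n] if scores[w] > 0]


def cooccurrence(spec_docs, terms):
    """Symmetric matrix: m[i][j] = number of documents containing both terms,
    with each term's document frequency on the diagonal."""
    index = {w: i for i, w in enumerate(terms)}
    m = [[0] * len(terms) for _ in terms]
    for doc in spec_docs:
        present = sorted({index[w] for w in doc if w in index})
        for i in present:
            m[i][i] += 1
        for i, j in combinations(present, 2):
            m[i][j] += 1
            m[j][i] += 1
    return m


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def agglomerative(matrix, threshold):
    """Average-linkage agglomerative clustering on cosine similarity of rows;
    stops when the most similar cluster pair is below threshold."""
    n = len(matrix)
    sim = [[cosine(matrix[i], matrix[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > 1:
        best, pair = -1.0, None
        for a, b in combinations(range(len(clusters)), 2):
            s = sum(sim[i][j] for i in clusters[a] for j in clusters[b])
            s /= len(clusters[a]) * len(clusters[b])
            if s > best:
                best, pair = s, (a, b)
        if best < threshold:
            break
        a, b = pair
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a stays valid
    return clusters


def center_and_members(cluster, matrix, terms):
    """Assumed proxy for 'semantic co-occurrence degree': a word's total
    co-occurrence with the other members. The top word becomes the center;
    the rest are ordered by co-occurrence with the center."""
    def degree(i):
        return sum(matrix[i][j] for j in cluster if j != i)
    center = max(cluster, key=lambda i: (degree(i), -i))
    rest = sorted((j for j in cluster if j != center),
                  key=lambda j: (-matrix[center][j], terms[j]))
    return terms[center], [terms[j] for j in rest]


def build_word_list(spec_docs, ref_docs, top_n=10, threshold=0.3):
    terms = select_terms(spec_docs, ref_docs, top_n)
    matrix = cooccurrence(spec_docs, terms)
    return [center_and_members(c, matrix, terms)
            for c in agglomerative(matrix, threshold)]


if __name__ == "__main__":
    for center, members in build_word_list(SPEC_DOCS, REF_DOCS):
        print(center, members)
    # demand ['elasticity', 'market']
    # inflation ['interest', 'money']
```

On the toy data, "the" is dropped because it is as frequent in the reference corpus as anywhere else, and the remaining words split into a monetary cluster and a market cluster. Substituting the paper's actual TF-IDF-TF weighting or a different linkage criterion would only change `term_scores` or the averaging step in `agglomerative`.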
Authors
HANG Jianqin (杭建琴)
ZHANG Mingyu (张鸣宇)
HU Zewen (胡泽文)
Affiliations: Research Centre for Language and Language Education, Central China Normal University, Wuhan 430079, China; School of International Education, Wuhan University, Wuhan 430079, China; School of Management Science and Engineering, Nanjing University of Information Science & Technology, Nanjing 210044, China
Source
Technology Intelligence Engineering (《情报工程》), 2024, No. 3, pp. 114-127 (14 pages)
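Funding
National Social Science Fund of China project "Research on Methods and Applications for Identifying Potential 'High-Quality' Papers in Massive Scientific and Technological Literature" (20CTQ031)
National Social Science Fund of China General Project "Survey and Comparative Study of Dialect Grammar in the Transitional Zone of Four Provinces and Municipalities in Northwestern Hubei" (20BYY039)
Jiangsu Province University Philosophy and Social Sciences General Project "The Influence of Learning Anxiety on Chinese Language Learning among International Students in China from Belt and Road Countries, and Countermeasures" (2023SJYB0753)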
Keywords
Subject Word List
Agglomerative Clustering Algorithm
Semantic Co-occurrence Degree
Central Word of the Word Cluster