Abstract
In recent years, researchers in Japanese linguistics have made considerable progress in applying data mining techniques. These methods are used not only in lexical and grammatical research but also in the study of orthography, phonology, dialects, discourse, and pragmatics. Data analysis tools have grown steadily more capable. Building on traditional descriptive statistics, researchers gradually introduced inferential methods such as the chi-square test, analysis of variance, principal component analysis, cluster analysis, and correspondence analysis. They later experimented with newer machine-learning-based algorithms and techniques, including decision trees, random forests, topic models, and co-occurrence networks, which substantially improved their analytical capabilities. Research in this field nevertheless still faces several problems: immature usage patterns, a limited set of feature indicators, unsatisfactory construction of specialized corpora, a bottleneck in knowledge and skills that has yet to be overcome, and an urgent need for stronger interdisciplinary collaboration.
Authors
毛文伟
梁鹏飞
蒋夏梦
Mao Wenwei; Liang Pengfei; Jiang Xiameng (Shanghai International Studies University, China)
Source
《日语学习与研究》
CSSCI
2022, No. 6, pp. 76-94 (19 pages)
Journal of Japanese Language Study and Research
Funding
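Interim research result of the 2019 National Social Science Fund of China General Project "A Study of the Cognitive Mechanisms of Chinese Learners of Japanese Based on Data Mining Technology" (Grant No. 19BYY201)
Principal investigator: Mao Wenwei.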
Keywords
Japanese linguistics
data mining
corpus
statistics
machine learning