Abstract
Mining citation relationships between ancient Chinese and Japanese texts is important for studying how vocabulary, ideas, and culture were transmitted between China and Japan. As part of building the citation mining and retrieval functions of the DaDi Corpus (大地语料库), this study tests five string-based algorithms on the corpus's ancient Chinese-language texts from China and Japan. The 2-gram overlap algorithm is selected as the best of the five and refined into the 2-gram similarity ratio and related algorithms. The refined algorithm is then used for full-text mining of citations of all ten volumes of The Analects (《论语》) in ancient Japanese texts written in Chinese. The results confirm that DaDi Corpus data are usable for citation mining across ancient Chinese and Japanese texts and that the 2-gram similarity ratio algorithm is effective, offering a practical dataset and a methodological reference for such citation mining in the digital humanities.
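The abstract names the 2-gram overlap and 2-gram similarity ratio algorithms without defining them. As a rough illustration only, the Python sketch below counts shared character 2-grams between two strings and normalises that count by the 2-gram count of the shorter string. The function names and the choice of normaliser are assumptions made for this example and may differ from the paper's actual definitions.

```python
from collections import Counter


def bigrams(text: str) -> Counter:
    """Count the overlapping character 2-grams in `text`."""
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def bigram_overlap(a: str, b: str) -> int:
    """Number of 2-grams shared by `a` and `b` (multiset intersection)."""
    return sum((bigrams(a) & bigrams(b)).values())


def bigram_similarity_ratio(a: str, b: str) -> float:
    """Shared 2-grams divided by the 2-gram count of the shorter string.

    ASSUMPTION: normalising by the shorter string is an illustrative choice,
    not a definition taken from the paper.
    """
    grams_a, grams_b = bigrams(a), bigrams(b)
    denom = min(sum(grams_a.values()), sum(grams_b.values()))
    if denom == 0:  # at least one string is too short to contain a 2-gram
        return 0.0
    return sum((grams_a & grams_b).values()) / denom


if __name__ == "__main__":
    source = "学而时习之"              # phrase from The Analects
    candidate = "学而时习之不亦说乎"    # hypothetical passage that quotes it
    print(bigram_overlap(source, candidate))
    print(bigram_similarity_ratio(source, candidate))
```

Under this assumed normalisation, a short quotation fully embedded in a longer passage scores as a complete match, whereas a raw overlap count would favour long passages regardless of how much of the source they reproduce.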
Authors
XIONG Wei (熊伟)
WANG Ding (王鼎)
Source
Foreign Language Research in Northeast Asia (《东北亚外语研究》)
2024, No. 4, pp. 46-62 (17 pages)
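Funding
Interim result of the following projects:
National Social Science Fund of China Key Project "Construction and Study of a Corpus of Japanese Kanji Words" (日本汉字词语料库建设与研究; 19AYY020)
Soochow University 2024 "莙政基金" (Chun-Tsung Endowment) project "The Transmission of Chinese Classics Abroad: A Big-Data Study of Citations of Chinese Classics in Japanese Kanshi" (中华经典的异域传播——基于大数据的日本汉诗里的中国典籍引用研究; 苏大教[2024]53号)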
Keywords
citation mining (引用挖掘)
text similarity (文本相似度)
ancient Chinese and Japanese texts (中日古籍)
DaDi Corpus (大地语料库)
2-gram similarity ratio (2-gram相似比)