Abstract
Digital humanities is an interdisciplinary field that emphasizes the integration of computing technology with the humanities. Ancient Chinese classics are an important object of humanities research. Against this background, this paper applies computational methods to the digitized text of the Chunqiu Jingzhuan (《春秋经传》, the Spring and Autumn Annals together with its commentaries) to extract keywords and analyze their distribution. Three keyword extraction algorithms are compared: the unsupervised TextRank algorithm, the classic TF-IDF algorithm, and the LDA topic model. A pooling-based evaluation shows that TextRank extracts the best keywords, with an accuracy of 84%, while TF-IDF and the LDA topic model reach 62% and 74%, respectively. The extracted keywords also show that the events recorded in the Chunqiu Jingzhuan center on relations among the vassal states: diplomatic visits, covenant meetings, military campaigns, marriages and funerals, and usurpation and regicide.
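The abstract names TextRank as the best-performing method and pooling as the evaluation protocol, but gives no implementation details. As a rough illustration only, the sketch below ranks words by PageRank over a co-occurrence graph and scores a keyword list against a set of pooled relevance judgments. The function names, the window size, the damping factor, the iteration count, and the toy token list are all assumptions made for this example and are not the authors' settings; the input is assumed to be text that has already been word-segmented.

```python
from collections import defaultdict


def textrank_keywords(tokens, window=3, damping=0.85, iterations=50, top_k=5):
    """Rank words by PageRank over an undirected co-occurrence graph.

    All parameter defaults are illustrative assumptions, not values
    taken from the paper.
    """
    # Link every pair of distinct words that fall inside the same
    # sliding window of `window` consecutive tokens.
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    # Iterate the PageRank update on the unweighted graph.
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        scores = {
            word: (1 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[word])
            for word in neighbors
        }

    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_k]


def pooled_precision(system_keywords, judged_relevant):
    """Fraction of one system's keywords that were judged relevant.

    `judged_relevant` stands for the subset of the pool (the merged
    output of all systems) that assessors marked as good keywords.
    """
    if not system_keywords:
        return 0.0
    hits = sum(1 for word in system_keywords if word in judged_relevant)
    return hits / len(system_keywords)


if __name__ == "__main__":
    # Toy, hand-made token sequence used purely for illustration.
    toy_tokens = ["晋", "伐", "郑", "晋", "会", "齐", "晋", "盟", "宋"]
    keywords = textrank_keywords(toy_tokens, top_k=3)
    print(keywords)
    print(pooled_precision(keywords, {"晋", "伐", "盟"}))
```

In a pooling setup of this kind, the top-ranked keywords from every system are merged into one pool and judged once, and each system is then scored against the shared judgments, which is what `pooled_precision` approximates here.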
Authors
Qin Heran (秦贺然)
Wang Dongbo (王东波)
Qin Heran; Wang Dongbo (Lianyungang Higher Vocational Technical College of Traditional Chinese Medicine, Modern Technology Education Center, Library; College of Information Science and Technology, Nanjing Agricultural University)
Source
《图书馆杂志》
CSSCI
Peking University Core Journal (北大核心)
2020, No. 11, pp. 97-105 (9 pages)
Library Journal
Funding
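This work is one of the research outputs of the following projects:
National Natural Science Foundation of China, General Program, "Construction of a Syntax-Level Chinese-English Parallel Corpus Based on Indexes to the Classics and Research on Humanities Computing" (Grant No. 71673143)
National Social Science Fund of China, Major Project, "Construction of a Classics Knowledge Base Based on the Sinological Index Series (《汉学引得丛刊》) and Research on Humanities Computing" (Grant No. 15ZDB127)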