Abstract
To address the shortcomings of printed textbooks, namely that readers cannot quickly locate knowledge concepts, cannot easily grasp the logical structure of the writing from the surface text, and struggle to establish connections between pieces of knowledge, a textbook move recognition method that incorporates a large language model is proposed. First, a textbook move structure is designed and a textbook move classification dataset is constructed. Then, a generative large language model is used to generate corpus for scarce moves and to enhance features for moves that lack distinctive features. Finally, the move recognition dataset is combined with the augmented move data to fine-tune the initial textbook move recognition model, yielding a move recognition model assisted by the large language model. Experimental results show that, compared with the initial model BERT-wwm-ext, the overall accuracy of the assisted model increases by 5.06 percentage points to 95.44%, and the Macro-F1 value increases by 2.54 percentage points to 93.51%. The move recognition model was further used to automatically construct a textbook knowledge graph and a back-of-book index, which show the logical structure of textbook writing fairly clearly.
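The abstract reports both overall accuracy and Macro-F1. The minimal sketch below illustrates how the two metrics differ when some move classes are scarce. It is not the authors' evaluation code: the label names `definition` and `example` and the toy predictions are hypothetical, chosen only for illustration.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores, so rare classes count equally."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical move labels: "example" plays the role of a scarce move class.
y_true = ["definition"] * 4 + ["example"] * 2
y_pred = ["definition"] * 4 + ["definition", "example"]

print(f"accuracy = {accuracy(y_true, y_pred):.4f}")
print(f"macro-F1 = {macro_f1(y_true, y_pred):.4f}")
```

In this toy case a single error on the scarce class lowers Macro-F1 more than accuracy, which is one reason a study that augments scarce classes would report both metrics.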
Authors
王润欣
李宁
WANG Runxin; LI Ning (Computer School, Beijing Information Science & Technology University, Beijing 102206, China)
Source
《北京信息科技大学学报(自然科学版)》
2024, No. 4, pp. 71-80 (10 pages)
Journal of Beijing Information Science and Technology University
Funding
National Natural Science Foundation of China (61672105).
Keywords
digital textbook
move recognition
large language model
knowledge graph
back-of-book index