面向中医古籍的单篇文本知识标引与结构解析技术 (Cited by: 1)

Knowledge Indexing and Structural Analysis Techniques for Single Text of Ancient Chinese Medical Books
Abstract [Purpose/Significance] This study investigates word segmentation and named entity recognition for ancient Chinese medical texts in the absence of annotated resources. Based on the segmentation and named entity recognition models, texts in the Chinese medicine domain are segmented and used to train a language model. [Method/Process] During training, the study adopts a multi-task learning framework that combines entity concept ranking prediction with masked word prediction, effectively incorporating prior conceptual knowledge from the lexicon into the language model and yielding a language model that integrates discourse semantics with prior knowledge. Starting from the masked language modeling (MLM) task used in model training, a cloze-style text generation task is designed to perform knowledge indexing of a single ancient text: following a short-clause-to-entity path, the method traverses all short clauses in the text and indexes all knowledge concepts, then mines prior rules to discover the implicit knowledge structure of the text and thereby construct its implicit discourse structure. [Result/Conclusion] Comparative experiments show that, with only five annotated samples, the proposed indexing approach makes effective use of the model's prior knowledge; compared with traditional methods, it better addresses knowledge indexing of ancient Chinese medical texts when annotations are missing, and it provides a solution for further parsing of single ancient Chinese medical texts. Collating, proofreading, and annotating ancient Chinese medical books and mining the knowledge they contain is of great theoretical and practical significance for the development of traditional Chinese medicine and modern medicine, as well as for research on medical history.
Authors Liu Yao (刘耀); Li Guanlin (李冠霖); Li Huanqing (李浣青) (Institute of Scientific and Technical Information of China, Beijing 100038; Samovar, Telecom SudParis, Institut Polytechnique de Paris, France 91120; School of Software and Microelectronics, Peking University, Beijing 100871)
Source Library and Information Service (《图书情报工作》), CSSCI, PKU Core Journal, 2022, No. 24, pp. 118-127 (10 pages)
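Funding One of the research outputs of the National Social Science Fund of China project "Research on Models and Methods for Knowledge Sharing and Knowledge Reuse of Digital Resources" (Grant No. 21BTQ011) and the National Key R&D Program of China project "Construction of a Data-Driven Science and Technology Consulting Service Platform" (Grant No. 2018YFB143502).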
Keywords single text knowledge structure parsing; knowledge indexing; prior knowledge; word fine-tuning language models; entity concept identification