Abstract
[Purpose/Significance] The knowledge in ancient Tibetan medical books is poorly organized and underused. To address this, the study uses a hybrid deep learning approach to build a named entity recognition model for these books, providing methodological support for the in-depth development and use of their knowledge. [Method/Process] An ALBERT-BiLSTM-CRF model was designed around the characteristics of ancient Tibetan medical knowledge. The dataset was "The Four Medical Tantras" (《四部医典》). After manual annotation and text preprocessing, named entity recognition experiments were run, and the results were compared with three other common models: BERT-BiLSTM-CRF, BiLSTM-CRF, and BERT. [Results/Conclusion] The ALBERT-BiLSTM-CRF model recognized entities in ancient Tibetan medical books best, reaching an F1-score of 96.28%, about 7 percentage points higher than the other methods.
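The abstract reports an F1-score but does not say how it was computed. The sketch below shows one common convention, entity-level exact-match precision, recall, and F1 over BIO tag sequences. It assumes the BIO scheme and exact span matching; neither is confirmed by the abstract. The function names and the DRUG/DIS labels are hypothetical and do not come from the paper.

```python
from typing import List, Sequence, Set, Tuple

# (start, end_exclusive, entity_type); assumed span representation
Span = Tuple[int, int, str]


def bio_to_spans(tags: Sequence[str]) -> Set[Span]:
    """Collect entity spans from one BIO tag sequence (assumed tagging scheme).

    A stray I- tag that does not continue an open entity of the same type
    starts a new entity, similar to the lenient conlleval convention.
    """
    spans: Set[Span] = set()
    start, etype = None, None
    # The trailing "O" sentinel closes any entity still open at the end.
    for i, tag in enumerate(list(tags) + ["O"]):
        prefix, _, label = tag.partition("-")
        continues = prefix == "I" and start is not None and label == etype
        if not continues:
            if start is not None:
                spans.add((start, i, etype))
            start, etype = (i, label) if prefix in ("B", "I") else (None, None)
    return spans


def entity_prf(gold_seqs: List[List[str]], pred_seqs: List[List[str]]) -> Tuple[float, float, float]:
    """Micro-averaged entity-level precision, recall and F1 (exact span + type match)."""
    tp = n_pred = n_gold = 0
    for gold, pred in zip(gold_seqs, pred_seqs):
        g, p = bio_to_spans(gold), bio_to_spans(pred)
        tp += len(g & p)  # an entity counts only if boundaries and type both match
        n_pred += len(p)
        n_gold += len(g)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy example with hypothetical labels: the predicted DIS entity is truncated,
    # so only one of the two entities matches exactly.
    gold = [["B-DRUG", "I-DRUG", "O", "B-DIS", "I-DIS"]]
    pred = [["B-DRUG", "I-DRUG", "O", "B-DIS", "O"]]
    print(entity_prf(gold, pred))
```

Under this convention a partially recognized entity earns no credit, which makes entity-level F1 stricter than token-level accuracy.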
Authors
Liu Jia (刘佳), Bian Junyi (边俊伊)
School of Business and Management, Jilin University, Changchun 130012, China
Source
Journal of Modern Information (《现代情报》)
CSSCI
2023, No. 11, pp. 37-46 (10 pages)
Funding
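Humanities and Social Sciences Research Planning Fund of the Ministry of Education, project "Research on the Evolution Mechanism, Simulation Experiments and Optimization of Value Co-creation in Library Knowledge Services Based on Data Ecology" (Project No. 19YJA870007).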