Abstract
BERT pre-trained models that tokenize text at the character level perform well on entity recognition, but they ignore the semantic information that coarse-grained domain vocabulary carries as whole units, and so they perform poorly on course texts in the education domain, which contain many nested entities. To address this problem, a LEBERT-CRF entity recognition method for education-domain course texts is proposed that dynamically fuses character-level and word-level embeddings. A lexicon adapter efficiently integrates domain-dictionary features into the BERT model, improving its recognition of entity word boundaries and making it better suited to course knowledge extraction. Experimental results show that the LEBERT-CRF model outperforms other mainstream entity recognition models, reaching an F1 score of 95.47%.
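As an illustration of why a domain dictionary helps with nested entities, the following minimal sketch pairs each character of a sentence with the dictionary words that cover it. This record does not describe the matching procedure, so the function name `match_lexicon`, the `max_word_len` parameter, and the three-entry dictionary are all hypothetical; the sketch only shows the kind of character-to-word pairing that a lexicon adapter could consume, not the authors' implementation.

```python
def match_lexicon(text, lexicon, max_word_len=4):
    """For each character in text, collect the lexicon words that cover it.

    Hypothetical helper: a brute-force scan over all substrings of
    length 2..max_word_len, assumed here purely for illustration.
    """
    matches = [[] for _ in text]
    for start in range(len(text)):
        last_end = min(start + max_word_len, len(text))
        for end in range(start + 2, last_end + 1):
            word = text[start:end]
            if word in lexicon:
                # Attach the matched word to every character it spans.
                for i in range(start, end):
                    matches[i].append(word)
    return matches


# Hypothetical domain dictionary with a nested entry:
# "数据结构" (data structure) contains "数据" (data) and "结构" (structure).
lexicon = {"数据", "结构", "数据结构"}
sentence = "数据结构"
pairs = match_lexicon(sentence, lexicon)

for ch, words in zip(sentence, pairs):
    print(ch, words)
```

Each character ends up associated with both the short word and the longer nested term, which is the word-level boundary information a character-only tokenizer does not see.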
Authors
HOU Min (侯敏); GAO Mao (高茂); ZHANG Liping (张丽萍); YAN Sheng (闫盛); ZHAO Yubo (赵宇博)
College of Computer Science and Technology, Inner Mongolia Normal University, Hohhot 010022, China
Source
Journal of Inner Mongolia Normal University (Natural Science Edition) (《内蒙古师范大学学报(自然科学版)》)
CAS
2024, Issue 2, pp. 197-206 (10 pages)
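Funding
Natural Science Foundation of Inner Mongolia Autonomous Region, project "Identifying and Recommending Clones for Refactoring Using Software Evolution History" (2018MS06009)
Natural Science Foundation of Inner Mongolia Autonomous Region, joint project "Research on Key Technologies of Intelligent Education Services for Personalized Learning in Programming Education" (2023LHMS06009)
Special Project for Philosophy and Social Science Research of Inner Mongolia Autonomous Region, "Knowledge-Graph-Based Intelligent Question-Answering System for Course Knowledge" (ZSZX21102)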