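Abstract: To address hierarchical label classification of academic papers within subject disciplines, this paper proposes GETHLC, a text hierarchical label classification model based on enhanced representation through knowledge integration and graph attention networks. First, a hierarchical label extraction module extracts the structural features of the hierarchical labels within a discipline, and a pre-trained model embeds the abstracts and titles of the papers together with the extracted label structure features. Then, in the classification stage, hierarchical classifiers are constructed level by level according to the label structure, and each paper is classified layer by layer into its best-matching category. Experiments on CSL, a large-scale Chinese scientific literature dataset, show that, compared with the baseline ERNIE model, GETHLC improves accuracy, recall and F1 by 5.78, 4.31 and 5.02 percentage points, respectively.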
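The layer-by-layer classification step described in this abstract can be illustrated with a small top-down sketch. This is not the GETHLC implementation: the label tree, the keyword sets, and the functions `keyword_score` and `classify_top_down` are all hypothetical, and a simple keyword overlap stands in for a learned classifier over pre-trained embeddings.

```python
from typing import Callable, Dict, List

# Hypothetical two-level label hierarchy: parent label -> child labels.
HIERARCHY: Dict[str, List[str]] = {
    "ROOT": ["Engineering", "Medicine"],
    "Engineering": ["Computer Science", "Civil Engineering"],
    "Medicine": ["Clinical Medicine", "Pharmacy"],
}

# Toy keyword sets (an assumption) standing in for trained per-level classifiers.
KEYWORDS: Dict[str, set] = {
    "Engineering": {"network", "model", "bridge", "algorithm"},
    "Medicine": {"patient", "drug", "disease"},
    "Computer Science": {"network", "model", "algorithm"},
    "Civil Engineering": {"bridge", "concrete"},
    "Clinical Medicine": {"patient", "disease"},
    "Pharmacy": {"drug", "dose"},
}


def keyword_score(text: str, label: str) -> float:
    """Score a label by how many of its keywords occur in the text."""
    tokens = set(text.lower().split())
    return float(len(tokens & KEYWORDS[label]))


def classify_top_down(
    text: str,
    hierarchy: Dict[str, List[str]],
    score: Callable[[str, str], float],
    root: str = "ROOT",
) -> List[str]:
    """Walk the label tree from the root, picking the best child at each level."""
    path: List[str] = []
    node = root
    while node in hierarchy:  # stop once a leaf label is reached
        node = max(hierarchy[node], key=lambda child: score(text, child))
        path.append(node)
    return path


print(classify_top_down("a graph attention network model", HIERARCHY, keyword_score))
```

Restricting each level's candidates to the children of the label chosen one level up is what makes the scheme hierarchical; in a real system the scoring function would be a trained classifier for that level.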
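Abstract: Classifying the text of district-level people's congress reports by specific content areas lets congress staff quickly distinguish different areas of work, and is a necessary component of a system for assisted generation of such reports. To classify the different content types, a classification model is built by combining TF-IDF (term frequency-inverse document frequency) with the knowledge-enhanced semantic representation model ERNIE (enhanced representation through knowledge integration). ERNIE models semantic knowledge units directly, and TF-IDF is added on top of it to improve model performance. Experimental results show that the method performs well in classification accuracy and recall, speeds up the convergence of the ERNIE model, and classifies the text of people's congress reports well.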
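For readers unfamiliar with the TF-IDF weighting mentioned in this abstract, a minimal sketch follows. It uses the plain, unsmoothed form idf = log(N / df), which is an assumption; the abstract does not state which TF-IDF variant was used or how the weights are combined with ERNIE. The function name `tf_idf` and the sample tokens are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Return one {term: weight} dict per tokenized document.

    tf  = term count / document length
    idf = log(number of documents / number of documents containing the term)
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df: Counter = Counter()
    for doc in docs:
        df.update(set(doc))

    weights: List[Dict[str, float]] = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append(
            {
                term: (count / total) * math.log(n_docs / df[term])
                for term, count in counts.items()
            }
        )
    return weights


# Hypothetical pre-tokenized report fragments.
sample = [["budget", "report", "budget"], ["education", "report"]]
for doc_weights in tf_idf(sample):
    print(doc_weights)
```

A term that occurs in every document, such as `report` here, receives weight zero, while terms concentrated in one document are weighted up; this is the property that makes TF-IDF useful as an additional signal for telling content categories apart.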
Funding: The Knowledge Innovation Program Key Project (KSCX1-YW-R-07)
Abstract: A bacterial cell surface display technique based on an ice nucleation protein has been employed for the development of live vaccines against viral infection. Because of its ubiquitous ability to invade host cells, Salmonella typhimurium might be a good candidate for displaying viral antigens. We demonstrated the surface display of domain III of the Japanese encephalitis virus E protein and of the enhanced green fluorescent protein on S. typhimurium BRD509 using the ice nucleation protein. The effects of the motif in the ice nucleation protein on the effective display of integral protein were also investigated. The results showed that display motifs in the protein can target integral foreign protein to the surface of S. typhimurium BRD509. Moreover, recombinant strains with surface-displayed viral proteins retained their invasiveness, suggesting that the recombinant S. typhimurium can be used as a live vaccine vector for eliciting complete immunogenicity. The data may yield a better understanding of the mechanism by which the ice nucleation protein displays foreign proteins in the Salmonella strain.
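Abstract: [Objective] To investigate the performance of, and the differences between, the ERNIE model (Enhanced Language Representation with Informative Entities) and the bidirectional gated recurrent unit (BiGRU) in classifying medical disease names by hospital department. [Methods] Disease names were used as training samples, BERT (Bidirectional Encoder Representations from Transformers) served as the comparison model, and different network layers were added after each model for training and comparison. [Results] ERNIE outperformed BERT in classification, with accuracy about 4% higher; precision reached 79.48%, recall 79.73% and the F1 score 79.50%. [Limitations] Only eight departments were studied; other categories were excluded from the classification scheme because they had too little data. [Conclusions] ERNIE-BiGRU classifies well and can be applied in medical triage guidance systems or in health statistics.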