Abstract
The volume of biomedical data available as unstructured text is growing explosively. Effective identification of biomedical entities is a prerequisite for extracting the biomedical knowledge hidden in unstructured text and converting it into a structured format, so the biological named entity recognition (BioNER) task is of great research value. Within BioNER, phenotype recognition based on the Human Phenotype Ontology (HPO) is an important subtask. Methods have been developed for both English and Chinese to standardize clinical phenotype terms in biomedical literature and electronic medical records. However, most of these methods are available only as source code and are not user-friendly. We hope to make these methods accessible to clinicians and researchers and thereby advance clinical phenotype research in both languages. To this end, we develop a web-based system that standardizes clinical phenotypes in English and Chinese electronic medical records through a website and that continuously expands its corpus based on user feedback in order to improve model performance.
Authors
齐磊
齐莹莹
尧玉恒
QI Lei; QI Yingying; YAO Yuheng (Fudan University, Shanghai 200133, China; Anhui Mingguang People’s Hospital, Chuzhou, Anhui 239100, China)
Source
《计算机应用文摘》
2022, No. 7, pp. 35-37 (3 pages)
Chinese Journal of Computer Application
Keywords
electronic medical record (电子病历)
clinical phenotype (临床表型)
standardization (标准化)
corpus collection (语料收集)