
Medical big data-based pre-trained language model and classification of its medical texts (cited by: 3)
Abstract
Objective: To build a pre-trained language model from medical texts, addressing the problem that pre-trained language models trained on general-domain corpora are poorly suited to medical text classification.
Methods: The general-purpose pre-trained language model BERT was further pre-trained (secondary pre-training) on PubMed medical paper abstracts and PMC full-text medical papers, yielding the medical-domain pre-trained language model BioBERT. BioBERT was then fine-tuned on labeled text data to obtain the final medical text classification model.
Results: Classification experiments on two datasets, medical record texts and medical paper abstracts, showed that the pre-trained language model given secondary pre-training on medical texts achieved good classification performance on both datasets.
Conclusion: A medical-domain pre-trained language model obtained by self-training on a large volume of medical texts can, to some extent, alleviate the low classification performance that arises when a general-domain pre-trained language model does not fit the distribution of medical texts.
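The abstract does not state which objective was used for the secondary pre-training. As an illustration only, the sketch below assumes the standard BERT masked-language-model (MLM) recipe: about 15% of non-special tokens are selected for prediction; of those, 80% become `[MASK]`, 10% become a random vocabulary token, and 10% are left unchanged. The function name `mask_tokens`, the whitespace-level tokens, and the toy vocabulary are hypothetical; real BERT training operates on WordPiece subword IDs.

```python
import random
from typing import List, Optional, Tuple

MASK_TOKEN = "[MASK]"
SPECIAL_TOKENS = {"[CLS]", "[SEP]"}


def mask_tokens(
    tokens: List[str],
    vocab: List[str],
    mask_prob: float = 0.15,
    rng: Optional[random.Random] = None,
) -> Tuple[List[str], List[Optional[str]]]:
    """Apply BERT-style MLM corruption to one token sequence (hypothetical helper).

    Returns (corrupted_tokens, labels). labels[i] holds the original token at
    positions selected for prediction and None elsewhere (ignored by the loss).
    """
    rng = rng or random.Random()
    corrupted = list(tokens)
    labels: List[Optional[str]] = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        # Special tokens are never selected; other tokens are selected with mask_prob.
        if tok in SPECIAL_TOKENS or rng.random() >= mask_prob:
            continue
        labels[i] = tok  # the model is trained to recover this token
        r = rng.random()
        if r < 0.8:
            corrupted[i] = MASK_TOKEN  # 80%: replace with [MASK]
        elif r < 0.9:
            corrupted[i] = rng.choice(vocab)  # 10%: random vocabulary token
        # remaining 10%: keep the original token unchanged
    return corrupted, labels
```

For example, `mask_tokens(["[CLS]", "aspirin", "reduces", "fever", "[SEP]"], vocab, rng=random.Random(0))` returns a corrupted copy plus per-position targets. Passing an explicit `random.Random` makes the corruption reproducible, which is convenient when checking a pre-training pipeline.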
Authors: HUANG Min-ting (黄敏婷), ZHAO Jing (赵静), YU Tao (于涛) (Beijing University of Traditional Chinese Medicine, Beijing 100029, China; Nanyang University of Science and Technology, Singapore 639798, Singapore)
Source: Chinese Journal of Medical Library and Information Science (《中华医学图书情报杂志》), CAS, 2020, No. 11, pp. 39-46 (8 pages)
Keywords: Medical text; Pre-trained language model; Text classification